A diagnostic prediction model for anti-neutrophil cytoplasmic antibody-associated vasculitis with glomerulonephritis based on machine learning algorithms

    Abstract: Objective To construct a diagnostic model for anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV) with glomerulonephritis by combining random forest (RF) and artificial neural network (ANN) algorithms. Methods Datasets were downloaded from the GEO and ArrayExpress databases (GSE108113 and GSE104948 as training sets, E-MTAB-1944 as the validation set). Differentially expressed genes (DEGs) between samples of AAV with glomerulonephritis and normal controls were identified, followed by GO and KEGG enrichment analyses and construction of a protein-protein interaction (PPI) network. RF and ANN algorithms were then applied to further screen characteristic genes and to construct and validate the diagnostic model. Results A total of 380 DEGs were identified, of which 194 were significantly up-regulated and 186 significantly down-regulated. Enrichment analyses showed that the DEGs were mainly associated with pathways related to immune response and metabolic processes. EHHADH, CCL2, FN1, IL1B, VAV1, CXCR4, CCL5, and CD44 were at the core of the PPI network. The RF algorithm screened out 15 characteristic genes, and the ANN algorithm calculated the weight of each characteristic gene, from which the diagnostic model was constructed. The model showed strong predictive power, with an area under the curve (AUC) of 1.000; the AUC in the validation cohort was 0.808, further supporting the accuracy of the model. Conclusions Machine learning algorithms were used to identify characteristic biomarkers of AAV with glomerulonephritis and to construct a diagnostic model. The model can serve as a reliable reference for early diagnosis of the disease and provide a new perspective for research on its pathogenesis.
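    For readers less familiar with the AUC values reported above, the following is a minimal sketch of how such a value can be computed from a model's output scores. It is not the authors' code: the function name `roc_auc` and the example labels and scores are hypothetical, and the pairwise (Mann-Whitney) formulation is only one standard way to compute the ROC AUC. It illustrates that an AUC of 1.000 means every disease sample received a higher score than every control sample in that dataset.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts as 1 when the positive sample scores higher than the
    negative one, and as 0.5 when the two scores are tied.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = disease, 0 = control) and model scores;
# these are illustrative values, not data from the study.
labels = [1, 1, 1, 0, 0, 0]
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]      # classes fully separated
overlapping_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # one positive ranked below a control

print(roc_auc(labels, perfect_scores))
print(roc_auc(labels, overlapping_scores))
```

    In the second example one of the nine disease/control pairs is ranked incorrectly, so the AUC falls below 1; this is the sense in which a validation AUC such as 0.808 reflects imperfect but still useful separation.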

       
