Chinese Journal of Cell and Stem Cell (Electronic Edition) ›› 2023, Vol. 13 ›› Issue (01): 19-26. doi: 10.3877/cma.j.issn.2095-1221.2023.01.003

Original Article

Construction of diagnosis model for coronary atherosclerosis heart disease using random forest and artificial neural network based on susceptibility genes in peripheral blood cells

Enrui Xie1, Yixuan Duan1, Chang Liu1, Jie Deng1

  1. Department of Cardiovascular Medicine, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710000, China
  • Received: 2022-08-20 Published: 2023-02-01
  • Corresponding author: Jie Deng
  • Funding:
    Xi'an Jiaotong University Medical "Basic-Clinical" Integration and Innovation Project (YXJLRH2022073)
Cite this article:

Enrui Xie, Yixuan Duan, Chang Liu, Jie Deng. Construction of diagnosis model for coronary atherosclerosis heart disease using random forest and artificial neural network based on susceptibility genes in peripheral blood cells[J/OL]. Chinese Journal of Cell and Stem Cell (Electronic Edition), 2023, 13(01): 19-26.

Objective

This study aimed to identify susceptibility genes in peripheral blood cells as potential molecular biomarkers of coronary heart disease (CHD) and to build a diagnostic model by combining bioinformatics with random forest (RF) and artificial neural network (ANN) methods.

Methods

We downloaded three gene expression datasets (GSE20680, GSE20681, and GSE12288) from the Gene Expression Omnibus (GEO) database. Using GSE20680, we identified differentially expressed genes and performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. RF was then applied to the differentially expressed genes to select key genes. Finally, the three datasets were combined to form one training set, used to construct the ANN diagnostic model, and two test sets, used to validate its performance.
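The differential-expression step described above can be illustrated with a minimal sketch. It assumes each gene already has a log fold change and a P value, and it uses placeholder cutoffs (`lfc_cutoff=1.0`, `p_cutoff=0.05`) that are not stated in the abstract; the gene names and statistics are hypothetical toy values, not data from the study.

```python
import math


def classify_gene(log_fc, p_value, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label one gene as 'up', 'down', or 'ns' (not significant).

    The cutoffs are illustrative assumptions, not the study's thresholds.
    """
    if p_value < p_cutoff and abs(log_fc) >= lfc_cutoff:
        return "up" if log_fc > 0 else "down"
    return "ns"


def volcano_point(log_fc, p_value):
    """Return the (x, y) coordinates of a gene on a volcano plot."""
    return log_fc, -math.log10(p_value)


# Hypothetical toy statistics: gene -> (logFC, P value)
toy_stats = {
    "GENE_A": (1.8, 0.001),
    "GENE_B": (-1.2, 0.02),
    "GENE_C": (0.3, 0.0005),  # significant P value but small effect
    "GENE_D": (2.1, 0.2),     # large effect but not significant
}

labels = {gene: classify_gene(lfc, p) for gene, (lfc, p) in toy_stats.items()}
degs = sorted(gene for gene, label in labels.items() if label != "ns")
print(labels)
print("Differentially expressed:", degs)
```

Only genes passing both cutoffs are kept as candidates for the later feature-selection step; in practice the statistics would come from a dedicated differential-expression package rather than being typed in by hand.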

Results

From the GEO expression profiles, RF selected 21 key CHD-related genes out of 284 differentially expressed genes, and an ANN was used to compute the weights of these key genes and construct the CHD diagnostic model. On the two test sets, the model achieved AUC values of 0.9024 and 0.8153, respectively.
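For readers unfamiliar with the AUC metric reported above, the following sketch computes it from its rank-based definition: the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The labels and scores are hypothetical toy values, and the function name `roc_auc` is an illustrative choice, not part of the study's code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for cases (e.g. CHD), 0 for controls.
    scores: model outputs, higher meaning more likely to be a case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not results from the study
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of samples, which is acceptable for small test sets; a sort-based implementation would be preferable for large ones.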

Conclusion

We identified 21 potential gene biomarkers of CHD and established a diagnostic model that classified CHD well on both test sets; the model may aid CHD screening and early clinical diagnosis.

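Table 1. Characteristics of the datasets
Figure 1. Study flowchart
Figure 2. Volcano plot of DEGs in GSE20680. Note: the x-axis is logFC and the y-axis is -log10(P value); each point represents one gene; red points are genes upregulated in the CHD group relative to normal samples, and blue points are downregulated genes.
Figure 3. Bubble plot of the GO enrichment analysis of the differentially expressed genes. Note: the x-axis is the gene percentage, i.e. the share of all differentially expressed genes annotated to each GO term; the y-axis lists the enriched GO terms; point size indicates the number of genes; the closer the point color is to red, the smaller the P value, and the closer to blue, the larger the P value.
Figure 4. Chord plot of the KEGG enrichment of the 284 differentially expressed genes. Note: genes are listed on the left, with upregulated genes in brown and downregulated genes in light blue; the links in the plot indicate the KEGG pathways to which each DEG belongs.
Figure 5. RF screening of candidate signature genes for CHD. Note: (a) scatter plot of the number of variables (mtry) in the RF model against the corresponding out-of-bag error rate; (b) effect of the number of decision trees on the error rate, with the number of trees on the x-axis and the error rate on the y-axis; (c) top 30 genes in the RF model ranked by MeanDecreaseGini; (d) histogram of the importance of the 21 candidate signature genes, with genes on the x-axis and importance on the y-axis; (e) expression heatmap of the 21 genes in GSE20680, where rows are genes, columns are samples, expression values are normalized, and the bar above the heatmap marks the CHD group in red and the control group in blue.
Figure 6. CHD-ANN diagnostic model built from the 21 genes. Note: neural network topology with one input layer, one hidden layer (containing 5 neurons), and one output layer.
Figure 7. ROC curves of the ANN model on the two test sets. Note: (a) validation of the ANN model on test set 1; (b) validation of the ANN model on test set 2.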
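The network topology in the Figure 6 caption (21 gene inputs, one hidden layer of 5 neurons, one output) can be sketched as a plain forward pass. This is an illustration only: the logistic activation, the zero biases, and the randomly drawn weights are assumptions, since the page does not give the trained weights or the activation function, and the input sample is synthetic.

```python
import math
import random

N_INPUTS, N_HIDDEN = 21, 5  # sizes taken from the Figure 6 caption


def sigmoid(x):
    """Logistic activation (an assumption; not stated on the page)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 21-5-1 network; returns (hidden activations, score)."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    score = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, score


# Placeholder weights and a synthetic sample; not the study's trained model
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
b_out = 0.0
sample = [rng.random() for _ in range(N_INPUTS)]  # 21 normalized expression values

hidden, score = forward(sample, w_hidden, b_hidden, w_out, b_out)
print(f"hidden units: {len(hidden)}, output score: {score:.3f}")
```

The output score lies between 0 and 1 and would be thresholded to assign a sample to the CHD or control group; training would adjust the weights that are drawn at random here.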