
Chinese Journal of Cell and Stem Cell (Electronic Edition), 2023, Vol. 13, Issue 01: 19-26. doi: 10.3877/cma.j.issn.2095-1221.2023.01.003

Original Article

Construction of diagnosis model for coronary atherosclerosis heart disease using random forest and artificial neural network based on susceptibility genes in peripheral blood cells

Enrui Xie1, Yixuan Duan1, Chang Liu1, Jie Deng1

  1. Department of Cardiovascular Medicine, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710000, China
  • Received: 2022-08-20 Published: 2023-02-01
  • Corresponding author: Jie Deng
  • Funding:
    Xi'an Jiaotong University Medical "Basic-Clinical" Integration Innovation Project (YXJLRH2022073)
Cite this article:

Enrui Xie, Yixuan Duan, Chang Liu, Jie Deng. Construction of diagnosis model for coronary atherosclerosis heart disease using random forest and artificial neural network based on susceptibility genes in peripheral blood cells[J]. Chinese Journal of Cell and Stem Cell (Electronic Edition), 2023, 13(01): 19-26.

Objective

This study aims to identify susceptibility genes in peripheral blood cells as potential molecular biomarkers of coronary heart disease (CHD) and to build a diagnostic model using bioinformatics combined with random forest (RF) and artificial neural network (ANN).

Methods

We downloaded three gene expression profiles (GSE20680, GSE20681, and GSE12288) from the Gene Expression Omnibus (GEO) database. Using GSE20680, we screened for differentially expressed genes and performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. RF was then used to select key genes from the differentially expressed genes. Finally, the three datasets were combined into one training set, used to construct the ANN diagnostic model, and two test sets, used to validate its performance.
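As a rough illustration of the differential expression screening step, the sketch below computes a per-gene log fold change (the difference of group means on log2-scale data) and flags genes whose absolute value exceeds a cutoff. The function names, the toy gene names, and the cutoff of 0.5 are assumptions made for illustration; the abstract does not state the thresholds or software used, and a real analysis would also apply a statistical test with multiple-testing correction, which is omitted here.

```python
from statistics import mean


def log_fold_change(case_values, control_values):
    """Difference of group means; inputs are assumed to be log2-scale expression."""
    return mean(case_values) - mean(control_values)


def screen_genes(expression, labels, lfc_cutoff=0.5):
    """Return {gene: (logFC, direction)} for genes with |logFC| above the cutoff.

    expression: {gene: [one value per sample]}
    labels:     [1 for CHD, 0 for control], same order as the samples
    lfc_cutoff: hypothetical threshold, not a value reported in the abstract
    """
    selected = {}
    for gene, values in expression.items():
        case = [v for v, y in zip(values, labels) if y == 1]
        control = [v for v, y in zip(values, labels) if y == 0]
        lfc = log_fold_change(case, control)
        if abs(lfc) > lfc_cutoff:
            selected[gene] = (lfc, "up" if lfc > 0 else "down")
    return selected


# Toy data: three CHD samples followed by three controls (made-up values).
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_A": [8.0, 8.5, 9.0, 7.0, 7.5, 8.0],  # higher in CHD
    "GENE_B": [5.0, 5.5, 6.0, 5.0, 5.5, 6.0],  # unchanged
    "GENE_C": [4.0, 4.0, 4.0, 6.0, 6.0, 6.0],  # lower in CHD
}
deg = screen_genes(expression, labels)
```

In this toy input, GENE_A and GENE_C pass the cutoff and GENE_B does not; the selected genes would then be the candidates handed to the RF step.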

Results

From the GEO gene expression profiles, RF selected 21 key genes associated with CHD out of 284 differentially expressed genes. An ANN was then used to calculate the weights of these key genes, yielding a new diagnostic model for CHD. On the two test sets used to validate the model, the AUC values were 0.9024 and 0.8153, respectively.
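For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen CHD sample receives a higher model score than a randomly chosen control. The sketch below computes it by pairwise comparison; the function name and the toy labels and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = CHD, 0 = control; scores are made-up model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

The pairwise form is quadratic in the number of samples, which is acceptable for small test sets; a rank-based formulation gives the same value more efficiently.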

Conclusion

We identified 21 potential gene biomarkers of CHD and established a novel diagnostic model that classifies CHD well; it may aid CHD screening and early clinical diagnosis.

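Table 1 Characteristics of the datasets
Figure 1 Study flowchart
Figure 2 Volcano plot of differentially expressed genes (DEGs) in GSE20680. Note: the x-axis is logFC and the y-axis is -log10 (P value); each point represents one gene; red points are genes upregulated in the CHD group relative to normal samples, and blue points are downregulated genes
Figure 3 Bubble plot of the GO enrichment analysis of the DEGs. Note: the x-axis is the gene percentage, i.e. the share of all DEGs annotated to each GO term; the y-axis lists the enriched GO terms; point size indicates the number of genes; the closer the color is to red, the smaller the P value, and the closer to blue, the larger the P value
Figure 4 Chord plot of the KEGG enrichment of the 284 DEGs. Note: genes are listed on the left, with upregulated genes in brown and downregulated genes in light blue; the connections in the plot indicate the KEGG pathways to which each DEG belongs
Figure 5 RF screening of candidate feature genes for CHD. Note: (a) scatter plot of the number of variables (mtry) in the RF model against the corresponding out-of-bag error rate; (b) effect of the number of decision trees on the error rate, with the number of trees on the x-axis and the error rate on the y-axis; (c) the top 30 genes in the RF model ranked by MeanDecreaseGini; (d) bar chart of the importance of the 21 candidate feature genes, with genes on the x-axis and importance on the y-axis; (e) expression heatmap of the 21 genes in GSE20680, where rows are genes, columns are samples, and expression values are normalized; in the bar above the heatmap, red marks the CHD group and blue the control group
Figure 6 CHD-ANN diagnostic model built from the 21 genes. Note: neural network topology with one input layer, one hidden layer (containing 5 neurons), and one output layer
Figure 7 ROC curves of the ANN model on the two test sets. Note: (a) validation of the ANN model on test set 1; (b) validation of the ANN model on test set 2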
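To make the topology in the Figure 6 caption concrete, the sketch below runs a forward pass through a network with 21 inputs (one per key gene), one hidden layer of 5 neurons, and a single output. Only the layer sizes come from the caption. The logistic activation, the random placeholder weights, and all function names are assumptions for illustration; they are not the trained weights or the software used in the study.

```python
import math
import random

N_INPUT, N_HIDDEN = 21, 5  # layer sizes taken from the Figure 6 caption


def sigmoid(x):
    """Logistic activation (an assumed choice; the source does not name one)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(seed=0):
    """Create placeholder weights; a real model would learn these from training data."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUT)]
                     for _ in range(N_HIDDEN)],
        "b_hidden": [0.0] * N_HIDDEN,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "b_out": 0.0,
    }


def count_parameters(net):
    """Total number of weights and biases in the network."""
    return (sum(len(row) for row in net["w_hidden"])
            + len(net["b_hidden"]) + len(net["w_out"]) + 1)


def forward(net, x):
    """Map 21 gene-level inputs to one score between 0 and 1."""
    if len(x) != N_INPUT:
        raise ValueError(f"expected {N_INPUT} inputs, got {len(x)}")
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])


net = init_network()
sample = [0.5] * N_INPUT  # made-up, already-scaled expression values
score = forward(net, sample)
```

A network of this shape has 21 × 5 + 5 + 5 + 1 = 116 trainable parameters, which is small enough to fit on datasets with a few hundred samples, though that alone does not guard against overfitting.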