Abstract
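Alzheimer's disease (AD) places an enormous medical and economic burden on society, so finding drugs to treat it is of great research significance. In this study, we investigated drug repurposing for AD using knowledge graph embedding (KGE) on the publicly available drug repurposing knowledge graph (DRKG). First, four KGE models, namely TransE, DistMult, ComplEx and RotatE, were used to learn embedding vectors of the entities and relations in DRKG. Next, three classic knowledge graph evaluation metrics (MR, MRR and Hits@N) were used to evaluate and compare the performance of these models and the quality of the learned embeddings. Based on this comparison, the RotatE model was selected for link prediction, which identified 16 drugs that could potentially be used to treat AD. Of these, glutathione, haloperidol, capsaicin, quercetin, estradiol, glucose, disulfiram, adenosine, paroxetine, paclitaxel, glyburide and amitriptyline have been shown in previous studies to have potential therapeutic effects on AD. The results indicate that KGE-based drug repurposing may offer new ideas and methods for AD drug discovery, and that the RotatE model can effectively integrate the multi-source information in DRKG to accomplish the AD drug repurposing task. The source code of this study is available at https://github.com/LuYF-Lemon-love/AD-KGE.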
Alzheimer's disease (AD) is a common neurodegenerative disease that is incurable and irreversible.
However, developing a new drug is often time-consuming and extremely costly.
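A knowledge graph (KG) is a database that stores knowledge as a graph topology. Concrete objects and abstract concepts are represented as entities, and the connections between them as relations, so that each piece of knowledge is expressed as a triple of the form (head entity, relation, tail entity). A KG is therefore a directed graph composed of a large number of triples, in which nodes represent entities and edges represent the relations between them.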
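To make the triple representation concrete, here is a minimal Python sketch that stores triples and builds the directed graph described above. It assumes DRKG-style IDs of the form `Type::identifier` (as in the entity names used later in this paper, e.g. `Disease::MESH:D000544`); the compound IDs `Compound::drug_a` and `Compound::drug_b` are hypothetical placeholders, and the toy triples are illustrative, not real DRKG facts.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """One piece of knowledge: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


def entity_type(entity_id: str) -> str:
    """Return the type prefix of a DRKG-style ID such as 'Disease::MESH:D000544'."""
    return entity_id.split("::", 1)[0]


def build_adjacency(triples: List[Triple]) -> DefaultDict[str, Set[Tuple[str, str]]]:
    """Directed graph: each head entity maps to its outgoing (relation, tail) edges."""
    graph: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for head, relation, tail in triples:
        graph[head].add((relation, tail))
    return graph


# Toy triples: the Compound IDs are hypothetical placeholders, not DRKG facts.
toy_triples = [
    Triple("Compound::drug_a", "DRUGBANK::treats::Compound:Disease", "Disease::MESH:D000544"),
    Triple("Compound::drug_a", "Hetionet::CtD::Compound:Disease", "Disease::DOID:10652"),
    Triple("Compound::drug_b", "GNBR::T::Compound:Disease", "Disease::MESH:D000544"),
]
graph = build_adjacency(toy_triples)
print(sorted(graph))                     # ['Compound::drug_a', 'Compound::drug_b']
print(entity_type(toy_triples[0].tail))  # Disease
```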
However, many KGs are very large, for example the drug repurposing knowledge graph (DRKG)
In recent years, researchers have proposed many KG-based drug repurposing methods. Zeng
Wang
Therefore, in this study, we used knowledge graph embedding (KGE) models to investigate AD drug repurposing on DRKG.

Figure 1 Diagram illustrating the workflow of our approach
DRK
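To learn embeddings of the entities and relations in DRKG within our limited computing resources, we studied and compared only four classic KGE models with linear time complexity, namely TransE, DistMult, ComplEx and RotatE.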
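TransE models a relation as a translation from the head entity to the tail entity in the embedding space, with the scoring function

$$f_r(h,t) = -\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{1/2} \tag{1}$$

where $\lVert\cdot\rVert_{1/2}$ denotes the L1 or L2 norm (TransE_l1 and TransE_l2, respectively).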
As
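DistMult represents each relation as a diagonal matrix and scores a triple with a bilinear product:

$$f_r(h,t) = \mathbf{h}^{\top} \mathbf{M}_r \mathbf{t} \tag{2}$$

where $\mathbf{M}_r = \operatorname{diag}(\mathbf{r})$ is the diagonal matrix of relation $r$.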
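Because DistMult gives $(h, r, t)$ and $(t, r, h)$ the same score, it can only model symmetric relations. ComplEx therefore extends DistMult to complex-valued embeddings:

$$f_r(h,t) = \operatorname{Re}\left(\mathbf{h}^{\top} \operatorname{diag}(\mathbf{r})\, \overline{\mathbf{t}}\right) \tag{3}$$

where $\operatorname{Re}(\cdot)$ denotes the real part of a complex number and $\overline{\mathbf{t}}$ denotes the complex conjugate of $\mathbf{t}$.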
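Inspired by TransE and Euler's identity, RotatE maps entities and relations into complex space and models each relation as an element-wise rotation from the head entity to the tail entity:

$$f_r(h,t) = -\lVert \mathbf{h} \circ \mathbf{r} - \mathbf{t} \rVert, \quad |r_i| = 1 \tag{4}$$

where $\circ$ denotes the Hadamard (element-wise) product.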
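The four scoring functions can be written directly from Eqs. (1)-(4). The sketch below is a plain-Python illustration over lists of numbers, not the DGL-KE implementation used for training; it assumes the L1 form of the RotatE distance (a sum of element-wise complex moduli) and parametrizes each RotatE relation by its phases so that |r_i| = 1 holds by construction.

```python
import cmath
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float], p: int = 1) -> float:
    """Eq. (1): -||h + r - t||_p; p=1 gives TransE_l1, p=2 gives TransE_l2."""
    return -sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1.0 / p)


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """Eq. (2): h^T diag(r) t, a trilinear product that is symmetric in h and t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h: Sequence[complex], r: Sequence[complex], t: Sequence[complex]) -> float:
    """Eq. (3): Re(h^T diag(r) conj(t)) with complex-valued embeddings."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rotate_score(h: Sequence[complex], phases: Sequence[float], t: Sequence[complex]) -> float:
    """Eq. (4): -||h o r - t|| with r_i = exp(i * phase_i), so |r_i| = 1.

    Assumption: the norm is the sum of element-wise complex moduli (L1 form).
    """
    r = [cmath.exp(1j * theta) for theta in phases]
    return -sum(abs(hi * ri - ti) for hi, ri, ti in zip(h, r, t))
```

This makes the symmetry argument concrete: swapping head and tail never changes `distmult_score`, whereas `complex_score([1j], [1j], [1])` and `complex_score([1], [1j], [1j])` return -1.0 and 1.0, respectively.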
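A KGE model can predict missing triples in a KG through link prediction: given $(h, r, ?)$ it predicts the missing tail entity $t$, and given $(?, r, t)$ it predicts the missing head entity $h$. Link prediction yields the rank of the correct entity among all candidate entities. Three classic metrics are commonly used to evaluate link prediction performance: the mean rank (MR) of the correct entity's score, the mean reciprocal rank (MRR), and Hits@N.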
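Let $\mathrm{rank}_h$ and $\mathrm{rank}_t$ denote the ranks of the correct head and tail entities, respectively, and let $T$ be the set of triples to be evaluated. MR, MRR and Hits@N are then computed as

$$\mathrm{MR} = \frac{1}{2|T|}\sum_{(h,r,t)\in T}\left(\mathrm{rank}_h + \mathrm{rank}_t\right) \tag{7}$$

$$\mathrm{MRR} = \frac{1}{2|T|}\sum_{(h,r,t)\in T}\left(\frac{1}{\mathrm{rank}_h} + \frac{1}{\mathrm{rank}_t}\right) \tag{8}$$

$$\mathrm{Hits@}N = \frac{1}{2|T|}\sum_{(h,r,t)\in T}\left(\mathbb{I}[\mathrm{rank}_h \le N] + \mathbb{I}[\mathrm{rank}_t \le N]\right) \tag{9}$$

where $\mathbb{I}[\cdot]$ is the indicator function. A lower MR and a higher MRR and Hits@N indicate better performance.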
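A minimal sketch of how ranks and Eqs. (7)-(9) might be computed. It uses the raw setting (every other candidate counts when ranking), resolves ties in favour of the correct entity, and averages over head and tail predictions together; many KGE toolkits also offer a filtered setting that skips other known true triples, so treat these choices as assumptions rather than the exact evaluation protocol used here.

```python
from typing import Dict, Iterable, Sequence


def rank_of_true(scores: Sequence[float], true_index: int) -> int:
    """1-based rank of the correct entity when candidates are sorted by descending score.

    Only strictly higher scores push it down, i.e. ties favour the correct entity.
    """
    true_score = scores[true_index]
    return 1 + sum(1 for i, s in enumerate(scores) if i != true_index and s > true_score)


def link_prediction_metrics(head_ranks: Iterable[int], tail_ranks: Iterable[int],
                            hits_at: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Eqs. (7)-(9): average over all 2|T| head and tail ranks."""
    ranks = list(head_ranks) + list(tail_ranks)
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / k for k in ranks) / n,
    }
    for cutoff in hits_at:
        metrics[f"Hits@{cutoff}"] = sum(1 for k in ranks if k <= cutoff) / n
    return metrics


print(rank_of_true([0.1, 0.9, 0.5], true_index=2))  # 2
print(link_prediction_metrics([1, 4], [2, 1]))
# {'MR': 2.0, 'MRR': 0.6875, 'Hits@1': 0.5, 'Hits@3': 0.75, 'Hits@10': 1.0}
```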
In
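Because DRKG combines information from different data sources, we used embedding evaluation to verify qualitatively whether the KGE models produce meaningful entity and relation embeddings. Ideally, a KGE model should capture the differences between the embeddings of different relations and the similarities among entities of the same type.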
We first applied t-distributed stochastic neighbor embedding (t-SNE)
(10) |
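Figure 3 histograms the cosine similarity between relation embeddings. A plain-Python sketch of that computation follows, assuming non-zero, real-valued relation vectors (how complex-valued ComplEx or RotatE embeddings are flattened into real vectors is left as an assumption) and equal-width bins over [-1, 1].

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product divided by the product of Euclidean norms (vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def relation_similarity_histogram(embeddings: Dict[str, Sequence[float]],
                                  bins: int = 10) -> Tuple[List[float], List[int]]:
    """Cosine similarity for every unordered pair of relations, binned over [-1, 1]."""
    sims = [cosine_similarity(embeddings[a], embeddings[b])
            for a, b in itertools.combinations(sorted(embeddings), 2)]
    counts = [0] * bins
    for s in sims:
        # Map [-1, 1] onto bin indices 0..bins-1, clamping float round-off at the edges.
        idx = int((s + 1.0) / 2.0 * bins)
        counts[min(max(idx, 0), bins - 1)] += 1
    return sims, counts


# Hypothetical 2-D relation vectors, only to show the call.
sims, counts = relation_similarity_histogram(
    {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [2.0, 0.0]}, bins=4)
print(counts)  # [0, 0, 2, 1]
```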
In
Next, principal component analysis (PCA) was used to reduce the entity embeddings to 30 dimensions
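When using the KGE models for drug repurposing, the FDA-approved drugs in DrugBank (relative molecular mass ≥ 250; 8 104 drugs in total) were taken as candidate drugs and formed the head-entity set. All treatment relations in DRKG were used as link prediction relations; there are three of them, DRUGBANK::treats::Compound:Disease, GNBR::T::Compound:Disease and Hetionet::CtD::Compound:Disease, where treats, T and CtD are the treatment relations from the DrugBank, GNBR and Hetionet databases, respectively. All AD entities in DRKG formed the tail-entity set; there are three of them, Disease::DOID:10652, Disease::MESH:C536599 and Disease::MESH:D000544. Disease::DOID:10652 is the AD entity from the Hetionet data source, while Disease::MESH:C536599 and Disease::MESH:D000544 are AD entities mapped to MeSH IDs (Disease::MESH:C536599 is the entity for AD without neurofibrillary tangles). These entity and relation sets were combined into $(h, r, t)$ triples (8 104 × 3 × 3 = 72 936 combinations in total), the scoring function was computed for every combination, and the top-N drugs by score were selected as AD treatment candidates, where N depends on each KGE model's MR on the test set.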
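The candidate-generation step above can be sketched as follows. `score_fn` stands in for a trained model's scoring function and is hypothetical, as are the toy `Compound::a`-style IDs; the relation and AD entity IDs are those listed in the text. Ranking each drug by its best-scoring combination is one reasonable way to turn the 72 936 triple scores into a drug list, not necessarily the authors' exact rule.

```python
import itertools
from typing import Callable, List, Sequence

# Treatment relations and AD entities as listed in the text.
TREAT_RELATIONS = [
    "DRUGBANK::treats::Compound:Disease",
    "GNBR::T::Compound:Disease",
    "Hetionet::CtD::Compound:Disease",
]
AD_ENTITIES = [
    "Disease::DOID:10652",
    "Disease::MESH:C536599",
    "Disease::MESH:D000544",
]

ScoreFn = Callable[[str, str, str], float]  # hypothetical wrapper around a trained model


def top_n_drugs(drugs: Sequence[str], score_fn: ScoreFn, top_n: int) -> List[str]:
    """Score every (drug, treatment relation, AD entity) combination and return the
    top_n distinct drugs, ordered by their best-scoring combination."""
    scored = [(score_fn(h, r, t), h)
              for h, r, t in itertools.product(drugs, TREAT_RELATIONS, AD_ENTITIES)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked: List[str] = []
    seen = set()
    for _, drug in scored:
        if drug not in seen:
            seen.add(drug)
            ranked.append(drug)
            if len(ranked) == top_n:
                break
    return ranked


toy_scores = {"Compound::a": 3.0, "Compound::b": 1.0, "Compound::c": 2.0}
print(top_n_drugs(list(toy_scores), lambda h, r, t: toy_scores[h], top_n=2))
# ['Compound::a', 'Compound::c']
```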
The triples of DRKG were split into training, validation and test sets in a 90%/5%/5% ratio, containing 5 286 834, 293 713 and 293 714 triples, respectively.
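A sketch of a seeded 90/5/5 random split. Flooring the training and validation sizes and assigning the remainder to the test set reproduces the reported counts for 5 874 261 triples; the shuffling and the seed are assumptions, since the text does not say how the split was drawn.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sizes(n: int, train_frac: float = 0.9, valid_frac: float = 0.05) -> Tuple[int, int, int]:
    """Floor the train and validation sizes; the test set takes the remainder."""
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    return n_train, n_valid, n - n_train - n_valid


def random_split(triples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed, then cut into train/validation/test."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train, n_valid, _ = split_sizes(len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


print(split_sizes(5_874_261))  # (5286834, 293713, 293714)
```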
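The hyperparameters of all models (TransE_l1, TransE_l2, DistMult, ComplEx and RotatE) were tuned by grid search on the validation set, based on the overall performance across five classic KGE evaluation metrics (MR, MRR, Hits@1, Hits@3 and Hits@10). For all models, the training batch size and the number of negative triples per positive triple were fixed at 4 096 and 256, respectively, and the learning rate (lr) was selected from {0.01, 0.05, 0.1}. Because the entity dimension of RotatE is twice the embedding-dimension hyperparameter (hidden_dim), we fixed hidden_dim at 200 for RotatE and selected it from {200, 400} for the other models. The hyperparameter γ was selected from {6, 12, 18} for TransE_l1, TransE_l2 and RotatE, and from {50, 125, 200} for DistMult and ComplEx.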
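The search space above can be enumerated as a small grid: 18 configurations for each of TransE_l1, TransE_l2, DistMult and ComplEx, and 9 for RotatE because its hidden_dim is fixed. The key names (`hidden_dim`, `gamma`, `neg_sample_size`, and so on) are chosen to resemble DGL-KE's training options but are illustrative and should be checked against its documentation; `gamma` names the hyperparameter searched over {6, 12, 18} and {50, 125, 200}. How the five validation metrics were combined to pick the best configuration is not specified, so the sketch stops at enumeration.

```python
import itertools
from typing import Dict, List, Union

Config = Dict[str, Union[str, int, float]]

LEARNING_RATES = [0.01, 0.05, 0.1]
# Fixed for every model: batch size and negatives per positive triple.
FIXED = {"batch_size": 4096, "neg_sample_size": 256}
# Per-model search space; "gamma" stands for the hyperparameter gamma searched in the text.
SEARCH_SPACE: Dict[str, Dict[str, List[float]]] = {
    "TransE_l1": {"hidden_dim": [200, 400], "gamma": [6, 12, 18]},
    "TransE_l2": {"hidden_dim": [200, 400], "gamma": [6, 12, 18]},
    "DistMult": {"hidden_dim": [200, 400], "gamma": [50, 125, 200]},
    "ComplEx": {"hidden_dim": [200, 400], "gamma": [50, 125, 200]},
    "RotatE": {"hidden_dim": [200], "gamma": [6, 12, 18]},  # entity dim = 2 * hidden_dim
}


def grid(model_name: str) -> List[Config]:
    """All (lr, hidden_dim, gamma) combinations for one model, plus the fixed settings."""
    space = SEARCH_SPACE[model_name]
    return [
        {"model_name": model_name, "lr": lr, "hidden_dim": dim, "gamma": gamma, **FIXED}
        for lr, dim, gamma in itertools.product(LEARNING_RATES, space["hidden_dim"], space["gamma"])
    ]


print({m: len(grid(m)) for m in SEARCH_SPACE})
# {'TransE_l1': 18, 'TransE_l2': 18, 'DistMult': 18, 'ComplEx': 18, 'RotatE': 9}
```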
This study used Zheng
Model | MRR | MR | Hits@1 | Hits@3 | Hits@10 |
---|---|---|---|---|---|
TransE_l1 | 0.530 | <u>62.64</u> | 0.412 | 0.606 | 0.740 |
TransE_l2 | 0.437 | **60.83** | 0.302 | 0.515 | 0.693 |
DistMult | 0.484 | 105.55 | 0.401 | 0.515 | 0.643 |
ComplEx | **0.621** | 112.74 | **0.537** | <u>0.673</u> | <u>0.768</u> |
RotatE | <u>0.614</u> | 63.51 | <u>0.515</u> | **0.681** | **0.780** |
The best results are in bold and the second-best results are underlined. Lower is better for MR; higher is better for the other metrics.
The best hyperparameter configurations are: TransE_l1, hidden_dim = 400, γ = 18, lr = 0.05; TransE_l2, hidden_dim = 400, γ = 12, lr = 0.1; DistMult, hidden_dim = 400, γ = 50, lr = 0.1; ComplEx, hidden_dim = 400, γ = 50, lr = 0.1; RotatE, hidden_dim = 200, γ = 18, lr = 0.05.
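Given DistMult's weak performance in the classic evaluation (the lowest Hits@10 and the second-worst MR), we retrained only TransE_l1, TransE_l2, ComplEx and RotatE on the entire DRKG with their best hyperparameters, and used these models for embedding evaluation and AD drug repurposing.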

Figure 2 Distribution of relation embeddings in 2D Euclidean space for 4 models
A: TransE_l1 embeddings; B: TransE_l2 embeddings; C: ComplEx embeddings; D: RotatE embeddings

Figure 3 Histogram of cosine similarity between relations for 4 models
A: TransE_l1 embeddings; B: TransE_l2 embeddings; C: ComplEx embeddings; D: RotatE embeddings

Figure 4 Distribution of entity embeddings in 2D Euclidean space for 4 models
A: TransE_l1 embeddings; B: TransE_l2 embeddings; C: ComplEx embeddings; D: RotatE embeddings
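Combining the results of the classic KGE evaluation and the embedding evaluation, we used RotatE as the final model for AD drug repurposing. Among the 10 highest-scoring drugs, only the drug ranked 9th is not annotated in DRKG as having a treatment relation with an AD entity, which indicates that the method correctly recovers triples that already exist in DRKG.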
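Because RotatE's MR on the test set is 63.51, the drugs ranked in the top 50 by score that are not annotated in DRKG as having a treatment relation with an AD entity were taken as the repurposed AD candidates. Because sibutramine, ranked 23rd by score, has been withdrawn from the market, it was excluded, leaving 16 candidate drugs.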
Rank | Drug name | Literature support |
---|---|---|
9 | Glutathione | The beneficial effect of many nutrients on the course of AD has been demonstrated. These include: glutathione, polyphenols, curcumin, coenzyme Q10, vitamins B6, B12, folic acid, unsaturated fatty acids, lecithin, UA, caffeine and some probiotic bacteria |
11 | Haloperidol | Haloperidol inactivates AMPK and reduces tau phosphorylation in a tau mouse model of Alzheimer's disease |
13 | Capsaicin | In Alzheimer's disease, capsaicin reduces neurodegeneration and memory impairment |
16 | Quercetin | Quercetin has demonstrated antioxidant, anti-inflammatory, hypoglycemic, and hypolipidemic activities, suggesting therapeutic potential against type 2 diabetes mellitus (T2DM) and Alzheimer's disease (AD) |
17 | Estradiol | Mounting evidence indicates that the neurosteroid estradiol (17β-estradiol) plays a supporting role in neurogenesis, neuronal activity, and synaptic plasticity of AD. This effect may provide preventive and/or therapeutic approaches for AD |
18 | Glucose | Specifically, decreased O-GlcNAcylation levels by glucose deficiency alter mitochondrial functions and together contribute to Alzheimer's disease pathogenesis |
20 | Disulfiram | Identification of disulfiram as a secretase-modulating compound with beneficial effects on Alzheimer's disease hallmarks |
21 | Adenosine | Emerging evidence suggests adenosine G protein-coupled receptors (GPCRs) are promising therapeutic targets for Alzheimer's disease |
23 | Sibutramine | In October 2010, Sibutramine was withdrawn from the U.S. market |
29 | Paroxetine | Paroxetine ameliorates prodromal emotional dysfunction and late-onset memory deficit in Alzheimer's disease mice |
31 | Cocaine | None |
39 | Paclitaxel | In addition to NSAIDs, an anticancer drug, paclitaxel, has considerable potential as an AD treatment |
41 | Cholesterol | None |
43 | Glyburide | Our findings suggest that a pharmacologic approach to inhibit galanin in the brain, either by glibenclamide or pioglitazone might dramatically improve symptoms in Alzheimer's disease |
44 | Staurosporine | None |
46 | Cortisone | None |
48 | Amitriptyline | These results indicate that amitriptyline has significant beneficial actions in aged and damaged AD brains and that it shows promise as a tolerable novel therapeutic for the treatment of AD |
"None" indicates that no supporting literature has been found to date.
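The selection rule described above (top 50 by score, minus drugs that DRKG already links to AD, minus withdrawn drugs) can be sketched as a simple filter. The ranks of the unannotated drugs are copied from the table; every other top-50 slot is filled with a hypothetical placeholder standing for a drug that DRKG already annotates as an AD treatment.

```python
from typing import Iterable, List, Set


def select_candidates(ranked_drugs: Iterable[str], known_treatments: Set[str],
                      excluded: Set[str], top_k: int = 50) -> List[str]:
    """Keep drugs in the top_k that DRKG does not already link to AD and that are not excluded."""
    top = list(ranked_drugs)[:top_k]
    return [d for d in top if d not in known_treatments and d not in excluded]


# Ranks of the unannotated drugs, taken from the table; other ranks are placeholders.
UNANNOTATED = {9: "Glutathione", 11: "Haloperidol", 13: "Capsaicin", 16: "Quercetin",
               17: "Estradiol", 18: "Glucose", 20: "Disulfiram", 21: "Adenosine",
               23: "Sibutramine", 29: "Paroxetine", 31: "Cocaine", 39: "Paclitaxel",
               41: "Cholesterol", 43: "Glyburide", 44: "Staurosporine", 46: "Cortisone",
               48: "Amitriptyline"}
ranked = [UNANNOTATED.get(k, f"known_treatment_{k}") for k in range(1, 51)]
known = {d for d in ranked if d.startswith("known_treatment_")}
candidates = select_candidates(ranked, known, excluded={"Sibutramine"})
print(len(candidates))  # 16
```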
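In this study, we used KGE models to investigate drug repurposing for AD. We first applied four different KGE models to learn embedding representations of the entities and relations in DRKG and, based on a comparison of these models, used RotatE with link prediction to identify drugs for treating AD. The results show that RotatE can effectively integrate the multi-source information in DRKG and accomplish the AD drug repurposing task: it identified 16 repurposable drugs, 12 of which have been shown in previous studies to have potential therapeutic benefit for AD.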
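The dataset used in this study is DRKG, which contains 5 874 261 triples covering 13 entity types and 107 relation types. Compared with KGs built only from drug-target interactions or only from triples related to a single disease, DRKG contains a wide variety of biological information, which makes the AD repurposing results more comprehensive. Wang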
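However, training and selecting models is challenging when embedding a KG such as DRKG that contains many entity types and relation types. By using several KGE models and comparing them with several evaluation methods, we found that, compared with Nian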
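Our results show that drug repurposing based on large KGs with many entity and relation types, such as DRKG, has considerable application potential and can provide useful reference information for drug developers. However, DRKG does not map all diseases to a unified ID space, which may affect the performance of drug repurposing. In future work, we will study entity alignment techniques to map entities from multiple data sources into a unified namespace, so that KGE models can learn better embeddings.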

References
Nian Y, Hu XY, Zhang R, et al. Mining on Alzheimer's diseases related knowledge graph to identity potential AD-related semantic triples for drug repurposing[J]. BMC Bioinformatics, 2022, 23(Suppl 6): 407.
Moya-Alvarado G, Gershoni-Emek N, Perlson E, et al. Neurodegeneration and Alzheimer's disease (AD). What can proteomics tell us about the Alzheimer's brain?[J]. Mol Cell Proteomics, 2016, 15(2): 409-425.
Ren RJ, Yin P, Wang ZH, et al. China Alzheimer's disease report 2021[J]. J Diagn Concept Pract (诊断学理论与实践), 2021, 20(4): 317-337.
Jia JP, Wei CB, Chen SQ, et al. The cost of Alzheimer's disease in China and re-estimation of costs worldwide[J]. Alzheimers Dement, 2018, 14(4): 483-491.
Avorn J. The $2.6 billion pill: methodologic and policy considerations[J]. N Engl J Med, 2015, 372(20): 1877-1879.
Zhang YS, Yang ZJ, Bao XF, et al. Progress of clinical research on drug repurposing for Alzheimer's disease[J]. Chin J Med Chem (中国药物化学杂志), 2022, 32(5): 372-389.
Wang CC, Li W, Shi ZX. Research progress on new use of old drugs[J]. World Clin Drugs (世界临床药物), 2021, 42(8): 699-704.
Zhang W, Gu F, Fu YK, et al. Progress in research on drug repositioning in new drug research and development[J]. Anim Husb Vet Med (畜牧与兽医), 2021, 53(12): 123-127.
Wang SD, Du ZZ, Ding M, et al. KG-DTI: a knowledge graph based deep learning method for drug-target interaction predictions and Alzheimer's disease drug repositions[J]. Appl Intell, 2022, 52(1): 846-857.
Ioannidis VN. DRKG - drug repurposing knowledge graph for Covid-19[EB/OL]. (2021-07-12)[2023-03-31]. https://github.com/gnn4dr/DRKG/.
Bordes A, Usunier N, Garcia-Durán A, et al. Translating embeddings for modeling multi-relational data[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. New York: ACM, 2013: 2787-2795.
Yang BS, Yih WT, He XD, et al. Embedding entities and relations for learning and inference in knowledge bases[J]. arXiv, 2015: 1412.6575.
Trouillon T, Welbl J, Riedel S, et al. Complex embeddings for simple link prediction[J]. arXiv, 2016: 1606.06357.
Sun ZQ, Deng ZH, Nie JY, et al. RotatE: knowledge graph embedding by relational rotation in complex space[J]. arXiv, 2019: 1902.10197.
Zeng XX, Song X, Ma TF, et al. Repurpose open data to discover therapeutics for COVID-19 using deep learning[J]. J Proteome Res, 2020, 19(11): 4624-4636.
Zhang R, Hristovski D, Schutte D, et al. Drug repurposing for COVID-19 via knowledge graph completion[J]. J Biomed Inform, 2021, 115: 103696.
Li ZX. Repositioning drugs for Parkinson's disease based on knowledge graph[J]. Inf Technol Informatization (信息技术与信息化), 2022(7): 28-32.
Han X, Cao SL, Lv X, et al. OpenKE: an open toolkit for knowledge embedding[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Brussels, Belgium: Association for Computational Linguistics, 2018: 139-144.
Maaten LVD, Hinton G. Visualizing data using t-SNE[J]. J Machine Learn Res, 2008, 9(86): 2579-2605.
Zheng D, Song X, Ma C, et al. DGL-KE: training knowledge graph embeddings at scale[J]. arXiv, 2020: 2004.08532.
U.S. Food and Drug Administration. FDA drug safety communication: FDA recommends against the continued use of Meridia (sibutramine)[EB/OL]. (2018-02-06)[2023-04-03]. https://www.fda.gov/drugs/drug-safety-and-availability/fda-drug-safety-communication-fda-recommends-against-continued-use-meridia-sibutramine.
Śliwińska S, Jeziorek M. The role of nutrition in Alzheimer's disease[J]. Rocz Panstw Zakl Hig, 2021, 72(1): 29-39.
Koppel J, Jimenez H, Adrien L, et al. Haloperidol inactivates AMPK and reduces tau phosphorylation in a tau mouse model of Alzheimer's disease[J]. Alzheimers Dement, 2016, 2(2): 121-130.
Pasierski M, Szulczyk B. Beneficial effects of capsaicin in disorders of the central nervous system[J]. Molecules, 2022, 27(8): 2484.
Zu GX, Sun KY, Li L, et al. Mechanism of quercetin therapeutic targets for Alzheimer disease and type 2 diabetes mellitus[J]. Sci Rep, 2021, 11(1): 22959.
Sahab-Negah S, Hajali V, Moradi HR, et al. The impact of estradiol on neurogenesis and cognitive functions in Alzheimer's disease[J]. Cell Mol Neurobiol, 2020, 40(3): 283-299.
Huang CW, Rust NC, Wu HF, et al. Altered O-GlcNAcylation and mitochondrial dysfunction, a molecular link between brain glucose dysregulation and sporadic Alzheimer's disease[J]. Neural Regen Res, 2023, 18(4): 779-783.
Reinhardt S, Stoye N, Luderer M, et al. Identification of disulfiram as a secretase-modulating compound with beneficial effects on Alzheimer's disease hallmarks[J]. Sci Rep, 2018, 8(1): 1329.
Trinh PNH, Baltos JA, Hellyer SD, et al. Adenosine receptor signalling in Alzheimer's disease[J]. Purinergic Signal, 2022, 18(3): 359-381.
Ai PH, Chen S, Liu XD, et al. Paroxetine ameliorates prodromal emotional dysfunction and late-onset memory deficit in Alzheimer's disease mice[J]. Transl Neurodegener, 2020, 9(1): 18.
Lehrer S, Rheinstein PH. Transspinal delivery of drugs by transdermal patch back-of-neck for Alzheimer's disease: a new route of administration[J]. Discov Med, 2019, 27(146): 37-43.
Baraka A, ElGhotny S. Study of the effect of inhibiting galanin in Alzheimer's disease induced in rats[J]. Eur J Pharmacol, 2010, 641(2/3): 123-127.
Chadwick W, Mitchell N, Caroll J, et al. Amitriptyline-mediated cognitive enhancement in aged 3 × Tg Alzheimer's disease mice is associated with neurogenesis and neurotrophic activity[J]. PLoS One, 2011, 6(6): e21660.
Feng JM. Clinical effect analysis of quetiapine combined with haloperidol in the treatment of schizophrenia in acute stage[J]. Med Forum (基层医学论坛), 2022, 26(20): 37-39.
Jiang YL. Research progress of Paroxetine combined with other therapies in the treatment of Major Depressive Disorder (MDD)[J]. Chin J Conval Med (中国疗养医学), 2021, 30(9): 919-923.
Kim L. A brief review of the pharmacology of amitriptyline and clinical outcomes in treating fibromyalgia[J]. Biomedicines, 2017, 5(2): 24.
Liu SC, Fu Q, Peng QH, et al. Research progress on the role and mechanism of CART peptide in central nervous system[J]. J Nanchang Univ Med Sci (南昌大学学报 医学版), 2022, 62(5): 76-80.
Gu GJ, Wu D, Lund H, et al. Elevated MARK2-dependent phosphorylation of tau in Alzheimer's disease[J]. J Alzheimers Dis, 2013, 33(3): 699-713.