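Applications and challenges of artificial intelligence in the development of anticancer peptides

ZHANG Zhixing, DENG Hua, TANG Yun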

Citation: ZHANG Zhixing, DENG Hua, TANG Yun. Applications and challenges of artificial intelligence in the development of anticancer peptides[J]. J China Pharm Univ, 2024, 55(3): 347 − 356. DOI: 10.11665/j.issn.1000-5048.2024040201

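Funding: National Natural Science Foundation of China (No. U23A20530)
Details
    About the author:

    TANG Yun, PhD, is a professor and doctoral supervisor at the School of Pharmacy, East China University of Science and Technology. Tang received a PhD from the Shanghai Institute of Materia Medica, Chinese Academy of Sciences, in 1996, then studied and worked in Sweden, the United States, Canada, and elsewhere for 8 years. Tang returned to China in 2004 as a professor at Fudan University and in the same year took part in founding the School of Pharmacy at East China University of Science and Technology, serving as vice dean for 10 years and currently heading the pharmacy discipline. Tang was selected for the first cohort of the Shanghai "Pujiang Talent Program" in 2005 and for the Ministry of Education's "New Century Excellent Talents Support Program" in 2008. With more than 30 years of experience in computer-aided drug design, Tang has undertaken more than 20 research projects, including projects funded by the National Natural Science Foundation of China, published more than 280 SCI-indexed papers, obtained 14 computer software copyrights and 6 granted patents, served as chief editor or contributor for more than 10 textbooks, monographs, and translated works, and trained more than 100 master's and doctoral graduates for the pharmaceutical industry. Honors include the Shanghai Yucai Award, the Baosteel Outstanding Teacher Award, a first prize in the Shanghai Teaching Achievement Awards, and the WuXi AppTec Life Science and Chemistry Award. The textbook Drug Design (《药物设计学》), edited by Tang, won a first prize for outstanding textbooks in the 2022 China Petroleum and Chemical Industry Outstanding Book Awards.

    Corresponding author:

    TANG Yun: Tel: 021-64251052 E-mail: ytang234@ecust.edu.cn

  • CLC number: TP181; R914.2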

  • Abstract:

    Anticancer peptides (ACPs) have become a research focus because of their high efficacy, low toxicity, and high selectivity. Compared with traditional experimental approaches, artificial intelligence (AI)-based methods for ACP identification and design cost less, achieve higher success rates, and can explore a broader sequence space. This article reviews the application of AI in ACP generation and identification, covering the design of novel ACPs with deep generative models and ACP identification methods based on machine learning and deep learning. It also discusses challenges in current research, such as insufficient model reproducibility and interpretability and the lack of experimentally validated negative data, and proposes future research directions to offer new ideas for ACP development.

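  • Figure 1   Relationship between the structure of anticancer peptides (ACPs) and their properties and functions

    Figure 2   Construction of deep generative models for ACPs

    LSTM: long short-term memory network; VAE: variational autoencoder; GAN: generative adversarial network

    Figure 3   Construction of ACP identification models

    ML: machine learning; RF: random forest; SVM: support vector machine; LR: logistic regression; KNN: k-nearest neighbors; DL: deep learning; GNN: graph neural network; CNN: convolutional neural network; BERT: bidirectional encoder representations from transformers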

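    Table 1   Databases from which ACPs can be obtained

    | Bioactive peptide type | Database | Description | Year developed | Last updated | Unique entries/total entries^a | Number of ACPs |
    |---|---|---|---|---|---|---|
    | ACPs | CancerPPD[11] | Database of experimentally validated ACPs and proteins | 2015 | Not available | 3612 | 3491 |
    | ACPs | ApInAPDB[12] | Database of apoptosis-inducing ACPs | 2022 | Not available | 818 | 818 |
    | ACPs | TumorHoPe[19] | Comprehensive database of tumor-homing peptides | 2012 | Not available | 704/744 | 744 |
    | AMPs and ACPs | APD3[14] | AMP database (includes ACPs) | 2016 | January 2024 | 3940 | 290 |
    | AMPs and ACPs | DADP[15] | Defense peptide database comprising AMPs and ACPs | 2012 | Not available | 1923/2571 | 108 |
    | AMPs and ACPs | DBAASP v.3[16] | AMP database (includes ACPs) | 2020 | Not available | 21509 | 3599 |
    | AMPs and ACPs | DRAMP 3.0[17] | AMP database (includes ACPs) | 2022 | November 2023 | 22528 | 163 |
    | AMPs and ACPs | LAMP2[18] | Database of AMPs and ACPs | 2013 | December 2016 | 23253 | Not counted |
    | AMPs and ACPs | dbAMP 2.0[20] | Comprehensive database for exploring the functional activities and physicochemical properties of AMPs, with transcriptome and proteome data | 2019 | November 2021 | 28709 | 2290 |
    | AMPs and ACPs | CAMPR3[21] | AMP database (includes ACPs) | 2015 | Not available | 10247 | Not counted |
    | AMPs and ACPs | YADAMP[22] | AMP database (includes ACPs) | 2012 | March 2013 | 2525 | Not counted |
    | Unrestricted | SATPdb[23] | Annotated peptide database built from 20 peptide databases and 2 datasets; covers more than 10 peptide categories, including ACPs, antiparasitic peptides, cell-penetrating peptides, and toxic peptides | 2015 | Not available | 19192 | 1099 |
    | Unrestricted | StraPep[24] | Structure database of known bioactive peptides | 2018 | Not available | 1312/3791 | Not counted |
    | Unrestricted | PlantPepDB[25] | Plant peptide database | 2020 | Not available | 3848 | Not counted |
    | Unrestricted | THPdb[26] | Database of FDA-approved therapeutic peptides and proteins | 2017 | Not available | 852 | Not counted |
    | Unrestricted | BioPepDB[27] | Database of food-derived bioactive peptides | 2018 | January 2018 | 4807 | 635 |
    | Unrestricted | MBPDB[28] | Database of bioactive peptides derived from milk proteins | 2017 | January 2024 | 691 | 18 |

    a: These counts cover all bioactive peptide entries in each database, not only ACPs. The databases were accessed and the statistics collected on March 20, 2024. AMPs: antimicrobial peptides.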

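    Table 2   Existing ACP benchmark datasets

    | Dataset | Sequence identity^a (%) | Number of ACPs | Number of non-ACPs | Total |
    |---|---|---|---|---|
    | TY_MD[29] | 100 | 225 | 2250 | 2475 |
    | TY_AD[29] | 100 | 225 | 1372 | 1597 |
    | TY_BD[29] | 100 | 225 | 225 | 450 |
    | TY_IND[29] | 100 | 50 | 50 | 100 |
    | ZOH[30] | 90 | 138 | 206 | 344 |
    | SA_TRAIN[31] | 100 | 217 | 3979 | 4196 |
    | SA_IND[31] | 100 | 40 | 40 | 80 |
    | SA_RAND[31] | 100 | - | 2000 | 2000 |
    | Chen_S1[32] | 100 | 138 | 206 | 344 |
    | Chen_S2[32] | 100 | 150 | 150 | 300 |
    | H-C[5] | 100 | 126 | 205 | 331 |
    | LEE[5] | 100 | 422 | 422 | 844 |

    a: When building each dataset, the researchers removed redundant peptides using a pairwise sequence identity threshold of 90% or 100%.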
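    The redundancy removal mentioned in footnote a of Table 2 can be illustrated with a minimal Python sketch. It is not the procedure used by the dataset authors: alignment-based tools such as CD-HIT [33] are commonly used for this step, whereas the sketch below uses a crude ungapped, position-by-position identity as a stand-in. The function names `ungapped_identity` and `remove_redundant`, the greedy longest-first order, and the example sequences are all assumptions made for illustration.

```python
from typing import List


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions, relative to the longer sequence.

    Assumption: no alignment is performed; real tools align sequences first.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def remove_redundant(seqs: List[str], threshold: float = 0.9) -> List[str]:
    """Greedy filter: drop exact duplicates, then keep a sequence only if its
    identity to every already-kept sequence does not exceed `threshold`."""
    kept: List[str] = []
    # Longest sequences first; ties broken alphabetically for determinism.
    for s in sorted(set(seqs), key=lambda s: (-len(s), s)):
        if all(ungapped_identity(s, k) <= threshold for k in kept):
            kept.append(s)
    return kept


# Hypothetical peptide strings, used only to exercise the filter.
peptides = [
    "KLAKLAKKLAKLAK",
    "KLAKLAKKLAKLAK",  # exact duplicate
    "KLAKLAKKLAKLAR",  # differs from the first at one position (13/14 identical)
    "GLFDIIKKIAESF",
]

if __name__ == "__main__":
    print(remove_redundant(peptides, threshold=0.9))
    print(remove_redundant(peptides, threshold=1.0))
```

    In this sketch a threshold of 1.0 removes only exact duplicates, while 0.9 also removes the near-duplicate that differs at a single position. That reading of the 100% and 90% settings is an interpretation of the footnote, not a description of the original pipelines.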
  • [1]

    Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries[J]. CA Cancer J Clin, 2021, 71(3): 209-249. doi: 10.3322/caac.21660

    [2]

    International Agency for Research on Cancer. Cancer Today[EB/OL]//gco.iarc.who.int. (2024-02-01)[2024-03-05]. https://gco.iarc.who.int/today/en/dataviz/tables?mode=population&types=1.

    [3]

    Han BF, Zheng RS, Zeng HM, et al. Cancer incidence and mortality in China, 2022[J]. J Natl Cancer Cent, 2024, 4(1): 47-53. doi: 10.1016/j.jncc.2024.01.006

    [4]

    Norouzi P, Mirmohammadi M, Houshdar Tehrani MH. Anticancer peptides mechanisms, simple and complex[J]. Chem Biol Interact, 2022, 368: 110194. doi: 10.1016/j.cbi.2022.110194

    [5]

    Manavalan B, Basith S, Shin TH, et al. MLACP: machine-learning-based prediction of anticancer peptides[J]. Oncotarget, 2017, 8(44): 77121-77136. doi: 10.18632/oncotarget.20365

    [6]

    Huang YB, Wang XF, Wang HY, et al. Studies on mechanism of action of anticancer peptides by modulation of hydrophobicity within a defined structural framework[J]. Mol Cancer Ther, 2011, 10(3): 416-426. doi: 10.1158/1535-7163.MCT-10-0811

    [7]

    Glukhov E, Burrows LL, Deber CM. Membrane interactions of designed cationic antimicrobial peptides: the two thresholds[J]. Biopolymers, 2008, 89(5): 360-371. doi: 10.1002/bip.20917

    [8]

    Xie MF, Liu DJ, Yang YF. Anti-cancer peptides: classification, mechanism of action, reconstruction and modification[J]. Open Biol, 2020, 10(7): 200004. doi: 10.1098/rsob.200004

    [9]

    Chiangjong W, Chutipongtanate S, Hongeng S. Anticancer peptide: Physicochemical property, functional aspect and trend in clinical application (Review)[J]. Int J Oncol, 2020, 57(3): 678-696. doi: 10.3892/ijo.2020.5099

    [10]

    Muttenthaler M, King GF, Adams DJ, et al. Trends in peptide drug discovery[J]. Nat Rev Drug Discov, 2021, 20(4): 309-325. doi: 10.1038/s41573-020-00135-8

    [11]

    Tyagi A, Tuknait A, Anand P, et al. CancerPPD: a database of anticancer peptides and proteins[J]. Nucleic Acids Res, 2015, 43(Database issue): D837-D843

    [12]

    Faraji N, Arab SS, Doustmohammadi A, et al. ApInAPDB: a database of apoptosis-inducing anticancer peptides[J]. Sci Rep, 2022, 12(1): 21341. doi: 10.1038/s41598-022-25530-6

    [13]

    Wang Z, Wang GS. APD: the antimicrobial peptide database[J]. Nucleic Acids Res, 2004, 32(Database issue): D590-D592

    [14]

    Wang GS, Li X, Wang Z. APD3: the antimicrobial peptide database as a tool for research and education[J]. Nucleic Acids Res, 2016, 44(D1): D1087-D1093. doi: 10.1093/nar/gkv1278

    [15]

    Novković M, Simunić J, Bojović V, et al. DADP: the database of anuran defense peptides[J]. Bioinformatics, 2012, 28(10): 1406-1407. doi: 10.1093/bioinformatics/bts141

    [16]

    Pirtskhalava M, Amstrong AA, Grigolava M, et al. DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics[J]. Nucleic Acids Res, 2021, 49(D1): D288-D297. doi: 10.1093/nar/gkaa991

    [17]

    Shi GB, Kang XY, Dong FY, et al. DRAMP 3.0: an enhanced comprehensive data repository of antimicrobial peptides[J]. Nucleic Acids Res, 2022, 50(D1): D488-D496. doi: 10.1093/nar/gkab651

    [18]

    Zhao XW, Wu HY, Lu HR, et al. LAMP: a database linking antimicrobial peptides[J]. PLoS One, 2013, 8(6): e66557. doi: 10.1371/journal.pone.0066557

    [19]

    Kapoor P, Singh H, Gautam A, et al. TumorHoPe: a database of tumor homing peptides[J]. PLoS One, 2012, 7(4): e35187. doi: 10.1371/journal.pone.0035187

    [20]

    Jhong JH, Yao LT, Pang YX, et al. dbAMP 2.0: updated resource for antimicrobial peptides with an enhanced scanning method for genomic and proteomic data[J]. Nucleic Acids Res, 2022, 50(D1): D460-D470. doi: 10.1093/nar/gkab1080

    [21]

    Waghu FH, Barai RS, Gurung P, et al. CAMPR3: a database on sequences, structures and signatures of antimicrobial peptides[J]. Nucleic Acids Res, 2016, 44(D1): D1094-D1097. doi: 10.1093/nar/gkv1051

    [22]

    Piotto SP, Sessa L, Concilio S, et al. YADAMP: yet another database of antimicrobial peptides[J]. Int J Antimicrob Agents, 2012, 39(4): 346-351. doi: 10.1016/j.ijantimicag.2011.12.003

    [23]

    Singh S, Chaudhary K, Dhanda SK, et al. SATPdb: a database of structurally annotated therapeutic peptides[J]. Nucleic Acids Res, 2016, 44(D1): D1119-D1126. doi: 10.1093/nar/gkv1114

    [24]

    Wang J, Yin TL, Xiao XW, et al. StraPep: a structure database of bioactive peptides[J]. Database, 2018, 2018: bay038.

    [25]

    Das D, Jaiswal M, Khan FN, et al. PlantPepDB: a manually curated plant peptide database[J]. Sci Rep, 2020, 10(1): 2194. doi: 10.1038/s41598-020-59165-2

    [26]

    Usmani SS, Bedi G, Samuel JS, et al. THPdb: database of FDA-approved peptide and protein therapeutics[J]. PLoS One, 2017, 12(7): e0181748. doi: 10.1371/journal.pone.0181748

    [27]

    Li QL, Zhang C, Chen HJ, et al. BioPepDB: an integrated data platform for food-derived bioactive peptides[J]. Int J Food Sci Nutr, 2018, 69(8): 963-968. doi: 10.1080/09637486.2018.1446916

    [28]

    Nielsen SD, Beverly RL, Qu YY, et al. Milk bioactive peptide database: a comprehensive database of milk protein-derived bioactive peptides and novel visualization[J]. Food Chem, 2017, 232: 673-682. doi: 10.1016/j.foodchem.2017.04.056

    [29]

    Tyagi A, Kapoor P, Kumar R, et al. In silico models for designing and discovering novel anticancer peptides[J]. Sci Rep, 2013, 3: 2984. doi: 10.1038/srep02984

    [30]

    Hajisharifi Z, Piryaiee M, Mohammad Beigi M, et al. Predicting anticancer peptides with Chou’s pseudo amino acid composition and investigating their mutagenicity via Ames test[J]. J Theor Biol, 2014, 341: 34-40. doi: 10.1016/j.jtbi.2013.08.037

    [31]

    Vijayakumar S, Ptv L. ACPP: a web server for prediction and design of anti-cancer peptides[J]. Int J Pept Res Ther, 2015, 21(1): 99-106. doi: 10.1007/s10989-014-9435-7

    [32]

    Chen W, Ding H, Feng PM, et al. iACP: a sequence-based tool for identifying anticancer peptides[J]. Oncotarget, 2016, 7(13): 16895-16909. doi: 10.18632/oncotarget.7815

    [33]

    Li W, Jaroszewski L, Godzik A. Clustering of highly homologous sequences to reduce the size of large protein databases[J]. Bioinformatics, 2001, 17(3): 282-283. doi: 10.1093/bioinformatics/17.3.282

    [34]

    Bond-Taylor S, Leach A, Long Y, et al. Deep generative modelling: a comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models[EB/OL]. arXiv, 2021: 2103.04922. http://arxiv.org/abs/2103.04922.

    [35]

    Müller AT, Hiss JA, Schneider G. Recurrent neural network model for constructive peptide design[J]. J Chem Inf Model, 2018, 58(2): 472-479. doi: 10.1021/acs.jcim.7b00414

    [36]

    Wan FP, Kontogiorgos-Heintz D, de la Fuente-Nunez C. Deep generative models for peptide design[J]. Digit Discov, 2022, 1(3): 195-208. doi: 10.1039/D1DD00024A

    [37]

    Grisoni F, Neuhaus CS, Gabernet G, et al. Designing anticancer peptides by constructive machine learning[J]. ChemMedChem, 2018, 13(13): 1300-1302. doi: 10.1002/cmdc.201800204

    [38]

    Rossetto A, Zhou WJ. GANDALF: peptide generation for drug design using sequential and structural generative adversarial networks[C]//Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. Virtual Event USA. ACM, 2020. doi: 10.1145/3388440.3412487.

    [39]

    Madani A, McCann B, Naik N, et al. ProGen: language modeling for protein generation[EB/OL]. arXiv, 2020: 2004.03497. http://arxiv.org/abs/2004.03497.

    [40]

    Nijkamp E, Ruffolo JA, Weinstein EN, et al. ProGen2: exploring the boundaries of protein language models[J]. Cell Syst, 2023, 14(11): 968-978.e3.

    [41]

    Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised language model for protein design[J]. Nat Commun, 2022, 13(1): 4348. doi: 10.1038/s41467-022-32007-7

    [42]

    Chen B, Cheng XY, Li P, et al. xTrimoPGLM: unified 100B-scale pre-trained transformer for deciphering the language of protein[EB/OL]. arXiv, 2024: 2401.06199. http://arxiv.org/abs/2401.06199.

    [43]

    Basith S, Manavalan B, Hwan Shin T, et al. Machine intelligence in peptide therapeutics: a next-generation tool for rapid disease screening[J]. Med Res Rev, 2020, 40(4): 1276-1314. doi: 10.1002/med.21658

    [44]

    Liang X, Li FY, Chen JX, et al. Large-scale comparative review and assessment of computational methods for anti-cancer peptide identification[J]. Brief Bioinform, 2021, 22(4): bbaa312. doi: 10.1093/bib/bbaa312

    [45]

    Hwang JS, Kim SG, Shin TH, et al. Development of anticancer peptides using artificial intelligence and combinational therapy for cancer therapeutics[J]. Pharmaceutics, 2022, 14(5): 997. doi: 10.3390/pharmaceutics14050997

    [46]

    Agrawal P, Bhagat D, Mahalwal M, et al. AntiCP 2.0: an updated model for predicting anticancer peptides[J]. Brief Bioinform, 2021, 22(3): bbaa153. doi: 10.1093/bib/bbaa153

    [47]

    Rao B, Zhou C, Zhang GY, et al. ACPred-Fuse: fusing multi-view information improves the prediction of anticancer peptides[J]. Brief Bioinform, 2020, 21(5): 1846-1855. doi: 10.1093/bib/bbz088

    [48]

    Boopathi V, Subramaniyam S, Malik A, et al. mACPpred: a support vector machine-based meta-predictor for identification of anticancer peptides[J]. Int J Mol Sci, 2019, 20(8): 1964. doi: 10.3390/ijms20081964

    [49]

    Wei LY, Zhou C, Chen HR, et al. ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides[J]. Bioinformatics, 2018, 34(23): 4007-4016. doi: 10.1093/bioinformatics/bty451

    [50]

    Yi HC, You ZH, Zhou X, et al. ACP-DL: a deep learning long short-term memory model to predict anticancer peptides using high-efficiency feature representation[J]. Mol Ther Nucleic Acids, 2019, 17: 1-9.

    [51]

    He WJ, Wang Y, Cui LZ, et al. Learning embedding features based on multisense-scaled attention architecture to improve the predictive performance of anticancer peptides[J]. Bioinformatics, 2021, 37(24): 4684-4693. doi: 10.1093/bioinformatics/btab560

    [52]

    Wang HQ, Zhao J, Zhao H, et al. CL-ACP: a parallel combination of CNN and LSTM anticancer peptide recognition model[J]. BMC Bioinformatics, 2021, 22(1): 512. doi: 10.1186/s12859-021-04433-9

    [53]

    Guo YC, Yan K, Lv HW, et al. PreTP-EL: prediction of therapeutic peptides based on ensemble learning[J]. Brief Bioinform, 2021, 22(6): bbab358. doi: 10.1093/bib/bbab358

    [54]

    Yan K, Lv HW, Wen J, et al. PreTP-stack: prediction of therapeutic peptides based on the stacked ensemble learing[J]. IEEE/ACM Trans Comput Biol Bioinform, 2023, 20(2): 1337-1344. doi: 10.1109/TCBB.2022.3183018

    [55]

    Deng H, Ding M, Wang YM, et al. ACP-MLC: a two-level prediction engine for identification of anticancer peptides and multi-label classification of their functional types[J]. Comput Biol Med, 2023, 158: 106844. doi: 10.1016/j.compbiomed.2023.106844

    [56]

    Zhong GL, Deng L. ACPScanner: prediction of anticancer peptides by integrated machine learning methodologies[J]. J Chem Inf Model, 2024, 64(3): 1092-1104. doi: 10.1021/acs.jcim.3c01860

Publication history
  • Received: 2024-04-01
  • Published online: 2024-06-24
  • Issue date: 2024-06-24
