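Abstract
Antimicrobial peptides (AMPs) are small peptides with broad-spectrum antimicrobial activity. Their distinctive mechanisms of action make them effective against infectious diseases and unlikely to induce resistance. Conventional experimental approaches can identify active AMPs, but the screening process is laborious. Artificial intelligence (AI) screening methods are faster and more convenient, and they have shown great potential for discovering new natural AMPs. This review summarizes and compares strategies for AI-based AMP screening, covering the data sources used for model training, the AI models themselves, and the omics data to which the models are applied to screen for new AMPs. It also discusses the prospects and advantages of these applications, with the aim of offering new ideas for the identification, development, and modification of AMPs.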
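Long-term use of antibiotics has increased bacterial resistance and led to the emergence of multidrug-resistant strains, so new therapeutic agents with the efficacy of conventional antibiotics are urgently needed. Antimicrobial peptides (AMPs) are small polypeptides (fewer than 100 amino acid residues) found in all living organisms. They are a cornerstone of the innate immune system that protects the host from infection, and they serve multiple functions in biological systems.
The conventional way to screen for AMPs is to extract and identify them from the body or secretions of a given species using a series of experimental techniques, test the candidate peptides for antibacterial activity in vitro, and finally obtain highly active AMPs. These techniques include extraction, separation, purification, mass-spectrometric characterization, and other traditional experimental methods.
Against the backdrop of big data and steadily advancing computing technology, AI-based AMP screening shows distinct advantages over conventional methods. For example, AI methods can analyze antibacterial activity, stability against proteases, cytotoxicity, and hemolytic activity, and can predict multidimensional parameters such as net charge, peptide length, amino acid composition, hydrophobicity, and structural propensity.
AMPs are diverse and easy to synthesize. Among them, cationic antimicrobial peptides (CAMPs) are particularly potent in antibacterial responses. CAMP sequences are diverse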

Figure 1. Antimicrobial mechanisms of antimicrobial peptides (AMPs)
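In the membrane-targeting permeabilization mechanism, CAMPs are electrostatically attracted to and adsorb onto the negatively charged bacterial cell membrane, disrupting membrane integrity so that the cell contents leak out and the bacterium dies. This mechanism has three main modes of action: the toroidal-pore model, the carpet model, and the barrel-stave model.
In non-membrane-targeting mechanisms, AMPs enter the bacterial cell and act on various intracellular targets to exert their antibacterial effect. Examples include inhibiting DNA and RNA synthesis, reducing the activity of enzymes needed to cross-link cell wall structural proteins, inhibiting ribosome function and protein synthesis, blocking chaperone-assisted protein folding, targeting mitochondria to inhibit cellular respiration and induce the formation of reactive oxygen species, and disrupting the integrity of the mitochondrial membrane.
Although AMPs are regarded as highly promising antibacterial drug candidates, they also have limitations, such as low antibacterial activity, susceptibility to degradation in vivo, cytotoxicity, and poor specificity. These factors limit their clinical application and development.
A survey of the commonly used AMP databases shows that about 30,000 AMPs are currently recorded, of which naturally occurring AMPs account for only one tenth. According to the APD3 database (Antimicrobial Peptide Database
With the development and application of computing technology, AI methods can rapidly screen and identify potentially bioactive peptides from millions of candidate AMPs, improving the efficiency and accuracy of AMP screening.
Functional properties of AMPs, such as antibacterial activity, stability against proteases, cytotoxicity, and hemolytic activity, are influenced by multiple physicochemical parameters, including net charge, peptide length, amino acid composition, hydrophobicity, and structural propensity.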
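To make these parameters concrete, the following minimal sketch computes a few sequence-level descriptors for a single peptide. It rests on several simplifying assumptions that do not come from the works reviewed here: net charge is approximated by counting Lys and Arg as +1 and Asp and Glu as -1 (histidine and the termini are ignored), hydrophobicity is the mean Kyte-Doolittle hydropathy, and the function name `describe` and the example sequence are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (one common choice; other scales exist).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def describe(seq):
    """Return simple physicochemical descriptors for a peptide sequence."""
    seq = seq.upper()
    if not seq or any(aa not in KD for aa in seq):
        raise ValueError("sequence must contain only the 20 standard residues")
    counts = Counter(seq)
    n = len(seq)
    return {
        "length": n,
        # Simplified net charge: (K + R) - (D + E); an assumption, see lead-in.
        "net_charge": counts["K"] + counts["R"] - counts["D"] - counts["E"],
        "mean_hydropathy": sum(KD[aa] for aa in seq) / n,
        "composition": {aa: counts[aa] / n for aa in sorted(counts)},
    }


# Hypothetical toy peptide, used only for illustration.
features = describe("KKLLKKLL")
print(features["length"], features["net_charge"])
```

Descriptors of this kind are typically what a classifier receives as input; real tools use richer feature sets than this sketch.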
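Multi-omics technologies such as genomics, transcriptomics, and proteomics provide vast data sources for AI-based discovery of new AMPs, and these data can be used to screen for new AMPs rapidly and accurately (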

Figure 2. Applications of artificial intelligence in antimicrobial peptide screening
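The accuracy with which an AI model screens candidate AMPs depends on the quality of its dataset. AMP databases differ in what they emphasize and in the data they store, so evaluating the provenance and quality of each database is critical for training AI models and for screening.
The 18 commonly used AMP databases include general-purpose AMP databases (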
Database | Abbreviation | Data source | Entries | Reference |
---|---|---|---|---|
Database of Antimicrobial Peptides | dbAMP2 | AMP databases, literature | 28,709 | Jhong et al., 2022 |
Linking Antimicrobial Peptides | LAMP2 | Literature, AMP databases | 23,253 | Ye et al., 2020 |
Data Repository of Antimicrobial Peptides | DRAMP3.0 | Patented AMPs | 22,259 | Shi et al., 2022 |
A database of Structurally Annotated Therapeutic Peptides | SATPdb | 20 public-domain AMP databases, two datasets | 19,192 | Singh et al., 2016 |
Database of Antimicrobial Activity and Structure of Peptides | DBAASPv3 | Synthetic AMPs | 18,878 | Pirtskhalava et al., 2021 |
Collection of Anti-Microbial Peptides | CAMPR3 | Patented and predicted AMPs | 10,247 | Waghu et al., 2016 |
Antimicrobial Peptide Database | APD3 | Natural AMPs | 3,324 | Wang et al., 2016 |
Dragon Antimicrobial Peptide Database | DAMPD | UniProt database | 1,232 | Seshadri Sundararajan et al., 2012 |
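Database | Abbreviation | Data source | Data type | Entries | Reference |
---|---|---|---|---|---|
Database of Anticancer Peptides and Proteins | CancerPPD | Articles, patents, AMP databases | Anticancer peptides | 3,491 | Tyagi et al., 2015 |
Database of Experimentally Determined Hemolytic and Non-hemolytic Peptides | Hemolytik | Literature, AMP databases | Hemolytic and non-hemolytic peptides | 2,970 | Gautam et al., 2014 |
A Database of Antiviral Peptides | AVPdb | Experimentally validated | Antiviral peptides | 2,683 | Qureshi et al., 2014 |
Yet another Database of Antimicrobial Peptides | YADAMP | Literature | Antibacterial peptides | 2,525 | Piotto et al., 2012 |
Anti-tubercular Peptides Database | AntiTbPdb | Literature, patents | Anti-tubercular peptides | 1,010 | Usmani et al., 2018 |
Database of FDA-approved Peptide and Protein Therapeutics | THPdb | Publications, patents, DrugBank | Therapeutic peptides | 852 | Usmani et al., 2017 |
Invertebrate Antimicrobial Peptide Database | InverPep | Literature, AMP databases | Invertebrate AMPs | 702 | Gómez et al., 2017 |
Antimicrobial Plant Peptides | PhytAMP | Literature, UniProt | Plant AMPs | 273 | Hammami et al., 2009 |
Bacteriocins Database | BACTIBASE | Literature, UniProt | Bacteriocins | 229 | Hammami et al., 2007 |
Biofilm-active AMPs Database | BaAMPs | Literature | Biofilm-active peptides | 221 | Di Luca et al., 2015 |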
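Because several of these databases draw on one another, a training set assembled from more than one of them will usually contain duplicates. The sketch below shows one plausible way to merge and clean such data before training. It assumes the sequences have already been exported as plain strings, and it uses the length limit of fewer than 100 residues given earlier in this review. The function name `merge_records` and the source labels `db_a` and `db_b` are hypothetical and do not correspond to any database's actual export format.

```python
import re

# Accept only sequences made of the 20 standard amino acids.
STANDARD = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")


def merge_records(sources, max_len=100):
    """Merge sequences from several sources, dropping invalid or long entries.

    `sources` maps a source name to an iterable of sequence strings.
    Returns a dict mapping each unique sequence to the sources that list it.
    """
    merged = {}
    for name, sequences in sources.items():
        for raw in sequences:
            seq = raw.strip().upper()
            if not STANDARD.fullmatch(seq) or len(seq) >= max_len:
                continue  # skip non-standard residues and over-long entries
            merged.setdefault(seq, []).append(name)
    return merged


# Hypothetical toy input, for illustration only.
toy = {
    "db_a": ["kkllkkll", "GIGKXL"],
    "db_b": ["KKLLKKLL ", "A" * 120],
}
print(merge_records(toy))
```

Keeping the list of contributing sources per sequence makes it easy to check afterwards how much the databases overlap.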
Many machine learning algorithms can identify candidate peptides with antimicrobial function, including random forest (RF
Model | Algorithm | Training data |
---|---|---|
IAMPE | NB, KNN, SVM, RF, XGBoost | CAMP, LAMP, ADAM, AntiBP (Server for antibacterial peptide prediction) |
AmPEP | RF | APD |
Target-AMP | KNN, RF, SVM | APD |
AntiBP2 | SVM | APD |
CS-AMPPred | RF, SVM | APD |
CAMPR3 | RF, SVM, DA | CAMP |
C-PAmP | RF, SVM | CAMP, PhytAMP |
NB: naive Bayes; KNN: k-nearest neighbors; SVM: support vector machine; RF: random forest; XGBoost: extreme gradient boosting; DA: discriminant analysis
Among the various machine learning algorithms, Kavousi

Figure 3. Classical machine learning algorithms applied to AMP discovery
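As an illustration of how a classical algorithm such as KNN can be applied to peptide sequences, the sketch below classifies a query peptide by majority vote among its nearest neighbours in amino acid composition space. It is not the implementation of any tool listed above: the feature set is deliberately minimal, and the training sequences and their labels are invented toy data rather than entries from a real database.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Represent a peptide as a 20-dimensional amino acid frequency vector."""
    counts = Counter(seq.upper())
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbours.

    `train` is a list of (sequence, label) pairs; distance is Euclidean.
    """
    q = composition(query)
    ranked = sorted((math.dist(composition(s), q), label) for s, label in train)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy training set (labels are illustrative, not experimental).
TRAIN = [
    ("KKLLKKLLKK", "AMP"),
    ("KRWWKRWW", "AMP"),
    ("GLLKKLLKKA", "AMP"),
    ("DDEEDDEEGG", "non-AMP"),
    ("SSGGSSGGEE", "non-AMP"),
    ("DEDEDESSTT", "non-AMP"),
]

print(knn_predict(TRAIN, "KKLLKRLLKK"))
```

In practice the same feature vectors could be passed to the other algorithms in the table, such as RF or SVM, usually through an established machine learning library.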
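Beyond classical machine learning, deep learning algorithms have also been applied to AMP screening, for example artificial neural networks (ANN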

Figure 4. Deep learning algorithms applied to AMP discovery
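Among studies that screen AMPs with deep learning algorithms, Xiao
The groups of Wang Jun and Chen Yihua at the Institute of Microbiology, Chinese Academy of Sciences, combined several natural language processing neural network models, including LSTM, Attention, and BERT, and successfully identified candidate AMPs from human gut microbiome data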
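Sequence models such as LSTM networks take numeric input of a fixed length, so peptides are usually converted to integer tokens and padded before training. The sketch below shows one common encoding scheme. The vocabulary order, the padding index 0, and the default maximum length of 50 are assumptions made for illustration and are not taken from the studies cited here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = 0  # index reserved for padding positions
INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(seq, max_len=50):
    """Map a peptide to integer tokens, truncated or zero-padded to max_len."""
    ids = [INDEX[aa] for aa in seq.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def one_hot(ids):
    """Expand integer tokens to one-hot rows; padding becomes an all-zero row."""
    return [[int(i == j) for j in range(1, len(AMINO_ACIDS) + 1)] for i in ids]


tokens = encode("ACK", max_len=5)
print(tokens)  # [1, 2, 9, 0, 0]
```

The integer form suits models with an embedding layer, while the one-hot form suits convolutional layers that expect a matrix per sequence.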
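In summary, classical machine learning models and deep learning algorithms are widely used to discover new AMPs, and AI has become an effective tool for screening them.
With the explosive growth of sequencing data, multi-omics technologies, including genomics, transcriptomics, and proteomics, have generated vast amounts of biological data. The peptide sequences they yield can serve as input for AI-based AMP screening. AI methods classify and identify peptides according to sequence features, physicochemical properties, and similar characteristics, thereby uncovering new and highly effective AMPs (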

Figure 5. Data sources for model-based AMP screening
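Genes determine the amino acid sequences of AMPs, and those sequences in turn influence physical parameters such as net charge, helical structure, and hydrophobicity. Genomic data are therefore an important source of potential AMPs. For example, the BGI Education Center of the University of Chinese Academy of Sciences sequenced the whole metagenome of the grass carp gastrointestinal microbiota, predicted AMPs by homology search, and identified five AMPs highly similar to previously reported bacterial toxins
Although conventional genomic analysis can identify AMPs, standard homology-based sequence alignment has low precision and low recall in functional classification. Machine learning algorithms, by contrast, can search genome-wide for genes encoding new AMPs. For example, Fingerhut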
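Before any classifier can be applied to genomic data, candidate peptide sequences have to be derived from the DNA. The sketch below illustrates one simple way to do this by translating all six reading frames and collecting short open reading frames. It is a simplification and does not reproduce the pipeline of any cited tool: it assumes the standard genetic code, requires a start methionine and a stop codon, and uses hypothetical default length limits of 10 to 99 residues.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def small_orfs(dna, min_len=10, max_len=100):
    """Collect peptides from short ORFs (M ... stop) in all six reading frames."""
    dna = dna.upper()
    peptides = set()
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            protein = "".join(
                CODON_TABLE.get(strand[i:i + 3], "X")  # X marks unknown codons
                for i in range(frame, len(strand) - 2, 3)
            )
            # Drop the last piece: it is not terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start == -1:
                    continue
                peptide = segment[start:]
                if min_len <= len(peptide) < max_len and "X" not in peptide:
                    peptides.add(peptide)
    return peptides


# Toy DNA fragment encoding M-K-K-K followed by a stop codon.
print(small_orfs("ATGAAAAAAAAATAA", min_len=4))
```

The resulting candidate peptides would then be scored by a trained model, with the length limits adjusted to the peptide class of interest.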
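Transcriptome sequencing can rapidly and comprehensively capture nearly all transcripts of a specific organ or tissue of a species in a given state, making it an effective way to screen for expressed AMP sequences. For example, Lee
Conventional transcriptomic analysis can identify new AMPs, but the candidate set is often too large for every sequence to be validated in vitro, which leads to a high false-positive rate. Machine learning algorithms can screen and identify large numbers of candidate AMPs in greater depth, based on the physicochemical parameters important for antibacterial function. For example, Shelenkov
Proteomics studies the characteristics of proteins at a global level and can identify functional AMP peptides at the protein level. For example, work published by the BGI Academy of Marine Sciences (Shenzhen Key Laboratory of Marine Genomics) drew on the APD database and combined genomics, transcriptomics, proteomics, comparative omics, and other omics technologies with 3D structure prediction of AMP peptides to identify, from cone snails, eight cone snail antimicrobial peptides with antifungal activity
In summary, the data generated by multi-omics technologies can be passed to machine learning models for further classification and identification. AI algorithms classify candidate AMPs by their physicochemical parameters, ultimately yielding AMPs with high antibacterial activity, high stability, high selectivity, and low cytotoxicity (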

Figure 6. Screening AMPs with artificial intelligence
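With broad-spectrum antibacterial activity and a low propensity to induce resistance, AMPs are promising candidates for a new class of clinical antibacterial agents that could help address the antibiotic resistance crisis. However, existing AMPs suffer from problems such as hemolytic toxicity to red blood cells and high synthesis costs, so new AMPs are urgently needed. Advances in AI make it possible to screen for highly effective AMPs precisely and efficiently. This review describes the application of AI methods to AMP screening, focusing on machine learning models and deep learning algorithms for discovering new AMPs, and compares the strengths, weaknesses, and characteristics of different prediction models, in the hope of providing insight and help to researchers in the field.
The relationship between the antimicrobial mechanisms of known AMPs and their structures has not yet been studied in sufficient depth, so computational predictions need to be combined with experiments: candidate peptides identified by AI models should be tested for activity in vitro, with the aim of ultimately obtaining highly effective AMPs. In addition, AI-based AMP prediction methods need further study, for example by developing more targeted algorithms to select, integrate, and process multi-omics data, which would open up more routes to efficient AMP screening.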

References
Wang GS. The antimicrobial peptide database provides a platform for decoding the design principles of naturally occurring antimicrobial peptides[J]. Protein Sci, 2020, 29(1): 8-18.
Chaparro E, da Silva Junior PI. Lacrain: the first antimicrobial peptide from the body extract of the Brazilian centipede Scolopendra viridicornis[J]. Int J Antimicrob Agents, 2016, 48(3): 277-285.
Jiao K, Gao J, Zhou T, et al. Isolation and purification of a novel antimicrobial peptide from Porphyra yezoensis[J]. J Food Biochem, 2019, 43(7): e12864.
Wang JJ, Dou XJ, Song J, et al. Antimicrobial peptides: promising alternatives in the post feeding antibiotic era[J]. Med Res Rev, 2019, 39(3): 831-859.
Passarini I, Rossiter S, Malkinson J, et al. In silico structural evaluation of short cationic antimicrobial peptides[J]. Pharmaceutics, 2018, 10(3): 72.
Alsaggar M, Al-Hazabreh M, Al Tall Y, et al. HAZ, a novel peptide with broad-spectrum antibacterial activity[J]. Saudi Pharm J, 2022, 30(11): 1652-1658.
Chen N, Jiang C. Antimicrobial peptides: structure, mechanism, and modification[J]. Eur J Med Chem, 2023, 255: 115377.
da Costa JP, Cova M, Ferreira R, et al. Antimicrobial peptides: an alternative for innovative medicines?[J]. Appl Microbiol Biotechnol, 2015, 99(5): 2023-2040.
Czaplewski L, Bax R, Clokie M, et al. Alternatives to antibiotics-a pipeline portfolio review[J]. Lancet Infect Dis, 2016, 16(2): 239-251.
Liu SC, Fan LL, Sun J, et al. Computational resources and tools for antimicrobial peptides[J]. J Pept Sci, 2017, 23(1): 4-12.
Chen CH, Lu TK. Development and challenges of antimicrobial peptides for therapeutic applications[J]. Antibiotics, 2020, 9(1): 24.
Wang GS, Li X, Wang Z. APD3: the antimicrobial peptide database as a tool for research and education[J]. Nucleic Acids Res, 2016, 44(D1): D1087-D1093.
CNCB-NGDC Members and Partners. Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2022[J]. Nucleic Acids Res, 2022, 50(D1): D27-D38.
Jhong JH, Yao LT, Pang YX, et al. dbAMP 2.0: updated resource for antimicrobial peptides with an enhanced scanning method for genomic and proteomic data[J]. Nucleic Acids Res, 2022, 50(D1): D460-D470.
Ye GZ, Wu HY, Huang JJ, et al. LAMP2: a major update of the database linking antimicrobial peptides[J]. Database, 2020, 2020: baaa061.
Shi GB, Kang XY, Dong FY, et al. DRAMP 3.0: an enhanced comprehensive data repository of antimicrobial peptides[J]. Nucleic Acids Res, 2022, 50(D1): D488-D496.
Singh S, Chaudhary K, Dhanda SK, et al. SATPdb: a database of structurally annotated therapeutic peptides[J]. Nucleic Acids Res, 2016, 44(D1): D1119-D1126.
Pirtskhalava M, Amstrong AA, Grigolava M, et al. DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics[J]. Nucleic Acids Res, 2021, 49(D1): D288-D297.
Waghu FH, Barai RS, Gurung P, et al. CAMPR3: a database on sequences, structures and signatures of antimicrobial peptides[J]. Nucleic Acids Res, 2016, 44(D1): D1094-D1097.
Seshadri Sundararajan V, Gabere MN, Pretorius A, et al. DAMPD: a manually curated antimicrobial peptide database[J]. Nucleic Acids Res, 2012, 40(Database issue): D1108-D1112.
Tyagi A, Tuknait A, Anand P, et al. CancerPPD: a database of anticancer peptides and proteins[J]. Nucleic Acids Res, 2015, 43(Database issue): D837-D843.
Gautam A, Chaudhary K, Singh S, et al. Hemolytik: a database of experimentally determined hemolytic and non-hemolytic peptides[J]. Nucleic Acids Res, 2014, 42(Database issue): D444-D449.
Qureshi A, Thakur N, Tandon H, et al. AVPdb: a database of experimentally validated antiviral peptides targeting medically important viruses[J]. Nucleic Acids Res, 2014, 42(Database issue): D1147-D1153.
Piotto SP, Sessa L, Concilio S, et al. YADAMP: yet another database of antimicrobial peptides[J]. Int J Antimicrob Agents, 2012, 39(4): 346-351.
Usmani SS, Kumar R, Kumar V, et al. AntiTbPdb: a knowledgebase of anti-tubercular peptides[J]. Database, 2018, 2018: bay025.
Usmani SS, Bedi G, Samuel JS, et al. THPdb: database of FDA-approved peptide and protein therapeutics[J]. PLoS One, 2017, 12(7): e0181748.
Gómez EA, Giraldo P, Orduz S. InverPep: a database of invertebrate antimicrobial peptides[J]. J Glob Antimicrob Resist, 2017, 8: 13-17.
Hammami R, Ben Hamida J, Vergoten G, et al. PhytAMP: a database dedicated to antimicrobial plant peptides[J]. Nucleic Acids Res, 2009, 37(Database issue): D963-D968.
Hammami R, Zouhir A, Ben Hamida J, et al. BACTIBASE: a new web-accessible database for bacteriocin characterization[J]. BMC Microbiol, 2007, 7: 89.
Di Luca M, Maccari G, Maisetta G, et al. BaAMPs: the database of biofilm-active antimicrobial peptides[J]. Biofouling, 2015, 31(2): 193-199.
Khabbaz H, Karimi-Jafari MH, Saboury AA, et al. Prediction of antimicrobial peptides toxicity based on their physico-chemical properties using machine learning techniques[J]. BMC Bioinformatics, 2021, 22(1): 549.
Kavousi K, Bagheri M, Behrouzi S, et al. IAMPE: NMR-assisted computational prediction of antimicrobial peptides[J]. J Chem Inf Model, 2020, 60(10): 4691-4701.
Xiao X, Wang P, Lin WZ, et al. iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types[J]. Anal Biochem, 2013, 436(2): 168-177.
Müller KR, Mika S, Rätsch G, et al. An introduction to kernel-based learning algorithms[J]. IEEE Trans Neural Netw, 2001, 12(2): 181-201.
Jaiswal M, Singh A, Kumar S. PTPAMP: prediction tool for plant-derived antimicrobial peptides[J]. Amino Acids, 2023, 55(1): 1-17.
Lira F, Perez PS, Baranauskas JA, et al. Prediction of antimicrobial activity of synthetic peptides by a decision tree model[J]. Appl Environ Microbiol, 2013, 79(10): 3156-3159.
Lv HW, Yan K, Guo YC, et al. AMPpred-EL: an effective antimicrobial peptide prediction model based on ensemble learning[J]. Comput Biol Med, 2022, 146: 105577.
Exarchos KP, Exarchos TP, Papaloukas C, et al. Predicting peptide bond conformation using feature selection and the Naïve Bayes approach[J]. Annu Int Conf IEEE Eng Med Biol Soc, 2007, 2007: 5009-5012.
Chen W, Luo LF. Classification of antimicrobial peptide using diversity measure with quadratic discriminant analysis[J]. J Microbiol Methods, 2009, 78(1): 94-96.
Fjell CD, Hancock RE, Cherkasov A. AMPer: a database and an automated discovery tool for antimicrobial peptides[J]. Bioinformatics, 2007, 23(9): 1148-1155.
de Jong A, van Heel AJ, Kok J, et al. BAGEL2: mining for bacteriocins in genomic data[J]. Nucleic Acids Res, 2010, 38(Web Server issue): W647-W651.
Polanco C, Samaniego JL. Detection of selective cationic amphipatic antibacterial peptides by Hidden Markov models[J]. Acta Biochim Pol, 2009, 56(1): 167-176.
Guo YC, Yan K, Lv HW, et al. PreTP-EL: prediction of therapeutic peptides based on ensemble learning[J]. Brief Bioinform, 2021, 22(6): bbab358.
Bhadra P, Yan JL, Li JY, et al. AmPEP: sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest[J]. Sci Rep, 2018, 8(1): 1697.
Jan A, Hayat M, Wedyan M, et al. Target-AMP: computational prediction of antimicrobial peptides by coupling sequential information with evolutionary profile[J]. Comput Biol Med, 2022, 151(Pt A): 106311.
Lata S, Mishra NK, Raghava GP. AntiBP2: improved version of antibacterial peptide prediction[J]. BMC Bioinformatics, 2010, 11(Suppl 1): S19.
Porto WF, Pires ÁS, Franco OL. CS-AMPPred: an updated SVM model for antimicrobial activity prediction in cysteine-stabilized peptides[J]. PLoS One, 2012, 7(12): e51444.
Niarchou A, Alexandridou A, Athanasiadis E, et al. C-PAmP: large scale analysis and database construction containing high scoring computationally predicted antimicrobial peptides for all the available plant species[J]. PLoS One, 2013, 8(11): e79728.
Rajkumar M, Bhukya SN, Ahalya N, et al. Impact of ANN in revealing of viral peptides[J]. Biomed Res Int, 2022, 2022: 7760734.
Zhang HP, Saravanan KM, Wei YJ, et al. Deep learning-based bioactive therapeutic peptide generation and screening[J]. J Chem Inf Model, 2023, 63(3): 835-845.
Wang HQ, Zhao J, Zhao H, et al. CL-ACP: a parallel combination of CNN and LSTM anticancer peptide recognition model[J]. BMC Bioinformatics, 2021, 22(1): 512.
Xiao X, Shao YT, Cheng X, et al. iAMP-CA2L: a new CNN-BiLSTM-SVM classifier based on cellular automata image for identifying antimicrobial peptides and their functional types[J]. Brief Bioinform, 2021, 22(6): bbab209.
Ma Y, Guo ZY, Xia BB, et al. Identification of antimicrobial peptides from the human gut microbiome using deep learning[J]. Nat Biotechnol, 2022, 40(6): 921-931.
Wang C, Garlick S, Zloh M. Deep learning for novel antimicrobial peptide design[J]. Biomolecules, 2021, 11(3): 471.
Dong B, Yi YH, Liang LF, et al. High throughput identification of antimicrobial peptides from fish gastrointestinal microbiota[J]. Toxins, 2017, 9(9): 266.
Fingerhut LCHW, Miller DJ, Strugnell JM, et al. Ampir: an R package for fast genome-wide prediction of antimicrobial peptides[J]. Bioinformatics, 2021, 36(21): 5262-5263.
Sharma R, Shrivastava S, Kumar Singh S, et al. AniAMPpred: artificial intelligence guided discovery of novel antimicrobial peptides in animal kingdom[J]. Brief Bioinform, 2021, 22(6): bbab242.
Lee JH, Chung H, Shin YP, et al. Deciphering novel antimicrobial peptides from the transcriptome of Papilio xuthus[J]. Insects, 2020, 11(11): 776.
Shelenkov AA, Slavokhotova AA, Odintsova TI. Cysmotif searcher pipeline for antimicrobial peptide identification in plant transcriptomes[J]. Biochemistry, 2018, 83(11): 1424-1432.
Grafskaia EN, Polina NF, Babenko VV, et al. Discovery of novel antimicrobial peptides: a transcriptomic study of the sea Anemone Cnidopus japonicus[J]. J Bioinform Comput Biol, 2018, 16(2): 1840006.
Li RH, Huang Y, Peng C, et al. High-throughput prediction and characterization of antimicrobial peptides from multi-omics datasets of Chinese tubular cone snail (Conus betulinus)[J]. Front Mar Sci, 2022, 9: 1092731.
Ebou A, Koua D, Addablah A, et al. Combined proteotranscriptomic-based strategy to discover novel antimicrobial peptides from cone snails[J]. Biomedicines, 2021, 9(4): 344.