Citation: HU Zi’ang, GAO Liming, YU Wenying. Advances in the application of artificial intelligence in nucleic acid drug development[J]. J China Pharm Univ, 2024, 55(3): 335-346. DOI: 10.11665/j.issn.1000-5048.2024033101
In recent years, nucleic acid therapeutics have flourished and are increasingly regarded as the third generation of drug modalities, after small molecules and antibody-based drugs. Machine learning-based artificial intelligence is advancing rapidly and can significantly accelerate the development of nucleic acid therapeutics. This review outlines the foundations of artificial intelligence in nucleic acid drug development: algorithms, databases, and characterization methods. It then describes advances in applying artificial intelligence to nucleic acid structure prediction, small nucleic acid drug design, and other stages of nucleic acid drug research and development, with the aim of providing a reference for the interdisciplinary development of artificial intelligence and nucleic acid drugs.