Citation: TANG Qian, CHEN Roufen, SHEN Zheyuan, et al. Research progress of artificial intelligence-based small molecule generation models in drug discovery[J]. J China Pharm Univ, 2024, 55(3): 295-305. DOI: 10.11665/j.issn.1000-5048.2024031501
With the rapid development of artificial intelligence, small molecule generation models have become a significant research direction in drug discovery. These models, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models, have shown strong capabilities in optimizing drug properties and generating complex molecular structures. This article reviews the application of these technologies in the drug discovery process and shows how they supplement and enhance traditional drug design methods. It also addresses the challenges that current methods face in data quality, model complexity, computational cost, and generalization ability, and outlines future research directions.