
Activity prediction of human cytochrome P450 inhibitors based on multiple deep learning and machine learning methods

LIN Mingde, HAN Weijie, XU Xiaohe, DAI Xiaowen, CHEN Yadong

Citation: LIN Mingde, HAN Weijie, XU Xiaohe, DAI Xiaowen, CHEN Yadong. Activity prediction of human cytochrome P450 inhibitors based on multiple deep learning and machine learning methods[J]. Journal of China Pharmaceutical University, 2023, 54(3): 333-343. DOI: 10.11665/j.issn.1000-5048.2023033103


  • Abstract: Inhibition of human cytochrome P450 (CYP) enzymes can lead to drug-drug interactions and, in turn, to serious adverse reactions. It is therefore crucial to accurately predict the inhibitory activity of a given compound against a particular CYP isoform. This study compared 11 machine learning methods and 2 deep learning models across different molecular representations. The CatBoost model built on RDKit_2d + Morgan features outperformed the other models in accuracy and Matthews correlation coefficient, and also outperformed previously published models. The CatBoost model also consumed fewer computational resources. Finally, the three best-performing models were combined into co_model, which performed slightly better than the CatBoost model alone.
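    The two metrics named in the abstract, and the idea of combining several models, can be illustrated with a minimal sketch. The abstract does not say how co_model merges its three members, so averaging predicted probabilities (soft voting) is an assumption here, and the labels and probabilities below are made-up toy values rather than results from the study.

    ```python
    import math

    def confusion_counts(y_true, y_pred):
        """Count TP, TN, FP, FN for binary labels (1 = inhibitor, 0 = non-inhibitor)."""
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        tn = sum(1 for t, p in pairs if t == 0 and p == 0)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        return tp, tn, fp, fn

    def accuracy(y_true, y_pred):
        """Fraction of compounds classified correctly."""
        tp, tn, fp, fn = confusion_counts(y_true, y_pred)
        return (tp + tn) / (tp + tn + fp + fn)

    def matthews_corrcoef(y_true, y_pred):
        """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
        tp, tn, fp, fn = confusion_counts(y_true, y_pred)
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        if denom == 0:
            return 0.0
        return (tp * tn - fp * fn) / denom

    def average_probabilities(prob_lists):
        """Soft voting (an assumption, not the paper's stated method):
        average each compound's predicted probability across models."""
        n_models = len(prob_lists)
        return [sum(column) / n_models for column in zip(*prob_lists)]

    def to_labels(probs, threshold=0.5):
        """Turn probabilities into 0/1 labels at a fixed threshold."""
        return [1 if p >= threshold else 0 for p in probs]

    # Toy data (hypothetical): true labels and three models' predicted probabilities.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    model_a = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
    model_b = [0.8, 0.4, 0.7, 0.6, 0.3, 0.2, 0.4, 0.6]
    model_c = [0.7, 0.1, 0.6, 0.9, 0.4, 0.3, 0.8, 0.2]

    single_labels = to_labels(model_a)
    ensemble_labels = to_labels(average_probabilities([model_a, model_b, model_c]))

    print("model_a  accuracy:", accuracy(y_true, single_labels),
          "MCC:", matthews_corrcoef(y_true, single_labels))
    print("ensemble accuracy:", accuracy(y_true, ensemble_labels),
          "MCC:", matthews_corrcoef(y_true, ensemble_labels))
    ```

    In this toy case each model makes mistakes on different compounds, so the averaged prediction corrects them. That is only the general motivation for combining models and says nothing about the size of the gain reported in the paper.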

Publication history
  • Received: 2023-03-30
  • Revised: 2023-06-12
  • Published: 2023-06-24
