Abstract:
The Kirsten rat sarcoma viral oncogene homolog (KRAS) gene is one of the most commonly mutated oncogenes, and KRAS inhibitors have shown therapeutic potential for cancer patients carrying this mutation. In this study, machine learning was used to build a quantitative structure-activity relationship (QSAR) model for small-molecule KRAS inhibitors. A total of 1857 data points, each pairing an IC50 value with a SMILES (simplified molecular input line entry system) string for a KRAS inhibitor, were collected from three databases: ChEMBL, BindingDB, and PubChem. Nine classifiers were then constructed by combining three feature selection methods with three machine learning models: random forest, support vector machine (SVM), and extreme gradient boosting (XGBoost). The SVM model combined with mutual information feature selection performed best on the test set (AUC = 0.912, ACC = 0.859, F1 = 0.890) and also predicted well on the external validation set (AUC = 0.944, Recall = 0.856, FPR = 0.111). This study provides a new technical route for using artificial intelligence to screen natural product databases for KRAS inhibitors.
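The two ideas the abstract relies on are ranking features by mutual information with the active/inactive label and scoring a classifier with AUC, ACC, F1, Recall, and FPR. Both can be illustrated with the minimal pure-Python sketch below. It is not the authors' implementation. It assumes binary features such as fingerprint bits, labels coded as 1 (active) and 0 (inactive), and a 0.5 decision threshold on the classifier scores. The function names (`mutual_information`, `select_top_k`, `roc_auc`, `threshold_metrics`) are hypothetical, and the SVM training step itself is omitted. In practice, scikit-learn's `mutual_info_classif`, `SVC`, and `roc_auc_score` cover the same ground.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between one discrete feature column and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # Sum over observed (x, y) cells of p(x,y) * log(p(x,y) / (p(x) p(y)))
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(X, labels, k):
    """Return indices of the k feature columns with the highest mutual information."""
    n_features = len(X[0])
    scores = [mutual_information([row[j] for row in X], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def roc_auc(labels, scores):
    """AUC as the probability that a random active outranks a random inactive (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """ACC, Recall, F1 and FPR after turning scores into hard predictions (assumed threshold 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "ACC": (tp + tn) / len(labels),
        "Recall": recall,
        "F1": f1,
        "FPR": fp / (fp + tn) if fp + tn else 0.0,
    }
```

On a toy set of six compounds, a feature column that matches the labels exactly scores log 2 ≈ 0.693 nats, while a constant column scores 0 and is ranked last. Computing AUC from ranks keeps it independent of the decision threshold, whereas ACC, F1, Recall, and FPR all change if the threshold moves.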