YANG Ye, PEI Lei, HOU Fengzhen. Entity extraction and graph construction based on Chinese medical text[J]. Journal of China Pharmaceutical University, 2023, 54(3): 363-371. DOI: 10.11665/j.issn.1000-5048.2023030903

Entity extraction and graph construction based on Chinese medical text

  • Received Date: March 08, 2023
  • Revised Date: June 11, 2023
  • Knowledge graph technology has advanced new drug research and development, but domestic research started late and domain knowledge is mostly stored as text, which results in a low rate of knowledge graph reuse. Based on multi-source, heterogeneous medical texts, this paper designed a Chinese named entity recognition model built on the BERT-wwm-ext pre-trained model and incorporating a cascade approach, which reduced the complexity of traditional single classification and further improved the efficiency of text recognition. Experimental results showed that the model achieved its best performance on the self-built dataset, with an F1-score of 0.903, a precision of 89.2%, and a recall of 91.5%. The model was also applied to the public dataset CCKS2019, where the results showed better recognition performance. Using this model, this paper constructed a Chinese medical knowledge graph comprising a total of 13 530 entities, 10 939 attributes, and 39 247 relationships. The proposed method for Chinese medical entity extraction and graph construction is expected to help researchers discover new medical knowledge faster and shorten the process of new drug discovery.
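The reported metrics can be cross-checked with a short calculation. The sketch below assumes the standard definition of the F1-score as the harmonic mean of precision and recall. The function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the self-built dataset.
reported_precision = 0.892
reported_recall = 0.915

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # prints 0.903
```

Under this definition, the reported precision and recall are consistent with the reported F1-score of 0.903.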