Entity extraction and graph construction based on Chinese medical text

YANG Ye; PEI Lei; HOU Fengzhen

doi:10.11665/j.issn.1000-5048.2023030903

Citation: YANG Ye, PEI Lei, HOU Fengzhen. Entity extraction and graph construction based on Chinese medical text[J]. Journal of China Pharmaceutical University, 2023, 54(3): 363-371. DOI: 10.11665/j.issn.1000-5048.2023030903

Institute of Medical Big Data and Artificial Intelligence, School of Science, China Pharmaceutical University, Nanjing 211198, China

Received Date: March 08, 2023
Revised Date: June 11, 2023

Abstract

Knowledge graph technology has advanced new drug research and development, but domestic research started late and domain knowledge is mostly stored as text, so knowledge graphs are rarely reused. Working from multi-source, heterogeneous medical texts, this paper designed a Chinese named entity recognition model that builds on the BERT-wwm-ext pre-trained model and incorporates a cascade strategy, which reduces the complexity of the traditional single-classification approach and improves the efficiency of text recognition. In experiments on the self-built dataset, the model achieved its best performance with an F1-score of 0.903, a precision of 89.2%, and a recall of 91.5%. The model was also applied to the public dataset CCKS2019, where it likewise showed better performance and recognition results. Using this model, this paper constructed a Chinese medical knowledge graph comprising 13 530 entities, 10 939 attributes, and 39 247 relationships in total. The proposed method for Chinese medical entity extraction and graph construction is expected to help researchers accelerate the discovery of new medical knowledge and shorten the new drug discovery process.
Keywords:
- Chinese medical text
- named entity recognition model
- BERT-wwm-ext pre-training model
- cascade thought
- knowledge graph
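
The three metrics reported in the abstract can be cross-checked against one another, since the F1-score is conventionally defined as the harmonic mean of precision and recall. The following is a minimal sketch under that assumption; the function name `f1_score` and the variable names are illustrative and do not come from the paper, and the abstract does not state how the authors averaged scores across entity types.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when the model finds no correct entities.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the self-built dataset.
reported_precision = 0.892
reported_recall = 0.915

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.903
```

Under this definition, the reported precision (89.2%) and recall (91.5%) reproduce the reported F1-score of 0.903 to three decimal places.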
