Scientific report: "High Throughput Modularized NLP System for Clinical Text"

Serguei Pakhomov
Mayo College of Medicine, Mayo Clinic
Rochester, MN 55905
pakhomov@

James Buntrock
Division of Biomedical Informatics, Mayo Clinic
Rochester, MN 55905
Buntrock@

Patrick Duffy
Division of Biomedical Informatics, Mayo Clinic
Rochester, MN 55905
duffp@

Abstract

This paper presents the results of the development of a high throughput, real time, modularized text analysis and information retrieval system that identifies clinically relevant entities in clinical notes, maps the entities to several standardized nomenclatures, and makes them available for subsequent information retrieval and data mining. The performance of the system was validated on a small collection of 351 documents partitioned into 4 query topics and manually examined by 3 physicians and 3 nurse abstractors for relevance to the query topics. We find that simple key phrase searching results in 73% recall and 77% precision. A combination of NLP approaches to indexing improves the recall to 92% while lowering the precision to 67%.

1 Introduction

Until recently, the NLP systems developed for processing clinical texts have been narrowly focused on a specific type of document, such as radiology reports [1], discharge summaries [2], Medline abstracts [3], and pathology reports [4].
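The recall and precision figures quoted in the abstract can be read against the standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below is only an illustration of those definitions. The function name `precision_recall` and the toy document IDs are assumptions made for this example; they are not the paper's evaluation code or data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved document IDs
    measured against the documents judged relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data, not drawn from the 351-document collection.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d3", "d9"}
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these definitions, an indexing strategy that retrieves more documents per query will typically raise recall at some cost in precision, which is the direction of the trade-off the abstract reports.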
In addition to being developed for a specific task, these systems tend to be fairly monolithic in that their components have fairly strict dependencies on each other, which makes plug-and-play functionality difficult. NLP researchers and systems developers in the field realize that modularized approaches are beneficial for component reuse and for more rapid development and advancement of NLP technology. In addition to the issue of modularity, NLP systems development efforts are starting to take scalability into account. The Mayo Clinic's repository of clinical notes contains over 16 million documents, growing at the rate of 50K documents per week. The time and space .