Scientific report: "Hedge classification in biomedical texts with a weakly supervised selection of keywords"

Gyorgy Szarvas
Research Group on Artificial Intelligence
Hungarian Academy of Sciences
University of Szeged
HU-6720 Szeged, Hungary
szarvas@

Abstract

Since facts or statements in a hedge or negated context typically appear as false positives, the proper handling of these language phenomena is of great importance in biomedical text mining. In this paper we demonstrate the importance of hedge classification experimentally in two real life scenarios, namely the ICD-9-CM coding of radiology reports and gene name Entity Extraction from scientific texts. We analysed the major differences of speculative language in these tasks and developed a maxent-based solution for both the free text and scientific text processing tasks. Based on our results, we draw conclusions on the possible ways of tackling speculative language in biomedical texts.

1 Introduction

The highly accurate identification of several regularly occurring language phenomena, like the speculative use of language, negation and past tense (temporal resolution), is a prerequisite for the efficient processing of biomedical texts. In various natural language processing tasks, relevant statements appearing in a speculative context are treated as false positives.
Hedge detection seeks to perform a kind of semantic filtering of texts, that is, it tries to separate factual statements from speculative or uncertain ones.

Hedging in biomedical NLP

To demonstrate the detrimental effects of speculative language on biomedical NLP tasks, we will consider two inherently different sample tasks, namely the ICD-9-CM coding of radiology records and gene information extraction from biomedical scientific texts. The general features of texts used in these tasks differ significantly from each other, but both tasks require the exclusion of uncertain or speculative items from processing.

Gene Name and interaction extraction from scientific .