Automatic Classification of Verbs in Biomedical Texts

Anna Korhonen
University of Cambridge, Computer Laboratory
15 JJ Thomson Avenue, Cambridge CB3 0GD, UK
alk23@

Yuval Krymolowski
Dept. of Computer Science, Technion
Haifa 32000, Israel
yuvalkr@

Nigel Collier
National Institute of Informatics
Hitotsubashi 2-1-2, Chiyoda-ku, Tokyo 101-8430, Japan
collier@

Abstract
Lexical classes, when tailored to the application and domain in question, can provide an effective means to deal with a number of natural language processing (NLP) tasks. While manual construction of such classes is difficult, recent research shows that it is possible to automatically induce verb classes from cross-domain corpora with promising accuracy. We report a novel experiment where similar technology is applied to the important, challenging domain of biomedicine. We show that the resulting classification, acquired from a corpus of biomedical journal articles, is highly accurate and strongly domain-specific. It can be used to aid BIO-NLP directly or as useful material for investigating the syntax and semantics of verbs in biomedical texts.

1 Introduction
Lexical classes which capture the close relation between the syntax and semantics of verbs have attracted considerable interest in NLP (Jackendoff, 1990; Levin, 1993; Dorr, 1997; Prescher et al., 2000).
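The idea that verb classes can be induced automatically from corpora rests on this syntax-semantics link: verbs with similar meanings tend to share subcategorization frames (SCFs). The following is a minimal sketch of that idea, assuming each verb is represented by a relative-frequency distribution over SCFs and verbs are grouped by average-linkage agglomerative clustering under Jensen-Shannon divergence. This generic technique is swapped in for illustration and is not presented as the method used in this paper; the frame labels, the counts in `SCF_COUNTS`, and all helper names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy SCF counts, invented for illustration only (not data from the paper).
# NONE = intransitive, PP = prepositional phrase, NP = noun phrase, etc.
SCF_COUNTS = {
    "travel": {"NONE": 40, "PP": 45, "NP": 15},
    "run":    {"NONE": 45, "PP": 35, "NP": 20},
    "walk":   {"NONE": 42, "PP": 40, "NP": 18},
    "give":   {"NP": 30, "NP-NP": 35, "NP-PP": 35},
    "send":   {"NP": 35, "NP-NP": 25, "NP-PP": 40},
}


def normalize(counts):
    """Turn raw frame counts into a relative-frequency distribution."""
    total = sum(counts.values())
    return {frame: c / total for frame, c in counts.items()}


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pv * math.log2(pv / q[f]) for f, pv in p.items() if pv > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and finite even when a verb lacks a frame."""
    frames = set(p) | set(q)
    m = {f: 0.5 * (p.get(f, 0.0) + q.get(f, 0.0)) for f in frames}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def average_linkage(a, b, dist):
    """Mean pairwise divergence between the members of clusters a and b."""
    return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))


def cluster_verbs(scf_counts, k):
    """Greedy agglomerative clustering of verbs into k classes (average linkage)."""
    distributions = {v: normalize(c) for v, c in scf_counts.items()}
    dist = {
        frozenset((x, y)): js_divergence(distributions[x], distributions[y])
        for x, y in combinations(distributions, 2)
    }
    clusters = [[v] for v in scf_counts]
    while len(clusters) > k:
        # Merge the pair of clusters whose members are, on average, closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], dist),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations() guarantees i < j
    return clusters


if __name__ == "__main__":
    for verb_class in cluster_verbs(SCF_COUNTS, k=2):
        print(sorted(verb_class))
```

With these made-up counts, the motion verbs travel, run and walk, which share intransitive and PP frames, end up in one class, and the transfer verbs give and send in another. Jensen-Shannon divergence is used rather than plain KL divergence because it stays finite when one verb never occurs with a frame the other uses.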
Such classes are useful for their ability to capture generalizations about a range of linguistic properties. For example, verbs which share the meaning of 'manner of motion' (such as travel, run, walk) also behave similarly in terms of subcategorization (I traveled/ran/walked, I traveled/ran/walked to London, I traveled/ran/walked five miles). Although the correspondence between the syntax and semantics of words is not perfect, and the classes do not provide means for full semantic inferencing, their predictive power is nevertheless considerable. NLP systems can benefit from lexical classes in many ways. Such classes define the mapping from surface