The Development of Lexical Resources for Information Extraction from Text Combining WordNet and Dewey Decimal Classification

Proceedings of EACL '99

The Development of Lexical Resources for Information Extraction from Text Combining WordNet and Dewey Decimal Classification

Gabriela Cavaglià
ITC-irst, Centro per la Ricerca Scientifica e Tecnologica
via Sommarive 18, 38050 Povo (TN), Italy
e-mail: cavaglia@

Abstract

Lexicon definition is one of the main bottlenecks in the development of new applications in the field of Information Extraction from text. Generic resources (e.g., lexical databases) are promising for reducing the cost of specific lexica definition, but they introduce lexical ambiguity. This paper proposes a methodology for building application-specific lexica by using WordNet. Lexical ambiguity is kept under control by marking synsets in WordNet with field labels taken from the Dewey Decimal Classification.

1 Introduction

One of the current issues in Information Extraction (IE) is efficient transportability, as the cost of new applications is one of the factors limiting the market. The lexicon definition process is currently one of the main bottlenecks in producing applications. As a matter of fact, the necessary lexicon for an average application is generally large (hundreds to thousands of words), and most lexical information is not transportable across domains. The problem of lexicon transport is worsened by the growing degree of lexicalization of IE systems: nowadays several successful systems adopt lexical rules at many levels. The IE research mainstream has focused essentially on the definition of lexica starting from a corpus sample (Riloff, 1993; Grishman, 1997), with the implicit assumption that a corpus provided for an application is representative of the whole application requirement. Unfortunately, one of the current trends in IE is the progressive reduction of the size of training corpora: e.g., from the 1,000 texts …

* This work was carried out at ITC-IRST as part of the author's dissertation for the degree in Philosophy, University of Turin (supervisor: Carla Bazzanella). The author wants to thank her supervisor at ITC-IRST, Fabio Ciravegna, for his constant help. Alberto Lavelli provided valuable comments on the paper.
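To make the core idea concrete, the following is a minimal sketch, not the paper's actual resource or tool, of how field labels attached to WordNet synsets could be used to discard out-of-domain senses when building an application lexicon. It uses NLTK's WordNet interface (which postdates the paper), and the ddc_labels mapping and senses_for_domain function are hypothetical names invented for this illustration; the paper's own assignment of Dewey Decimal Classification labels to synsets is a separately built resource.

```python
# Minimal sketch: keep only the WordNet senses of a word whose field label
# matches the application domain. Assumes NLTK with the WordNet corpus
# installed (nltk.download('wordnet') may be needed).
from nltk.corpus import wordnet as wn

# Hypothetical hand-made mapping from synset names to Dewey-style field
# labels; in the paper this information is attached to synsets themselves.
ddc_labels = {
    "bank.n.01": "550 Earth sciences",   # sloping land beside a body of water
    "depository_financial_institution.n.01": "330 Economics",
}

def senses_for_domain(word, domain_label):
    """Return only the synsets of `word` labelled with the given domain."""
    selected = []
    for synset in wn.synsets(word):
        if ddc_labels.get(synset.name()) == domain_label:
            selected.append(synset)
    return selected

if __name__ == "__main__":
    # For a financial-news application, keep only the economics senses of "bank".
    for s in senses_for_domain("bank", "330 Economics"):
        print(s.name(), "-", s.definition())
```

In a financial-news application, for instance, only the sense labelled with the economics field survives, so "bank" no longer brings the river-bank reading into the application lexicon.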
