NULEX: An Open-License Broad Coverage Lexicon

Clifton J. McFate
Northwestern University
Evanston, IL, USA
c-mcfate@northwestern.edu

Kenneth D. Forbus
Northwestern University
Evanston, IL, USA
forbus@northwestern.edu

Abstract

Broad coverage lexicons for the English language have traditionally been handmade. This approach, while accurate, requires too much human labor. Furthermore, resources contain gaps in coverage, contain specific types of information, or are incompatible with other resources. We believe that the state of open-license technology is such that a comprehensive syntactic lexicon can be automatically compiled. This paper describes the creation of such a lexicon, NU-LEX, an open-license feature-based lexicon for general-purpose parsing that combines WordNet, VerbNet, and Wiktionary and contains over 100,000 words. NU-LEX was integrated into a bottom-up chart parser. We ran the parser through three sets of sentences, 50 sentences total, from the Simple English Wikipedia and compared its performance to the same parser using Comlex. Both parsers performed almost equally, with NU-LEX finding all lex-items for 50% of the sentences and Comlex succeeding for 52%. Furthermore, NU-LEX's shortcomings primarily fell into two categories, suggesting future research directions.
1 Introduction

While there are many types of parsers available, all of them rely on a lexicon of words, whether syntactic like Comlex, enriched with semantics like WordNet, or derived from tagged corpora like the Penn Treebank (Macleod et al., 1994; Fellbaum, 1998; Marcus et al., 1993). However, many of these resources have gaps that the others can fill in. WordNet, for example, only contains open-class words, and it lacks the extensive subcategorization frame and agreement information present in Comlex (Miller et al., 1993; Macleod et al., 1994). Comlex, while syntactically deep, doesn't have tagged usage data or semantic groupings (Macleod et al., 1994). Furthermore, many of these resources do not map to one another or have restricted