Scientific report: "Automatic construction of a hypernym-labeled noun hierarchy from text"

Sharon A. Caraballo
Dept. of Computer Science
Brown University
Providence, RI 02912
sc@

Abstract

Previous work has shown that automatic methods can be used in building semantic lexicons. This work goes a step further by automatically creating not just clusters of related words, but a hierarchy of nouns and their hypernyms, akin to the hand-built hierarchy in WordNet.

1 Introduction

The purpose of this work is to build something like the hypernym-labeled noun hierarchy of WordNet (Fellbaum, 1998) automatically from text, using no other lexical resources. WordNet has been an important research tool, but it is insufficient for domain-specific text, such as that encountered in the MUCs (Message Understanding Conferences). Our work develops a labeled hierarchy based on a text corpus.

In this project, nouns are clustered into a hierarchy using data on conjunctions and appositives appearing in the Wall Street Journal. The internal nodes of the resulting tree are then labeled with hypernyms for the nouns clustered underneath them, also based on data extracted from the Wall Street Journal. The resulting hierarchy is evaluated by human judges, and future research directions are discussed.

2 Building the noun hierarchy

The first stage in constructing our hierarchy is to build an unlabeled hierarchy of nouns using bottom-up clustering methods (see, e.g., Brown et al., 1992). Nouns are clustered based on conjunction and appositive data collected from the Wall Street Journal corpus. Some of the data comes from the parsed files 2-21 of the Wall Street Journal Penn Treebank corpus (Marcus et al., 1993), and additional parsed text was obtained by parsing the 1987 Wall Street Journal text using the parser described in Charniak et al. (1998). From this parsed text we identified all conjunctions of noun phrases (e.g., executive vice-president and treasurer or scientific equipment, apparatus and disposables) and all appositives.
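The bottom-up clustering idea can be illustrated with a small sketch. The text so far states only that nouns are clustered bottom-up from conjunction and appositive data, so the details here are assumptions made for illustration: each noun is represented by counts of the nouns it is coordinated with, clusters are compared by cosine similarity, and a merged cluster takes the sum of its children's count vectors. The function names (`cooccurrence_vectors`, `cosine`, `build_hierarchy`, `leaves`) and the toy input are hypothetical, and appositives and hypernym labeling are not covered.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_vectors(groups):
    """For each noun, count how often it is coordinated with every other noun."""
    vectors = defaultdict(Counter)
    for group in groups:
        for a, b in combinations(group, 2):
            if a != b:
                vectors[a][b] += 1
                vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (assumed similarity measure)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_hierarchy(vectors):
    """Bottom-up clustering: repeatedly merge the two most similar clusters.

    Returns a binary tree in which a leaf is a noun (str) and an internal
    node is a pair of subtrees. Returns None for empty input.
    """
    clusters = [(noun, Counter(vec)) for noun, vec in sorted(vectors.items())]
    if not clusters:
        return None
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cosine(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        (tree_i, vec_i), (tree_j, vec_j) = clusters[i], clusters[j]
        # Assumption: a merged cluster is represented by the summed counts.
        merged = ((tree_i, tree_j), vec_i + vec_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def leaves(tree):
    """Collect the nouns under a (sub)tree."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


# Toy coordinated noun groups, invented for illustration only.
groups = [
    ["vice-president", "treasurer"],
    ["vice-president", "secretary"],
    ["treasurer", "secretary"],
    ["equipment", "apparatus", "disposables"],
]
tree = build_hierarchy(cooccurrence_vectors(groups))
for branch in tree:
    print(sorted(leaves(branch)))
```

On this toy input the two top-level branches separate the office titles from the laboratory nouns, because nouns from different groups never share a coordination partner. The resulting tree is unlabeled; assigning hypernyms to its internal nodes is a separate stage.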