tailieunhanh - Scientific report: "Corpus Statistics Meet the Noun Compound: Some Empirical Results"
Corpus Statistics Meet the Noun Compound: Some Empirical Results

Mark Lauer
Microsoft Institute
65 Epping Road, North Ryde NSW 2113, Australia
t-markl

Abstract

A variety of statistical methods for noun compound analysis are implemented and compared. The results support two main conclusions. First, the use of conceptual association not only enables broad coverage, but also improves accuracy. Second, an analysis model based on dependency grammar is substantially more accurate than one based on deepest constituents, even though the latter is more prevalent in the literature.

1 Background

Compound Nouns

If parsing is taken to be the first step in taming the natural language understanding task, then broad coverage NLP remains a jungle inhabited by wild beasts. For instance, parsing noun compounds appears to require detailed world knowledge that is unavailable outside a limited domain (Sparck Jones 1983). Yet, far from being an obscure, endangered species, the noun compound is flourishing in modern language. It has already made five appearances in this paragraph, and at least one diachronic study shows a veritable population explosion (Leonard 1984). While substantial work on noun compounds exists in both linguistics (Levi 1978; Ryder 1994) and computational linguistics (Finin 1980; McDonald 1982; Isabelle 1984), techniques suitable for broad coverage parsing remain unavailable.
This paper explores the application of corpus statistics (Charniak 1993) to noun compound parsing (other computational problems are addressed in Arens et al. 1987, Vanderwende 1993 and Sproat 1994). The task is illustrated in example 1:

Example 1
(a) [woman_N [aid_N worker_N]]
(b) [[hydrogen_N ion_N] exchange_N]

The parses assigned to these two compounds differ, even though the sequences of parts of speech are identical. The problem is analogous to the prepositional phrase attachment task explored in Hindle and Rooth (1993). The approach they propose involves computing lexical associations from a corpus and using .
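The bracketing choice in example 1 can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation. The association scores are invented, and the names `ASSOC`, `assoc`, `bracket_adjacency` and `bracket_dependency` are hypothetical. The two decision rules are one common reading of the "deepest constituents" (adjacency) and dependency-based analyses named in the abstract, and the left-branching default on ties is likewise an assumption.

```python
from typing import Dict, Tuple, Union

# A parse is a nested pair: ((n1, n2), n3) is left-branching,
# (n1, (n2, n3)) is right-branching.
Parse = Union[Tuple[Tuple[str, str], str], Tuple[str, Tuple[str, str]]]

# Hypothetical association scores, invented purely for illustration.
# In a real system these would be estimated from corpus counts.
ASSOC: Dict[Tuple[str, str], float] = {
    ("woman", "aid"): 0.1,
    ("aid", "worker"): 0.9,
    ("woman", "worker"): 0.8,
    ("hydrogen", "ion"): 0.9,
    ("ion", "exchange"): 0.7,
    ("hydrogen", "exchange"): 0.2,
}


def assoc(modifier: str, head: str) -> float:
    """Look up the (assumed) association between a modifier and a head."""
    return ASSOC.get((modifier, head), 0.0)


def bracket_adjacency(n1: str, n2: str, n3: str) -> Parse:
    """Adjacency rule (assumed): compare the two adjacent pairs,
    n1-n2 against n2-n3, and group the stronger pair first."""
    if assoc(n1, n2) >= assoc(n2, n3):
        return ((n1, n2), n3)
    return (n1, (n2, n3))


def bracket_dependency(n1: str, n2: str, n3: str) -> Parse:
    """Dependency rule (assumed): ask which noun n1 modifies,
    comparing n1-n2 against n1-n3."""
    if assoc(n1, n2) >= assoc(n1, n3):
        return ((n1, n2), n3)
    return (n1, (n2, n3))


if __name__ == "__main__":
    for compound in [("woman", "aid", "worker"), ("hydrogen", "ion", "exchange")]:
        print(compound, "adjacency:", bracket_adjacency(*compound))
        print(compound, "dependency:", bracket_dependency(*compound))
```

With these invented scores both rules agree on the two compounds in example 1. They can disagree whenever the association between n2 and n3 and the association between n1 and n3 point in different directions, which is where the choice of analysis model matters.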