Scientific report: "Distributional Identification of Non-Referential Pronouns"

Shane Bergsma
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, T6G 2E8
bergsma@

Dekang Lin
Google Inc., 1600 Amphitheatre Parkway, Mountain View, California, 94301
lindek@

Randy Goebel
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, T6G 2E8
goebel@

Abstract

We present an automatic approach to determining whether a pronoun in text refers to a preceding noun phrase or is instead non-referential. We extract the surrounding textual context of the pronoun and gather, from a large corpus, the distribution of words that occur within that context. We learn to reliably classify these distributions as representing either referential or non-referential pronoun instances. Despite its simplicity, experimental results on classifying the English pronoun "it" show the system achieves the highest performance yet attained on this important task.

1 Introduction

The goal of coreference resolution is to determine which noun phrases in a document refer to the same real-world entity. As part of this task, coreference resolution systems must decide which pronouns refer to preceding noun phrases (called antecedents) and which do not. In particular, a long-standing challenge has been to correctly classify instances of the English pronoun "it". Consider the sentences:

(1) You can make it in advance.
(2) You can make it in Hollywood.

In sentence (1), "it" is an anaphoric pronoun referring to some previous noun phrase, like "the sauce" or "an appointment". In sentence (2), "it" is part of the idiomatic expression "make it", meaning "succeed". A coreference resolution system should find an antecedent for the first "it" but not the second. Pronouns that do not refer to preceding noun phrases are called non-anaphoric or non-referential pronouns.

The word "it" is one of the most frequent words in the English language, accounting for about 1% of tokens in text and over a quarter of all .
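
The idea summarized in the abstract (take the context around a pronoun, then look at which words a corpus places in that same context) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the two-word window on each side, the helper names `context_pattern` and `filler_distribution`, and the five-sentence `TOY_CORPUS` are all hypothetical, and the learned classifier mentioned in the abstract is not shown.

```python
from collections import Counter

def context_pattern(tokens, index, left=2, right=2):
    """Return the lowercased words around tokens[index]; the pronoun slot is None.

    The window sizes are an assumption made for this sketch.
    """
    lo = max(0, index - left)
    hi = min(len(tokens), index + right + 1)
    return tuple(None if i == index else tokens[i].lower() for i in range(lo, hi))

def filler_distribution(pattern, corpus_sentences):
    """Count which words fill the None slot of `pattern` across the corpus."""
    slot = pattern.index(None)
    n = len(pattern)
    counts = Counter()
    for sentence in corpus_sentences:
        words = [w.lower() for w in sentence]
        # Slide a window of the pattern's length over the sentence.
        for start in range(len(words) - n + 1):
            window = words[start:start + n]
            if all(p is None or p == w for p, w in zip(pattern, window)):
                counts[window[slot]] += 1
    return counts

# Hypothetical stand-in for a large corpus (invented for illustration).
TOY_CORPUS = [
    "you can make it in advance".split(),
    "you can make them in advance".split(),
    "you can make sauce in advance".split(),
    "they can make it in hollywood".split(),
    "she can make it in hollywood".split(),
]

advance = "You can make it in advance".split()
hollywood = "You can make it in Hollywood".split()

advance_counts = filler_distribution(
    context_pattern(advance, advance.index("it")), TOY_CORPUS)
hollywood_counts = filler_distribution(
    context_pattern(hollywood, hollywood.index("it")), TOY_CORPUS)

print(advance_counts)
print(hollywood_counts)
```

In this toy data the "in advance" context is also filled by "them" and "sauce", while the "in Hollywood" context is filled only by "it". One plausible reading of the abstract is that differences of this kind are what a classifier trained on such distributions would pick up; the sketch stops at the counting step.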