Using linguistic principles to recover empty categories

Richard CAMPBELL
Microsoft Research
One Microsoft Way
Redmond, WA 98052 USA
richcamp@

Abstract

This paper describes an algorithm for detecting empty nodes in the Penn Treebank (Marcus et al., 1993), finding their antecedents, and assigning them function tags, without access to lexical information such as valency. Unlike previous approaches to this task, the current method is not corpus-based, but rather makes use of the principles of early Government-Binding theory (Chomsky, 1981), the syntactic theory that underlies the annotation. Using the evaluation metric proposed by Johnson (2002), this approach outperforms previously published approaches on both detection of empty categories and antecedent identification, given either annotated input stripped of empty categories or the output of a parser. Some problems with this evaluation metric are noted, and an alternative is proposed along with the results. The paper considers the reasons a principle-based approach to this problem should outperform corpus-based approaches, and speculates on the possibility of a hybrid approach.

1 Introduction

Many recent approaches to parsing (e.g. Charniak, 2000) have focused on labeled bracketing of the input string, ignoring aspects of structure that are not reflected in the string, such as phonetically null elements and long-distance dependencies, many of which provide important semantic information such as predicate-argument structure. In the Penn Treebank (Marcus et al., 1993), null elements, or empty categories, are used to indicate non-local dependencies, discontinuous constituents, and certain missing elements. Empty categories are coindexed with their antecedents in the same sentence. In addition, if a node has a particular grammatical function (such as subject) or semantic role (such as location), it has a function tag indicating that role; empty categories may also have function tags. Thus, in the sentence below, who is coindexed with the empty .