Scientific report: "Accurate Unlexicalized Parsing"

Dan Klein
Computer Science Department, Stanford University, Stanford, CA 94305-9040
klein@

Christopher D. Manning
Computer Science Department, Stanford University, Stanford, CA 94305-9040
manning@

Abstract

We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accuracy of unlexicalized models: an unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and the parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.

In the early 1990s, as probabilistic methods swept NLP, parsing work revived the investigation of probabilistic context-free grammars (PCFGs) (Booth and Thomson, 1973; Baker, 1979). However, early results on the utility of PCFGs for parse disambiguation and language modeling were somewhat disappointing. A conviction arose that lexicalized PCFGs (where head words annotate phrasal nodes) were the key tool for high-performance PCFG parsing. This approach was congruent with the great success of word n-gram models in speech recognition, and drew strength from a broader interest in lexicalized grammars, as well as from demonstrations that lexical dependencies were a key tool for resolving ambiguities such as PP attachments (Ford et al., 1982; Hindle and Rooth, 1993). In the following decade, great success in terms of parse disambiguation and even language modeling was achieved by various lexicalized PCFG models (Magerman, 1995; Charniak, 1997; Collins, 1999; Charniak, 2000; Charniak, 2001). However ...
