Scientific report: "Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing"

Matthieu Constant, Université Paris-Est, LIGM, CNRS, France, mconstan@
Anthony Sigogne, Université Paris-Est, LIGM, CNRS, France, sigogne@
Patrick Watrin, Université de Louvain, CENTAL, Belgium, @

Abstract

The integration of multiword expressions in a parsing procedure has been shown to improve accuracy in an artificial context where such expressions have been perfectly pre-identified. This paper evaluates two empirical strategies to integrate multiword units in a real constituency parsing context and shows that the results are not as promising as has sometimes been suggested. Firstly, we show that pregrouping multiword expressions before parsing with a state-of-the-art recognizer improves multiword recognition accuracy and unlabeled attachment score. However, it has no statistically significant impact in terms of F-score, as incorrect multiword expression recognition has important side effects on parsing. Secondly, integrating multiword expressions in the parser grammar followed by a reranker specific to such expressions slightly improves all evaluation metrics.

1 Introduction

The integration of Multiword Expressions (MWE) in real-life applications is crucial because such expressions have the particularity of having a certain level of idiomaticity. They form complex lexical units which, if they are considered, should significantly help parsing.
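To make the idea of pregrouping concrete, the following is a minimal sketch, not the authors' method: the paper relies on a state-of-the-art recognizer, whereas this example swaps in a simple greedy longest-match lookup against a toy lexicon. The function name `pregroup_mwes`, the lexicon, and the underscore-joining convention are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple


def pregroup_mwes(tokens: List[str], lexicon: Iterable[Tuple[str, ...]]) -> List[str]:
    """Merge known multiword expressions into single tokens (greedy longest match).

    Hypothetical helper: a lexicon lookup stands in for a real MWE recognizer.
    """
    # Normalize lexicon entries to lowercase tuples for case-insensitive matching.
    entries: Set[Tuple[str, ...]] = {tuple(w.lower() for w in e) for e in lexicon}
    max_len = max((len(e) for e in entries), default=1)

    grouped: List[str] = []
    i = 0
    while i < len(tokens):
        match_len = 1
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in entries:
                match_len = n
                break
        # A matched span becomes one unit; an unmatched token is kept unchanged.
        grouped.append("_".join(tokens[i:i + match_len]))
        i += match_len
    return grouped


# Toy data, invented for this example.
sentence = "He kicked the bucket in spite of the treatment".split()
toy_lexicon = [("in", "spite", "of"), ("kicked", "the", "bucket")]
print(pregroup_mwes(sentence, toy_lexicon))
# ['He', 'kicked_the_bucket', 'in_spite_of', 'the', 'treatment']
```

A parser would then see each merged unit as a single lexical item. The sketch also hints at the risk the abstract mentions: a span grouped by mistake cannot be undone by the parser downstream.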
From a theoretical point of view, the integration of multiword expressions in the parsing procedure has been studied for different formalisms: Head-Driven Phrase Structure Grammar (Copestake et al., 2002), Tree Adjoining Grammars (Schuler and Joshi, 2011), etc. From an empirical point of view, their incorporation has also been considered, such as in (Nivre and Nilsson, 2004) for dependency parsing and in (Arun and Keller, 2005) in constituency parsing. Although experiments always relied on a corpus where the MWEs were perfectly
