Rich bitext projection features for parse reranking

Alexander Fraser, Renjing Wang, Hinrich Schütze
Institute for Natural Language Processing, University of Stuttgart
fraser wangrg @

Abstract

Many different types of features have been shown to improve accuracy in parse reranking. A class of features that has not yet been considered is based on projecting the syntactic structure of a translation of the text to be parsed. The intuition behind this type of bitext projection feature is that ambiguous structures in one language often correspond to unambiguous structures in another. We show that reranking based on bitext projection features increases parsing accuracy significantly.

1 Introduction

Parallel text, or bitext, is an important knowledge source for solving many problems, such as machine translation, cross-language information retrieval, and the projection of linguistic resources from one language to another. In this paper, we show that bitext-based features are effective in addressing another NLP problem: increasing the accuracy of statistical parsing. We pursue this approach for a number of reasons. First, one limiting factor for syntactic approaches to statistical machine translation is parse quality (Quirk and Corston-Oliver, 2006). Improved parses of bitext should result in improved machine translation. Second, as more and more texts become available in several languages, it will increasingly be the case that a text to be parsed is itself part of a bitext.
Third, we hope that the improved parses of bitext will serve as higher-quality training data for improving monolingual parsing, using a process similar to self-training (McClosky et al., 2006). It is well known that different languages encode different types of grammatical information (agreement, case, tense, etc.) and that what can be left unspecified in one language must be made explicit in another. This information can be used for ...

[Figure 1: English parse with high attachment]
