tailieunhanh - Scientific report: "Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations"
Emily Pitler
Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104
epitler@

Abstract

Prepositions and conjunctions are two of the largest remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and mistakes made have consequences for downstream applications. Prepositions and conjunctions are often assumed to depend on lexical dependencies for correct resolution. As lexical statistics based on the training set only are sparse, unlabeled data can help ameliorate this sparsity problem. By including unlabeled data features into a factorization of the problem which matches the representation of prepositions and conjunctions, we achieve a new state-of-the-art for English dependencies with correct attachments on the current standard. Furthermore, conjunctions are attached with an accuracy of and prepositions with an accuracy of .

1 Introduction

Prepositions and conjunctions are two large remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and mistakes made on these have consequences for downstream applications. Machine translation is sensitive to parsing errors involving prepositions and conjunctions, because in some languages different attachment decisions in the parse of the source language sentence produce different translations. Preposition attachment mistakes are particularly bad when translating into Japanese (Schwartz et al., 2003), which uses a different postposition for different attachments; conjunction mistakes can cause word ordering mistakes when translating into Chinese (Huang, 1983).

Prepositions and conjunctions are often assumed to depend on lexical dependencies for correct resolution (Jurafsky and Martin, 2008). However, lexical statistics based on the training set only are typically sparse and have only a small