Scientific report: "Coordinate Structure Analysis with Global Structural Constraints and Alignment-Based Local Features"

Kazuo Hara, Masashi Shimbo, Hideharu Okuma, Yuji Matsumoto
Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara 630-0192, Japan
kazuo-h shimbo hideharu-o matsu @

Abstract

We propose a hybrid approach to coordinate structure analysis that combines a simple grammar to ensure consistent global structure of coordinations in a sentence, and features based on sequence alignment to capture local symmetry of conjuncts. The weight of the alignment-based features, which in turn determines the score of coordinate structures, is optimized by perceptron training on a given corpus. A bottom-up chart parsing algorithm efficiently finds the best scoring structure, taking both nested and non-overlapping flat coordinations into account. We demonstrate that our approach outperforms existing parsers in coordination scope detection on the Genia corpus.

1 Introduction

Coordinate structures are common in life science literature. In Genia Treebank Beta (Kim et al., 2003), the number of coordinate structures is nearly equal to that of sentences. In clinical papers, the outcome of clinical trials is typically described with coordination, as in

"Median times to progression and median survival times were months and months in arm A and months and months in arm B." (Schuette et al., 2006)

Despite the frequency and implied importance of coordinate structures, coordination disambiguation remains a difficult problem even for state-of-the-art parsers. Figure 1(a) shows the coordinate structure extracted from the output of Charniak and Johnson's (2005) parser on the above example. This is somewhat surprising, given that the symmetry of conjuncts in the sentence is obvious to human eyes and its correct coordinate structure, shown in Figure 1(b), can be readily observed.

Figure 1: (a) Output from the Charniak-Johnson parser and (b) the correct coordinate structure.
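To illustrate how sequence alignment can expose the symmetry of conjuncts, the following minimal sketch scores two candidate conjuncts with a token-level global alignment (Needleman-Wunsch style). The function name `alignment_score` and the fixed match, mismatch, and gap costs are assumptions made for this example only. They are not the features or weights of the approach described above, where the weights are learned by perceptron training.

```python
from typing import List, Sequence


def alignment_score(left: Sequence[str], right: Sequence[str],
                    match: float = 1.0, mismatch: float = -1.0,
                    gap: float = -1.0) -> float:
    """Global alignment score of two token sequences.

    Hypothetical scoring scheme: +1 per identical token pair,
    -1 per substituted pair, -1 per inserted or deleted token.
    """
    rows, cols = len(left) + 1, len(right) + 1
    table: List[List[float]] = [[0.0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per token.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap

    # Fill the table: each cell is the best of substitution, deletion, insertion.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if left[i - 1] == right[j - 1] else mismatch
            table[i][j] = max(table[i - 1][j - 1] + pair,
                              table[i - 1][j] + gap,
                              table[i][j - 1] + gap)
    return table[-1][-1]


# Two phrases taken from the example sentence in the introduction.
symmetric = alignment_score("months and months in arm A".split(),
                            "months and months in arm B".split())
# A shorter left candidate aligns less well with the same right phrase.
asymmetric = alignment_score("months in arm A".split(),
                             "months and months in arm B".split())
print(symmetric, asymmetric)
```

Under these assumed costs, the pair of phrases with parallel wording receives a higher score than the pair of unequal length, which is the kind of local symmetry signal that alignment-based features are meant to capture.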
