Scientific report: "Better Automatic Treebank Conversion Using A Feature-Based Approach"

Muhua Zhu, Jingbo Zhu, Minghan Hu
Natural Language Processing Lab., Northeastern University, China
zhumuhua@ zhujingbo@ huminghan@

Abstract

For the task of automatic treebank conversion, this paper presents a feature-based approach that encodes the bracketing structures of a treebank as features to guide the conversion of that treebank to a different standard. Experiments on two Chinese treebanks show that our approach improves conversion accuracy over a strong baseline.

1 Introduction

In the field of syntactic parsing, research effort has been devoted to the automatic conversion of a treebank (the source treebank) to fit a different standard, exhibited by another treebank (the target treebank). Treebank conversion is desirable primarily because source-style and target-style annotations exist for non-overlapping text samples, so a larger target-style treebank can be obtained through such conversion. Hereafter, source and target treebanks are referred to as heterogeneous treebanks because of their different annotation standards. In this paper we focus on conversion between phrase-structure heterogeneous treebanks (Wang et al., 1994; Zhu and Zhu, 2010).

Because annotation is already available in the source treebank, it is natural to use that annotation to guide treebank conversion. The motivating idea is illustrated in Fig. 1, which depicts a sentence annotated according to the standards of the Tsinghua Chinese Treebank (TCT) (Zhou, 1996) and the Penn Chinese Treebank (CTB) (Xue et al., 2002), respectively.
Suppose that the conversion runs from the TCT-style parse (left side) to the CTB-style parse (right side). The constituents vp [will surrender], dj [enemy will surrender], and np [intelligence experts] in the TCT-style parse strongly suggest that the resulting CTB-style parse should also bracket these words as constituents. Zhu and Zhu (2010) show the effectiveness of using .
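To make the idea of source-side brackets guiding the conversion concrete, the following is a minimal sketch rather than the authors' actual feature set. It assumes a toy nested-tuple tree representation, and the function names and the feature values (`MATCH_<label>`, `CROSS`, `COMPATIBLE`) are hypothetical. It collects the constituent spans of a source-style parse and reports whether a candidate target-side span coincides with, crosses, or is merely compatible with them.

```python
def constituent_spans(tree):
    """Return {(start, end): label} for every constituent in a tree.

    Assumed toy format: a tree is (label, child, child, ...) and a
    leaf is a plain word string. Spans are half-open: [start, end).
    """
    spans = {}

    def walk(node, start):
        if isinstance(node, str):  # a word covers exactly one position
            return start + 1
        label, *children = node
        end = start
        for child in children:
            end = walk(child, end)
        spans[(start, end)] = label
        return end

    walk(tree, 0)
    return spans


def bracket_feature(source_spans, start, end):
    """Hypothetical feature for a candidate target-side span [start, end)."""
    if (start, end) in source_spans:
        # The source parse brackets exactly the same words.
        return "MATCH_" + source_spans[(start, end)]
    for s, e in source_spans:
        # Crossing brackets: the spans overlap, neither contains the other.
        if s < start < e < end or start < s < end < e:
            return "CROSS"
    return "COMPATIBLE"


# Toy source-style fragment using the glosses from the example above.
source_tree = ("dj", "enemy", ("vp", "will", "surrender"))
source_spans = constituent_spans(source_tree)

print(bracket_feature(source_spans, 1, 3))  # candidate "will surrender"
print(bracket_feature(source_spans, 0, 2))  # candidate "enemy will"
```

In a sketch like this, a parser trained on the target standard could treat the returned string as one additional feature when scoring a candidate constituent, so that brackets agreeing with the source annotation are encouraged and crossing ones are discouraged.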
