Scientific report: "Exploiting Heterogeneous Treebanks for Parsing"
Zheng-Yu Niu, Haifeng Wang, Hua Wu
Toshiba China Research and Development Center
5/F., Tower W2, Oriental Plaza, Beijing 100738, China
niuzhengyu, wanghaifeng, wuhua @

Abstract

We address the issue of using heterogeneous treebanks for parsing by breaking it down into two sub-problems: converting the grammar formalisms of the treebanks to the same one, and parsing on these homogeneous treebanks. First, we propose to employ an iteratively trained target grammar parser to perform grammar formalism conversion, eliminating the predefined heuristic rules required in previous methods. Then we provide two strategies to refine conversion results, and adopt a corpus weighting technique for parsing on homogeneous treebanks. Results on the Penn Treebank show that our conversion method achieves 42% error reduction over the previous best result. Evaluation on the Penn Chinese Treebank indicates that a converted dependency treebank helps constituency parsing, and that the use of unlabeled data by self-training further increases parsing f-score, resulting in 6% error reduction over the previous best result.

1 Introduction

The last few decades have seen the emergence of multiple treebanks annotated with different grammar formalisms, motivated by the diversity of languages and linguistic theories, which is crucial to the success of statistical parsing (Abeillé et al., 2000; Brants et al., 1999; Böhmová et al., 2003; Han et al., 2002; Kurohashi and Nagao, 1998; Marcus et al., 1993; Moreno et al., 2003; Xue et al., 2005).
The availability of multiple treebanks creates a scenario in which we have one treebank annotated with one grammar formalism and another treebank annotated with a different grammar formalism that we are interested in. We call the first a source treebank and the second a target treebank. We thus face the problem of how to use these heterogeneous treebanks for target grammar parsing. Here, heterogeneous treebanks refer to two or more treebanks with ...