Scientific report: "Robust Conversion of CCG Derivations to Phrase Structure Trees"
Jonathan K. Kummerfeld, Dan Klein
Computer Science Division, University of California, Berkeley, Berkeley, CA 94720, USA
{jkk,klein}@

James R. Curran
ə-lab, School of IT, University of Sydney, Sydney, NSW 2006, Australia
james@

Abstract

We propose an improved, bottom-up method for converting CCG derivations into PTB-style phrase structure trees. In contrast with past work (Clark and Curran, 2009), which used simple transductions on category pairs, our approach uses richer transductions attached to single categories. Our conversion preserves more sentences under round-trip conversion than that earlier method and is more robust. In particular, unlike past methods, ours does not require ad-hoc rules over non-local features, and so can be easily integrated into a parser.

1 Introduction

Converting the Penn Treebank (PTB; Marcus et al., 1993) to other formalisms, such as HPSG (Miyao et al., 2004), LFG (Cahill et al., 2008), LTAG (Xia, 1999), and CCG (Hockenmaier, 2003), is a complex process that renders linguistic phenomena in formalism-specific ways. Tools for reversing these conversions are desirable for downstream parser use and parser comparison. However, reversing conversions is difficult, as corpus conversions may lose information or smooth over PTB inconsistencies.

Clark and Curran (2009) developed a CCG to PTB conversion that treats the CCG derivation as a phrase structure tree and applies hand-crafted rules to every pair of categories that combine in the derivation.
Because their approach does not exploit the generalisations inherent in the CCG formalism, they must resort to ad-hoc rules over non-local features of the CCG constituents being combined (when a fixed pair of CCG categories corresponds to multiple PTB structures). Even with such rules, they convert only a portion of gold CCGbank derivations correctly.

Our conversion assigns a set of bracket instructions to each word based on its CCG category, then follows the CCG ...
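
The contrast between the two designs can be illustrated with a small sketch. It is not the rule set or instruction language of either system: the table entries, the PTB labels attached to them, and the names `PAIR_RULES`, `add_pair_rule`, `ambiguous_pairs`, `WORD_INSTRUCTIONS`, and `assign_instructions` are all hypothetical. The first half shows how a table keyed on category pairs becomes ambiguous when one pair corresponds to more than one PTB structure; the second half shows a lookup keyed on a single word's category.

```python
from collections import defaultdict

# Hypothetical pair-keyed rule table: (left category, right category) -> PTB labels.
# Entries are illustrative only, not taken from any published rule set.
PAIR_RULES = defaultdict(set)


def add_pair_rule(left, right, ptb_label):
    """Record that combining `left` and `right` may produce `ptb_label`."""
    PAIR_RULES[(left, right)].add(ptb_label)


def ambiguous_pairs():
    """Return the category pairs that map to more than one PTB label."""
    return sorted(pair for pair, labels in PAIR_RULES.items() if len(labels) > 1)


add_pair_rule("NP", "S\\NP", "S")
add_pair_rule("NP/N", "N", "NP")
add_pair_rule("N/N", "N", "NP")
add_pair_rule("N/N", "N", "NML")  # assumed second structure for the same pair

# Hypothetical per-word table: one CCG category -> bracket labels to create.
WORD_INSTRUCTIONS = {
    "NP/N": ["NP"],
    "(S\\NP)/NP": ["VP", "S"],
}


def assign_instructions(tagged_words):
    """Attach bracket instructions to each (word, category) pair.

    Categories without an entry receive an empty instruction list.
    """
    return [(word, WORD_INSTRUCTIONS.get(cat, [])) for word, cat in tagged_words]
```

In the pair-keyed table, resolving the ambiguous entry would need information beyond the two categories themselves, which is the situation the text describes as requiring rules over non-local features. The per-word lookup depends only on the word's own category.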