Scientific report: "A Debug Tool for Practical Grammar Development"

Akane Yakushiji†, Yuka Tateisi‡, Yusuke Miyao†, Naoki Yoshinaga†, Jun'ichi Tsujii†‡

†Department of Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, JAPAN
‡CREST, JST (Japan Science and Technology Corporation), Honcho 4-1-8, Kawaguchi-shi, Saitama 332-0012, JAPAN
{akane, yucca, yusuke, yoshinag, tsujii}@

Abstract

We have developed willex, a tool that helps grammar developers work efficiently by using annotated corpora and recording parsing errors. Willex has two major new functions. First, it decreases the ambiguity of parsing results by comparing them to an annotated corpus and removing wrong partial results, both automatically and manually. Second, willex accumulates parsing errors as data, so that developers can identify the defects of the grammar statistically. We applied willex to a large-scale HPSG-style grammar as an example.

1 Introduction

There is an increasing need for syntactic parsers for practical uses such as information extraction. For example, Yakushiji et al. (2001) extracted argument structures from biomedical papers using a parser based on XHPSG (Tateisi et al., 1998), which is a large-scale HPSG. Although large-scale, general-purpose grammars have been developed, they suffer from limited coverage. These limits stem from deficiencies in the grammars themselves. For example, XHPSG can handle neither coordinations of verbs (e.g., "Molybdate slowed but did not prevent the conversion.") nor reduced relatives (e.g., "Rb mutants derived from patients with retinoblastoma."). Finding and fixing these grammar defects requires tremendous human effort.

Hence, we have developed willex, which helps to improve general-purpose grammars. Willex has two major functions. First, it reduces the human workload of improving a general-purpose grammar by using the linguistic intuition encoded in syntactically tagged corpora in XML format. Second, it records data of grammar defects to allow developers to have a