HPSG-Style Underspecified Japanese Grammar with Wide Coverage

MITSUISHI Yutaka†, TORISAWA Kentaro†, TSUJII Jun'ichi†‡
†Department of Information Science, Graduate School of Science, University of Tokyo
‡CCL, UMIST

Abstract
This paper describes a wide-coverage Japanese grammar based on HPSG. The aim of this work is to see the coverage and accuracy attainable using an underspecified grammar. Underspecification, allowed in a typed feature structure formalism, enables us to write down a wide-coverage grammar concisely. The grammar we have implemented consists of only 6 ID schemata, 68 lexical entries (assigned to functional words), and 63 lexical entry templates (assigned to parts of speech (POSs)). Furthermore, word-specific constraints such as subcategorization of verbs are not fixed in the grammar. However, this grammar can generate parse trees for 87% of the 10000 sentences in the Japanese EDR corpus. The dependency accuracy is 78% when a parser uses the heuristic that every bunsetsu¹ is attached to the nearest possible one.

1 Introduction
Our purpose is to design a practical Japanese grammar based on HPSG (Head-driven Phrase Structure Grammar) (Pollard and Sag, 1994) with wide coverage and reasonable accuracy for syntactic structures of real-world texts. In this paper, "coverage" refers to the percentage of input sentences for which the grammar returns at least one parse tree, and "accuracy" refers to the percentage of bunsetsus which are attached correctly.
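The two metrics defined above, together with the nearest-attachment heuristic behind the 78% figure, can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names, the representation of a sentence as a list of head indices, and the `candidates` input (the heads each bunsetsu may take according to the grammar) are all hypothetical, and skipping bunsetsus without a gold head (such as the sentence-final one) when computing accuracy is an assumption.

```python
from typing import Optional

# heads[i] = index of the bunsetsu that bunsetsu i attaches to (None if it has no head)
Heads = list[Optional[int]]


def nearest_attachment(candidates: list[list[int]]) -> Heads:
    """Attach each bunsetsu to the nearest permitted head to its right.

    candidates[i] holds the indices the grammar permits as heads of
    bunsetsu i (a hypothetical, simplified view of the parser's output).
    """
    heads: Heads = []
    for i, cands in enumerate(candidates):
        # Japanese dependencies point rightward, so only j > i is considered.
        rightward = [j for j in cands if j > i]
        heads.append(min(rightward) if rightward else None)
    return heads


def coverage(parse_counts: list[int]) -> float:
    """Percentage of sentences for which at least one parse tree was returned."""
    parsed = sum(1 for n in parse_counts if n > 0)
    return 100.0 * parsed / len(parse_counts)


def dependency_accuracy(predicted: list[Heads], gold: list[Heads]) -> float:
    """Percentage of bunsetsus whose predicted head matches the gold head.

    Assumption: bunsetsus with no gold head (e.g. sentence-final) are skipped.
    """
    correct = total = 0
    for pred_heads, gold_heads in zip(predicted, gold):
        for p, g in zip(pred_heads, gold_heads):
            if g is None:
                continue
            total += 1
            correct += int(p == g)
    return 100.0 * correct / total


if __name__ == "__main__":
    # Toy four-bunsetsu sentence; the data are invented for illustration.
    pred = nearest_attachment([[1, 3], [2, 3], [3], []])
    print(pred)  # [1, 2, 3, None]
    print(dependency_accuracy([pred], [[3, 2, 3, None]]))
    print(coverage([2, 0, 1, 5]))  # 75.0
```

Choosing each head independently, as this sketch does, ignores the requirement that the attachments together form a single parse tree licensed by the grammar; a real parser would apply the nearest-attachment preference only among the trees the grammar actually produces.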
To realize wide coverage and reasonable accuracy, the following steps were taken:
(A) First, we prepared a linguistically valid but coarse grammar with wide coverage.
(B) We then refined the grammar with regard to accuracy, using practical heuristics which are not linguistically motivated.
As for (A), the first grammar we constructed actually consists of only 68 lexical entries …

This research is partially funded by the project of JSPS (JSPS-RFTF96P00502).
¹ A bunsetsu is a common unit when syntactic structures in Japanese are discussed.
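Underspecification in a typed feature structure formalism is what lets one lexical entry template, with a verb's subcategorization left unfixed, serve a whole POS in the coarse grammar of step (A). The toy unifier below sketches that idea under heavy simplifying assumptions: the type hierarchy, the use of the standard HPSG feature names HEAD and SUBCAT, and the function names are illustrative rather than taken from the grammar described here, and structure sharing (reentrancy), which real HPSG systems depend on, is omitted.

```python
from typing import Optional

# Hypothetical toy type hierarchy: child -> parent; "top" is the most general type.
PARENT = {
    "sign": "top",
    "head": "top", "verb": "head", "noun": "head",
    "list": "top", "elist": "list", "nelist": "list",
}


def ancestors(t: str) -> list[str]:
    """Return t followed by all of its supertypes up to the root."""
    chain = [t]
    while t in PARENT:
        t = PARENT[t]
        chain.append(t)
    return chain


def unify_types(a: str, b: str) -> Optional[str]:
    """Return the more specific type if one subsumes the other, else None."""
    if a in ancestors(b):
        return b
    if b in ancestors(a):
        return a
    return None


def unify(fs1: dict, fs2: dict) -> Optional[dict]:
    """Unify two feature structures given as nested dicts with a "type" key.

    A missing type counts as "top", and a feature present on only one side
    is copied over unchanged, i.e. absent information is underspecified.
    Returns None on a type clash anywhere in the structure.
    """
    t = unify_types(fs1.get("type", "top"), fs2.get("type", "top"))
    if t is None:
        return None
    result = {"type": t}
    for feat in (set(fs1) | set(fs2)) - {"type"}:
        if feat in fs1 and feat in fs2:
            sub = unify(fs1[feat], fs2[feat])
            if sub is None:
                return None
            result[feat] = sub
        else:
            result[feat] = fs1[feat] if feat in fs1 else fs2[feat]
    return result


if __name__ == "__main__":
    # A verb template whose SUBCAT is only known to be some list.
    template = {"type": "sign", "HEAD": {"type": "verb"}, "SUBCAT": {"type": "list"}}
    # Context supplies a more specific SUBCAT value: unification narrows it.
    print(unify(template, {"type": "sign", "SUBCAT": {"type": "nelist"}}))
    # A conflicting HEAD type makes unification fail.
    print(unify(template, {"HEAD": {"type": "noun"}}))  # None
```

Because the template never commits to a particular SUBCAT value, the same entry can combine with whatever arguments the context provides; the price, as step (B) acknowledges, is extra ambiguity that has to be resolved by heuristics.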
