Scientific report: "Linguistic Knowledge Acquisition from Parsing Failures"

Linguistic Knowledge Acquisition from Parsing Failures

Masaki KIYONO and Jun-ichi TSUJII
kiyono@ and tsujii@
Centre for Computational Linguistics, University of Manchester Institute of Science and Technology, PO Box 88, Manchester M60 1QD, United Kingdom

Abstract

A semi-automatic procedure of linguistic knowledge acquisition is proposed, which combines corpus-based techniques with the conventional rule-based approach. When the existing linguistic knowledge fails to parse a sentence, the rule-based component generates all the possible hypotheses about the defects that this knowledge might contain. The rule-based component does not try to identify the defects itself; it generates a set of hypotheses, and the corpus-based component chooses the plausible ones among them. The procedure will be used for adapting or re-using existing linguistic resources for new application domains.

1 Introduction

While quite a number of useful grammar formalisms for natural language processing now exist, developing grammars and dictionaries with comprehensive coverage remains a hard and time-consuming task. Moreover, although quite a few computational grammars and dictionaries with comprehensive coverage have been used in various application systems, re-using them for other application domains is not always easy, even if we use the same formalisms and programs such as parsers.
We usually have to revise, add, and delete grammar rules and lexical entries in order to adapt them to the peculiarities of the languages (sublanguages) of new application domains [Sekine et al. 1992; Tsujii et al. 1992; Ananiadou 1990]. Such adaptations of existing linguistic knowledge to a new domain are currently performed through rather undisciplined trial-and-error processes involving much human effort. In this paper we show that techniques similar to those in robust parsing of ill-formed input together with corpus-based

*Also a staff member of Matsushita Electric Industrial Co. Ltd., Tokyo, JAPAN.
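
The division of labour described in the abstract can be illustrated with a small sketch. It is not the authors' procedure: the toy grammar, the lexicon, and the function names `parses`, `hypotheses`, and `rank` are all hypothetical. It also assumes that the only possible defect is a missing lexical entry, and it uses a plain frequency count as a stand-in for the corpus-based selection step. The rule-based part enumerates every single-entry repair that would let a failed sentence parse; the corpus-based part then prefers repairs that recur across the corpus.

```python
from collections import Counter
from itertools import product

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parent.
RULES = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
CATEGORIES = {"Det", "N", "V"}

# Hypothetical "existing linguistic knowledge": an incomplete lexicon.
LEXICON = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "sees": {"V"}}

CORPUS = [
    "the dog sees the cat",
    "the dog chases the cat",
    "the cat chases the dog",
    "the parser sees the dog",
]


def parses(words, lexicon):
    """CKY recognition: True if `words` can be analysed as an S."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[i, i + 1] = set(lexicon.get(word, ()))
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            cell = set()
            for mid in range(start + 1, end):
                for pair in product(chart[start, mid], chart[mid, end]):
                    if pair in RULES:
                        cell.add(RULES[pair])
            chart[start, end] = cell
    return n > 0 and "S" in chart[0, n]


def hypotheses(words, lexicon):
    """Rule-based step (simplified): every single missing lexical entry
    (word, category) whose addition would make the sentence parse."""
    found = set()
    for word in set(words):
        known = set(lexicon.get(word, ()))
        for cat in CATEGORIES - known:
            patched = dict(lexicon)
            patched[word] = known | {cat}
            if parses(words, patched):
                found.add((word, cat))
    return found


def rank(corpus, lexicon):
    """Corpus-based step (simplified): count how often each hypothesis
    is proposed across all sentences that fail to parse."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        if not parses(words, lexicon):
            counts.update(hypotheses(words, lexicon))
    return counts.most_common()


if __name__ == "__main__":
    for (word, cat), freq in rank(CORPUS, LEXICON):
        print(f"{word!r} as {cat}: proposed {freq} time(s)")
```

In this toy setting each failure has a single repair, so the ranking only reflects how often a gap in the lexicon is encountered. With a richer grammar, one failure would typically yield several competing hypotheses, and that is where the corpus evidence would do the selecting.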
