Scientific report: "A DOP Model for Semantic Interpretation*"
Remko Bonnema, Rens Bod and Remko Scha
Institute for Logic, Language and Computation
University of Amsterdam
Spuistraat 134, 1012 VB Amsterdam

Abstract

In data-oriented language processing, an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new sentence is constructed by combining fragments from the corpus in the most probable way. This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Treebank. If a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semantic interpretation of an input sentence. The present paper explains this semantic interpretation method. A data-oriented semantic interpretation algorithm was tested on two semantically annotated corpora: the English ATIS corpus and the Dutch OVIS corpus. Experiments show an increase in semantic accuracy if larger corpus-fragments are taken into consideration.

1 Introduction

Data-oriented models of language processing embody the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract grammar rules. Such models therefore maintain large corpora of linguistic representations of previously occurring utterances.
When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. For the syntactic dimension of language, various instantiations of this data-oriented processing or "DOP" approach have been worked out: Bod 1992-1995, Charniak 1996, Tugwell 1995, Sima'an et .

* This work was partially supported by NWO, the Netherlands Organization for Scientific Research (Priority Programme Language and Speech Technology).
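
The frequency-based estimate described in the introduction can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the relative-frequency estimator commonly used in DOP formulations: a fragment's probability is its count divided by the total count of fragments with the same root label, a derivation's probability is the product of its fragment probabilities, and an analysis's probability is the sum over its derivations. The fragment inventory, the counts, and all function names are hypothetical.

```python
from collections import Counter
from math import prod

# Hypothetical toy inventory: (root label, bracketed fragment) -> corpus count.
# A real model would extract these fragments from an annotated corpus.
fragment_counts = Counter({
    ("S", "(S NP VP)"): 3,
    ("S", "(S (NP she) VP)"): 1,
    ("NP", "(NP she)"): 2,
    ("NP", "(NP flights)"): 2,
    ("VP", "(VP sleeps)"): 1,
    ("VP", "(VP (V wants) NP)"): 3,
})


def root_totals(counts):
    """Total number of fragment occurrences per root label."""
    totals = Counter()
    for (root, _fragment), n in counts.items():
        totals[root] += n
    return totals


def fragment_probability(root, fragment, counts, totals):
    """Assumed estimator: count of the fragment / count of all fragments with that root."""
    return counts[(root, fragment)] / totals[root]


def derivation_probability(derivation, counts):
    """Product of the probabilities of the fragments combined in one derivation."""
    totals = root_totals(counts)
    return prod(fragment_probability(r, f, counts, totals) for r, f in derivation)


# Two ways of building the same analysis of "she sleeps" from corpus fragments.
d1 = [("S", "(S NP VP)"), ("NP", "(NP she)"), ("VP", "(VP sleeps)")]
d2 = [("S", "(S (NP she) VP)"), ("VP", "(VP sleeps)")]

p1 = derivation_probability(d1, fragment_counts)
p2 = derivation_probability(d2, fragment_counts)

# Assumed: the probability of an analysis sums over its derivations.
analysis_probability = p1 + p2
print(p1, p2, analysis_probability)
```

In this toy setting the derivation that uses the larger fragment contributes its own probability mass to the analysis, which is one way to see why the size of the fragments taken into consideration can affect which analysis comes out as most probable.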