tailieunhanh - Scientific report: "Automatic Construction of Frame Representations for Spontaneous Speech in Unrestricted Domains"

Klaus Zechner
Language Technologies Institute, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213, USA

Abstract

This paper presents a system which automatically generates shallow semantic frame structures for conversational speech in unrestricted domains. We argue that such shallow semantic representations can indeed be generated with a minimum amount of linguistic knowledge engineering and without having to explicitly construct a semantic knowledge base. The system is designed to be robust enough to deal with the problems of speech dysfluencies, ungrammaticalities, and imperfect speech recognition. Initial results on speech transcripts are promising in that correct mappings could be identified in 21% of the clauses of a test set (resp. 44% of this test set where ungrammatical or verb-less clauses were removed).

1 Introduction

In syntactic and semantic analysis of spontaneous speech, little research has been done with regard to dealing with language in unrestricted domains. There are several reasons why, so far, an in-depth analysis of this type of language data has been considered prohibitively hard: inherent properties of spontaneous speech, such as dysfluencies and ungrammaticalities (Lavie 1996); word accuracy being far from perfect (on a typical corpus such as Switchboard (SWBD) (Godfrey et al. 1992), current state-of-the-art recognizers have word error rates in the range of 30-40% (Finke et al. 1997)); and, if the domain is unrestricted, manual construction of a semantic knowledge base with reasonable coverage is very labor intensive.

In this paper we propose to combine methods of partial parsing (chunking) with the mapping of the verb arguments onto subcategorization frames that can be extracted automatically, in this case from WordNet (Miller et al. 1993). As preliminary results indicate, this yields a way of generating shallow semantic representations efficiently and
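
The idea of mapping chunked clauses onto verb subcategorization frames can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `FRAMES` table is a small hand-written stand-in for frames that would be extracted from WordNet, and the `Chunk` type, the `FILL` label for dysfluencies, and the `map_clause` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Hypothetical frame table: each verb lemma lists the chunk-label patterns it
# accepts. A real system would derive these from WordNet; these are invented.
FRAMES = {
    "give": [("NP", "V", "NP", "NP"), ("NP", "V", "NP", "PP")],
    "see": [("NP", "V", "NP")],
    "sleep": [("NP", "V")],
}

# Assumption: the chunker marks fillers and other dysfluencies as "FILL".
CONTENT_LABELS = {"NP", "V", "PP"}


@dataclass
class Chunk:
    label: str      # chunk type, e.g. "NP", "V", "PP", "FILL"
    text: str       # surface words of the chunk
    head: str = ""  # lemma of the head word (used for verb chunks)


def map_clause(chunks):
    """Map one chunked clause onto a frame; return None if nothing fits."""
    # Drop dysfluency chunks so they do not block a frame match.
    content = [c for c in chunks if c.label in CONTENT_LABELS]
    verbs = [c for c in content if c.label == "V"]
    if len(verbs) != 1:
        return None  # verb-less (or multi-verb) clauses are not mapped
    verb = verbs[0].head
    pattern = tuple(c.label for c in content)
    for frame in FRAMES.get(verb, []):
        if frame == pattern:
            args = [c.text for c in content if c.label != "V"]
            return {"verb": verb, "frame": frame, "args": args}
    return None


clause = [
    Chunk("NP", "she"),
    Chunk("FILL", "uh"),
    Chunk("V", "gave", head="give"),
    Chunk("NP", "him"),
    Chunk("NP", "a book"),
]
result = map_clause(clause)
```

In this sketch a clause is mapped only when its sequence of content chunks matches one of the verb's frames exactly, which is one simple way to see why ungrammatical or verb-less clauses would go unmapped.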
