Scientific report: "AUTOMATICALLY EXTRACTING AND REPRESENTING COLLOCATIONS FOR LANGUAGE GENERATION"

AUTOMATICALLY EXTRACTING AND REPRESENTING COLLOCATIONS FOR LANGUAGE GENERATION

Frank A. Smadja and Kathleen R. McKeown
Department of Computer Science
Columbia University
New York, NY 10027

ABSTRACT

Collocational knowledge is necessary for language generation. The problem is that collocations come in a large variety of forms. They can involve two, three, or more words; these words can be of different syntactic categories; and they can be combined in more or less rigid ways. This leads to two main difficulties: collocational knowledge has to be acquired, and it must be represented flexibly so that it can be used for language generation. We address both problems in this paper, focusing on the acquisition problem. We describe a program, Xtract, that automatically acquires a range of collocations from large textual corpora, and we describe how they can be represented in a flexible lexicon using a unification-based formalism.

1 INTRODUCTION

Language generation research on lexical choice has focused on syntactic and semantic constraints on word choice and word ordering. Collocational constraints, however, also play a role in how words can co-occur in the same sentence. Often, the use of one word in a particular context of meaning will require the use of one or more other words in the same sentence. While phrasal lexicons, in which lexical associations are pre-encoded (Kukich 83; Jacobs 85; Danlos 87), allow for the treatment of certain types of collocations, they also have problems. Phrasal entries must be compiled by hand, which is expensive and leaves the lexicon incomplete. Furthermore, phrasal entries tend to capture rather rigid, idiomatic expressions. In contrast, collocations vary tremendously in the number of words involved, in the syntactic categories of the words, in the syntactic relations between the words, and in how rigidly the individual words are used together. For example, in some cases the words of a collocation must be adjacent, while in others they can be separated by a varying number of ...
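The contrast between collocates that must be adjacent and collocates separated by a varying number of words can be made concrete by recording, for a pair of words, the signed distances at which one occurs relative to the other in a corpus. The sketch below is a minimal illustration under our own assumptions, not a description of how Xtract works: the function name `offset_profile`, the whitespace tokenization, and the default window of 5 words are hypothetical choices.

```python
from collections import Counter


def offset_profile(tokens, node, collocate, window=5):
    """Count how often `collocate` appears at each signed offset from `node`.

    Hypothetical helper for illustration only. Offsets run from -window to
    +window, excluding 0. A profile concentrated at one offset suggests a
    rigid, adjacent collocation; counts spread over several offsets suggest
    the words can be separated by a varying number of other words.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Clip the window at the start and end of the token list.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == collocate:
                counts[j - i] += 1
    # Report every offset in the window, including those never observed.
    return {d: counts[d] for d in range(-window, window + 1) if d != 0}


if __name__ == "__main__":
    text = ("the stock market rose . the market for stock was volatile . "
            "stock market indexes fell")
    print(offset_profile(text.split(), "market", "stock", window=3))
```

On the toy text above, "stock" occurs twice immediately before "market" (offset -1) and once two words after it (offset +2), so the profile is `{-3: 0, -2: 0, -1: 2, 1: 0, 2: 1, 3: 0}`. A real acquisition program would need far larger corpora and some statistical test to decide which profiles are significant.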
