tailieunhanh - Scientific report: "Automatic Identification of Non-compositional Phrases"

Automatic Identification of Non-compositional Phrases

Dekang Lin
Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada, R3T 2N2, lindek@
and UMIACS, University of Maryland, College Park, Maryland, 20742, lindek@

Abstract

Non-compositional expressions present a special challenge to NLP applications. We present a method for automatically identifying non-compositional expressions using their statistical properties in a text corpus. Our method is based on the hypothesis that when a phrase is non-compositional, its mutual information differs significantly from the mutual information values of phrases obtained by substituting one of the words in the phrase with a similar word.

1 Introduction

Non-compositional expressions present a special challenge to NLP applications. In machine translation, word-for-word translation of non-compositional expressions can result in very misleading, sometimes laughable, translations. In information retrieval, expansion of words in a non-compositional expression can lead to a dramatic decrease in precision without any gain in recall. Less obviously, non-compositional expressions need to be treated differently from other phrases in many statistical or corpus-based NLP methods. For example, an underlying assumption in some word sense disambiguation systems (e.g., Dagan and Itai, 1994; Li et al., 1995; Lin, 1997) is that if two words occurred in the same context, they are probably similar. Suppose we want to determine the intended meaning of "product" in "hot product".
We can find other words that are also modified by "hot" (e.g., "hot car") and then choose the meaning of "product" that is most similar to the meanings of these words. However, this method fails when non-compositional expressions are involved. For instance, using the same algorithm to determine the meaning of "line" in "hot line", the words "product", "merchandise", "car", etc. would lead the algorithm to choose the "line of product" sense of "line". We present a method for automatic ...
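
The hypothesis stated in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes pointwise mutual information computed over plain (modifier, head) pair counts and a fixed difference threshold in place of a proper significance test, neither of which this excerpt specifies. The counts, the substitute words ("wire", "truck"), the threshold, and the function names `pmi` and `looks_non_compositional` are all hypothetical.

```python
import math
from collections import Counter


def pmi(pair_counts, w1, w2):
    """Pointwise mutual information (in bits) of the pair (w1, w2)."""
    total = sum(pair_counts.values())
    joint = pair_counts.get((w1, w2), 0)
    # Marginal counts: how often w1 appears as modifier, w2 as head.
    left = sum(c for (a, _), c in pair_counts.items() if a == w1)
    right = sum(c for (_, b), c in pair_counts.items() if b == w2)
    if joint == 0 or left == 0 or right == 0:
        return float("-inf")
    return math.log2(joint * total / (left * right))


def looks_non_compositional(pair_counts, w1, w2, substitute_pairs, threshold=1.0):
    """Flag (w1, w2) when its PMI differs by more than `threshold` bits
    from the PMI of every substituted phrase.

    `threshold` is an assumed stand-in for a significance test.
    """
    if not substitute_pairs:
        return False  # nothing to compare against
    base = pmi(pair_counts, w1, w2)
    return all(
        abs(base - pmi(pair_counts, s1, s2)) > threshold
        for s1, s2 in substitute_pairs
    )


# Hypothetical toy counts of (modifier, head) pairs.
counts = Counter({
    ("hot", "line"): 40, ("hot", "wire"): 1,
    ("hot", "car"): 10, ("hot", "truck"): 8,
    ("new", "line"): 10, ("new", "wire"): 20,
    ("new", "car"): 40, ("new", "truck"): 32,
})

# "hot line" versus the substituted phrase "hot wire".
print(looks_non_compositional(counts, "hot", "line", [("hot", "wire")]))
# "hot car" versus the substituted phrase "hot truck".
print(looks_non_compositional(counts, "hot", "car", [("hot", "truck")]))
```

In this toy data, "hot line" is far more strongly associated than "hot wire", so the phrase is flagged, whereas "hot car" and "hot truck" have the same association strength and are not. A realistic version would need corpus-derived counts, an automatically built list of similar words, and a statistical test for whether the difference is significant.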
