tailieunhanh - Scientific report: "Automatic Acquisition of English Topic Signatures Based on a Second Language"

Xinglong Wang
Department of Informatics
University of Sussex
Brighton, BN1 9QH, UK
xw20@

Abstract

We present a novel approach for automatically acquiring English topic signatures. Given a particular concept, or word sense, a topic signature is a set of words that tend to co-occur with it. Topic signatures can be useful in a number of Natural Language Processing (NLP) applications, such as Word Sense Disambiguation (WSD) and Text Summarisation. Our method takes advantage of the different way in which word senses are lexicalised in English and Chinese, and also exploits the large amount of Chinese text available in corpora and on the Web. We evaluated the topic signatures on a WSD task, where we trained a second-order vector cooccurrence algorithm on standard WSD datasets, with promising results.

1 Introduction

Lexical knowledge is crucial for many NLP tasks. Huge efforts and investments have been made to build repositories with different types of knowledge. Many of them have proved useful, such as WordNet (Miller et al., 1990). However, in some areas, such as WSD, manually created knowledge bases seem never to satisfy the huge requirement by supervised machine learning systems. This is the so-called knowledge acquisition bottleneck.

As an alternative, automatic or semi-automatic acquisition methods have been proposed to tackle the bottleneck. For example, Agirre et al. (2001) tried to automatically extract topic signatures by querying a search engine using monosemous synonyms or other knowledge associated with a concept defined in WordNet.
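To make the notion of a topic signature concrete, the following is a minimal Python sketch that ranks the words co-occurring with a target word by raw frequency. It is only an illustration under simplifying assumptions: the function name `topic_signature`, the whitespace tokenisation, the toy sentences and the stopword set are all hypothetical, and plain frequency counting stands in for whatever weighting a real system would use. It is not the acquisition method proposed in this paper, nor that of Agirre et al. (2001).

```python
from collections import Counter


def topic_signature(contexts, target, top_n=5, stopwords=frozenset()):
    """Rank words by how many contexts they share with `target`.

    contexts  -- iterable of sentences (plain strings); hypothetical input
    target    -- the word whose co-occurring words we want
    top_n     -- how many signature words to return
    stopwords -- words to ignore (assumed to be supplied by the caller)
    """
    counts = Counter()
    for sentence in contexts:
        tokens = sentence.lower().split()  # naive whitespace tokenisation
        if target not in tokens:
            continue
        # Count each co-occurring word once per context.
        counts.update(
            t for t in set(tokens) if t != target and t not in stopwords
        )
    return counts.most_common(top_n)


# Toy data, invented for illustration only.
contexts = [
    "the bank approved the loan",
    "the bank raised interest on the loan",
    "we sat on the river bank",
]
print(topic_signature(contexts, "bank", top_n=1,
                      stopwords={"the", "on", "we"}))
# → [('loan', 2)]
```

In a sense-aware setting, one would collect a separate set of contexts for each sense of the target word and build one signature per sense; the sketch above ignores sense distinctions entirely.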
The Web provides further ways of overcoming the bottleneck. Mihalcea et al. (1999) presented a method enabling automatic acquisition of sense-tagged corpora, based on WordNet and an Internet search engine. Chklovski and Mihalcea (2002) presented another interesting proposal, which turns to Web users to produce sense-tagged corpora. Another type of method which exploits .
