Scientific report: "Translation and Extension of Concepts Across Languages"
Dmitry Davidov, ICNC, The Hebrew University of Jerusalem, dmitry@
Ari Rappoport, Institute of Computer Science, The Hebrew University of Jerusalem, arir@

Abstract

We present a method which, given a few words defining a concept in some language, retrieves, disambiguates, and extends corresponding terms that define a similar concept in another specified language. This can be very useful for cross-lingual information retrieval and the preparation of multi-lingual lexical resources. We automatically obtain term translations from multilingual dictionaries and disambiguate them using web counts. We then retrieve web snippets with co-occurring translations and discover additional concept terms from these snippets. Our term discovery is based on the co-appearance of similar words in symmetric patterns. We evaluate our method on a set of language pairs involving 45 languages, including combinations of very dissimilar ones such as Russian, Chinese, and Hebrew, for various concepts. We assess the quality of the retrieved sets both through human judgments and by automatically comparing the obtained categories to corresponding English WordNet synsets.

1 Introduction

Numerous NLP tasks utilize lexical databases that incorporate concepts (or word categories): sets of terms that share a significant aspect of their meanings (e.g., terms denoting types of food, tool names, etc.).
These sets are useful by themselves for the improvement of thesauri and dictionaries, and they are also utilized in various applications, including textual entailment and question answering. Manual development of lexical databases is labor intensive, error prone, and susceptible to arbitrary human decisions. While databases like WordNet (WN) are invaluable for NLP, for some applications any offline resource would not be extensive enough. Frequently, an application requires data on some very specific topic or on very recent news-related events. In these cases even .