tailieunhanh - Scientific report: "Named Entity Transliteration with Comparable Corpora"

Richard Sproat, Tao Tao, ChengXiang Zhai
University of Illinois at Urbana-Champaign, Urbana, IL 61801
rws@ taotao czhai @

Abstract

In this paper we investigate Chinese-English name transliteration using comparable corpora: corpora where texts in the two languages deal with some of the same topics, and therefore share references to named entities, but are not translations of each other. We present two distinct methods for transliteration, one using phonetic transliteration and the other using the temporal distribution of candidate pairs. Each of these approaches works quite well, but combining them achieves even better results. We then propose a novel score propagation method that utilizes the co-occurrence of transliteration pairs within document pairs. This propagation method achieves further improvement over the best results from the previous step.

1 Introduction

As part of a more general project on multilingual named entity identification, we are interested in the problem of name transliteration across languages that use different scripts.
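To make the temporal-distribution idea from the abstract concrete, here is a minimal sketch of how two name candidates might be compared by when they are mentioned. It rests on assumptions that this excerpt does not confirm: Pearson correlation stands in for whatever similarity measure the paper actually uses, the daily counts are made up, and `pearson` and `combine` are hypothetical helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length frequency series.

    Assumption: correlation is used here only as an illustrative
    similarity measure; the excerpt does not name the actual one.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no temporal signal
    return cov / sqrt(var_x * var_y)


def combine(phonetic_score, temporal_score, weight=0.5):
    """Linear interpolation of two scores (hypothetical combination rule)."""
    return weight * phonetic_score + (1 - weight) * temporal_score


# Made-up daily mention counts over one week.
english_name = [0, 3, 5, 1, 0, 0, 2]
candidate_a = [0, 2, 6, 1, 0, 0, 1]  # peaks on the same days
candidate_b = [4, 0, 0, 0, 3, 5, 0]  # peaks on different days

score_a = pearson(english_name, candidate_a)
score_b = pearson(english_name, candidate_b)
```

In this toy data, the candidate whose mentions rise and fall on the same days as the English name gets the higher temporal score. A phonetic score for the same pair could then be folded in with `combine`, whose weight would need tuning on real data.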
One particular issue is the discovery of named entities in comparable texts in multiple languages, where by comparable we mean texts that are about the same topic but are not in general translations of each other. For example, if one were to go through an English, a Chinese, and an Arabic newspaper on the same day, it is likely that the more important international events in various topics such as politics, business, science, and sports would each be covered in each of the newspapers. Names of the same persons, locations, and so forth, which are often transliterated rather than translated, would be found in comparable stories across the three newspapers. We wish to use this expectation to leverage transliteration, and thus the identification of named entities, across languages. Our idea is that the occurrence of a cluster of names in, say, an English text should
