A High-Accurate Chinese-English NE Backward Translation System Combining Both Lexical Information and Web Statistics

Conrad Chen, Hsin-Hsi Chen
Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan
drchen@, hhchen@

Abstract

Named entity translation has become indispensable in cross-language information retrieval. We propose an approach that combines lexical information, web statistics, and inverse search based on Google to backward translate a Chinese named entity (NE) into English. Our system achieves a high Top-1 accuracy of …, which is a relatively strong result compared with those reported in this area to date.

1 Introduction

Translation of named entities (NEs) attracts much attention because of its practical applications on the World Wide Web. The most challenging issues are that NEs come in various genres, NEs form an open vocabulary, and their translations are very flexible. Some previous approaches use phonetic similarity to identify corresponding transliterations, i.e., translation by phonetic values (Lin and Chen, 2002; Lee and Chang, 2003). Other approaches combine lexical (phonetic and meaning) and semantic information to find the corresponding translations of NEs in bilingual corpora (Feng et al., 2004; Huang et al., 2004; Lam et al., 2004). These studies focus on aligning NEs in parallel or comparable corpora, a setting called close-ended NE translation. In open-ended NE translation, an arbitrary NE is given and the goal is to find its corresponding translations. Most previous approaches exploit web search engines to help find translation candidates on the Internet.
Al-Onaizan and Knight (2003) first use language models to generate possible candidates and then verify these candidates with web statistics. They achieve a Top-1 accuracy of about … on Arabic-to-English translation. Lu et al. (2004) use statistics of anchor texts in web search results to identify translations and obtain a Top-1 accuracy of about … in ….
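The generate-then-verify pattern described for Al-Onaizan and Knight (2003) can be illustrated with a minimal sketch. This is not their implementation and not the method of this paper. The function name `rank_candidates`, the weight `alpha`, the log-linear combination, and the toy scores and hit counts are all hypothetical. In a real system the counts would come from queries to a web search engine; here a plain dictionary stands in for that lookup.

```python
import math
from typing import Callable, Dict, List, Tuple


def rank_candidates(
    candidates: Dict[str, float],     # candidate -> language-model log-probability (hypothetical)
    hit_count: Callable[[str], int],  # stand-in for a web search hit-count lookup (hypothetical)
    alpha: float = 0.5,               # hypothetical weight between the two signals
) -> List[Tuple[str, float]]:
    """Re-rank generated candidates by mixing the LM score with log web frequency."""
    scored = []
    for cand, lm_logprob in candidates.items():
        # Add-one smoothing keeps candidates with zero hits at a finite score.
        web_score = math.log(hit_count(cand) + 1)
        score = alpha * lm_logprob + (1 - alpha) * web_score
        scored.append((cand, score))
    # Highest combined score first; the Top-1 answer is scored[0].
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy data: invented scores and counts, for illustration only.
    lm_scores = {"Clinton": -2.0, "Klinton": -2.5, "Clintun": -1.5}
    fake_hits = {"Clinton": 1_000_000, "Klinton": 3_000, "Clintun": 20}
    ranking = rank_candidates(lm_scores, lambda c: fake_hits.get(c, 0))
    print(ranking[0][0])  # → Clinton
```

Taking the logarithm of the hit count, rather than the raw count, keeps very frequent strings from completely swamping the language-model score; the weighting used in the cited work may well differ.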
