Scientific report: "Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora"

Zhifei Li and David Yarowsky
Department of Computer Science and Center for Language and Speech Processing
Johns Hopkins University, Baltimore, MD 21218, USA

Abstract

Chinese abbreviations are widely used in modern Chinese texts. Compared with English abbreviations (which are mostly acronyms and truncations), the formation of Chinese abbreviations is much more complex. Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated. In this paper, we present a novel unsupervised method that automatically extracts the relation between a full-form phrase and its abbreviation from monolingual corpora, and induces translation entries for the abbreviation by using its full-form as a bridge. Our method does not require any additional annotated data other than the data that a regular translation system uses. We integrate our method into a state-of-the-art baseline translation system and show that it consistently improves the performance of the baseline system on various NIST MT test sets.

1 Introduction

The modern Chinese language is a highly abbreviated one due to the mixed use of ancient single-character words with modern multi-character words and compound words. According to Chang and Lai (2004), approximately 20% of sentences in a typical news article have abbreviated words in them.
Abbreviations have become even more popular along with the development of Internet media (e.g., online chat, weblog, newsgroup, and so on). While English words are normally abbreviated by either their first letters (i.e., acronyms) or via truncation, the formation of Chinese abbreviations is much more complex. Figure 1 .

[Figure 1: Chinese Abbreviations Examples. Columns: Full-form, Abbreviation, Translation. Translations shown: Hong Kong Governor; Security Council.]
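The bridging idea stated in the abstract (an abbreviation inherits translation entries from its full-form) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `induce_abbreviation_translations`, the dictionary-based phrase table, and the toy data are all assumptions made for illustration. The Chinese strings are standard examples supplied here; the text above preserves only their English translations. The sketch also assumes the full-form/abbreviation pairs have already been extracted, which is the part the paper derives from monolingual corpora.

```python
from typing import Dict, List


def induce_abbreviation_translations(
    abbr_to_full: Dict[str, str],
    phrase_table: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Copy the translations of each full-form onto its abbreviation.

    abbr_to_full: hypothetical mapping from an abbreviation to its full-form.
    phrase_table: hypothetical mapping from a Chinese phrase to its
                  English translations (a stand-in for a real phrase table).
    """
    induced: Dict[str, List[str]] = {}
    for abbr, full in abbr_to_full.items():
        # The full-form acts as the bridge: look up its translations.
        translations = phrase_table.get(full, [])
        if translations:
            induced[abbr] = list(translations)
    return induced


# Toy data, assumed for illustration only.
abbr_to_full = {"港督": "香港总督", "安理会": "安全理事会"}
phrase_table = {
    "香港总督": ["Hong Kong Governor"],
    "安全理事会": ["Security Council"],
}

induced = induce_abbreviation_translations(abbr_to_full, phrase_table)
```

Abbreviations whose full-form has no entry in the phrase table are simply skipped in this sketch; a real system would need to score and filter the candidate entries.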
