Scientific report: "Collocation Translation Acquisition Using Monolingual Corpora"

Yajuan LU
Microsoft Research Asia
5F Sigma Center, No. 49 Zhichun Road, Haidian District, Beijing, China, 100080
t-yjlv@

Ming ZHOU
Microsoft Research Asia

Abstract

Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods that use bilingual parallel corpora, this paper presents a new method for acquiring collocation translations from monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency triple translation model is estimated using the EM algorithm, based on a dependency correspondence assumption. The resulting triple translation model is used to extract collocation translations from the two monolingual corpora. Experiments show that our approach outperforms existing monolingual-corpus-based methods in dependency triple translation and achieves promising results in collocation translation extraction.

1 Introduction

A collocation is an arbitrary and recurrent word combination (Benson, 1990). Previous work in collocation acquisition varies in the kinds of collocations it detects. These range from two-word to multi-word combinations, with or without syntactic structure (Smadja, 1993; Lin, 1998; Pearce, 2001; Seretan et al., 2003). In this paper, a collocation refers to a recurrent word pair linked by a certain syntactic relation. For instance, <solve, verb-object, problem> is a collocation with the syntactic relation verb-object.

Translation of collocations is difficult for nonnative speakers. Many collocation translations are idiosyncratic in the sense that they are unpredictable from syntactic or semantic features. Consider Chinese-to-English translation. The translations of "解决" can be "solve" or "resolve". The translations of "问题" can be "problem" or "issue". However, translating the collocation "解决 问题" as "solve-problem" or "resolve-issue" is preferred over "solve-issue" or "resolve-problem".
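The solve/resolve example can be made concrete with a small sketch. This is not the paper's triple translation model (no EM estimation is involved); it only illustrates the intuition that candidate translations built word by word can be ranked by how often they occur as dependency triples in a target-language corpus. The placeholder keys SRC_VERB and SRC_NOUN, the dictionary entries, the function name rank_candidates, and all counts below are hypothetical values chosen for illustration, and the sketch assumes the syntactic relation carries over unchanged between the two languages.

```python
from collections import Counter
from itertools import product

# Hypothetical bilingual dictionary: source word -> candidate English translations.
# SRC_VERB and SRC_NOUN are placeholders for the Chinese verb and noun in the example.
DICTIONARY = {
    "SRC_VERB": ["solve", "resolve"],
    "SRC_NOUN": ["problem", "issue"],
}

# Invented counts of (head, relation, dependent) triples, standing in for
# triples extracted from an English corpus by a dependency parser.
TARGET_TRIPLE_COUNTS = Counter({
    ("solve", "verb-object", "problem"): 50,
    ("resolve", "verb-object", "issue"): 30,
    ("resolve", "verb-object", "problem"): 5,
    ("solve", "verb-object", "issue"): 3,
})


def rank_candidates(head, relation, dependent):
    """Rank word-by-word translations of a source triple by target-corpus count.

    Assumes the relation is the same in both languages, which is a
    simplification made for this sketch.
    """
    candidates = [
        (h, relation, d)
        for h, d in product(DICTIONARY[head], DICTIONARY[dependent])
    ]
    scored = [(triple, TARGET_TRIPLE_COUNTS[triple]) for triple in candidates]
    # Highest count first; sorted() is stable, so ties keep generation order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rank_candidates("SRC_VERB", "verb-object", "SRC_NOUN")
for triple, count in ranked:
    print(triple, count)
```

With these toy counts, the two preferred pairings rank above the two dispreferred ones, even though the dictionary alone treats all four combinations as equally valid.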