Scientific report: "Alignment Model Adaptation for Domain-Specific Word Alignment"

This paper proposes an alignment adaptation approach to improve domain-specific (in-domain) word alignment. The basic idea of alignment adaptation is to use an out-of-domain corpus to improve in-domain word alignment results. We first train two statistical word alignment models, one on a large-scale out-of-domain corpus and one on a small-scale in-domain corpus, and then interpolate these two models to improve the domain-specific word alignment. Experimental results show that our approach improves domain-specific word alignment in terms of both precision and recall, achieving a relative error rate reduction compared with state-of-the-art technologies.

WU Hua, WANG Haifeng, LIU Zhanyi
Toshiba China Research and Development Center
5/F., Tower W2, Oriental Plaza, East Chang An Ave., Dong Cheng District, Beijing, 100738, China
wuhua wanghaifeng liuzhanyi @

1 Introduction

Word alignment was first proposed as an intermediate result of statistical machine translation (Brown et al., 1993). In recent years, many researchers have employed statistical models (Wu, 1997; Och and Ney, 2003; Cherry and Lin, 2003) or association measures (Smadja et al., 1996; Ahrenberg et al., 1998; Tufis and Barbu, 2002) to build alignment links.
In order to achieve satisfactory results, all of these methods require a large-scale bilingual corpus for training. When the large-scale bilingual corpus is not available, some researchers use existing dictionaries to improve word alignment (Ker and Chang, 1997). However, only a few studies (Wu and Wang, 2004) directly address the problem of domain-specific word alignment when neither the large-scale domain-specific bilingual corpus nor the domain-specific translation dictionary is available.

In this paper, we address the problem of word alignment in a specific domain in which only a small-scale corpus is available. In the domain-specific (in-domain) corpus there are .
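
The interpolation idea stated in the abstract, combining a model trained on out-of-domain data with one trained on in-domain data, can be illustrated with a minimal sketch. The text above does not specify the interpolation scheme, so everything here is an assumption made for illustration: the function name `interpolate_models`, the fixed linear weight `weight_in`, the fallback for words seen in only one corpus, and the toy probability tables are all hypothetical and are not taken from the paper.

```python
# Minimal sketch: linear interpolation of two word-translation tables.
# ASSUMPTION: each "model" is reduced to a table p(target | source);
# the paper's actual models and weighting scheme are not given here.

def interpolate_models(p_in, p_out, weight_in=0.5):
    """Combine in-domain and out-of-domain translation probabilities.

    p_in, p_out: dict mapping source word -> {target word: probability}.
    weight_in:   weight given to the in-domain table (hypothetical
                 fixed value; a real system would tune it).
    """
    if not 0.0 <= weight_in <= 1.0:
        raise ValueError("weight_in must lie in [0, 1]")

    merged = {}
    for src in set(p_in) | set(p_out):
        in_dist = p_in.get(src, {})
        out_dist = p_out.get(src, {})
        if not in_dist:
            # Word seen only out of domain: fall back to that table.
            merged[src] = dict(out_dist)
        elif not out_dist:
            # Word seen only in domain: fall back to that table.
            merged[src] = dict(in_dist)
        else:
            # Word seen in both: weighted sum over the union of targets.
            merged[src] = {
                tgt: weight_in * in_dist.get(tgt, 0.0)
                + (1.0 - weight_in) * out_dist.get(tgt, 0.0)
                for tgt in set(in_dist) | set(out_dist)
            }
    return merged


# Toy tables (invented values, for illustration only).
in_domain = {"driver": {"pilote": 0.9, "chauffeur": 0.1}}
out_domain = {
    "driver": {"chauffeur": 0.8, "pilote": 0.2},
    "car": {"voiture": 1.0},
}

merged = interpolate_models(in_domain, out_domain, weight_in=0.6)
for src in sorted(merged):
    print(src, sorted(merged[src].items()))
```

In this toy setting the in-domain table pulls "driver" toward its domain-specific translation, while "car", which is absent from the small in-domain table, keeps its out-of-domain estimate. If both input distributions are normalized, the interpolated distribution for each source word also sums to one.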