Scientific report: "Improving Arabic-to-English Statistical Machine Translation by Reordering Post-verbal Subjects for Alignment"

Marine Carpuat, Yuval Marton, Nizar Habash
Columbia University, Center for Computational Learning Systems
475 Riverside Drive, New York, NY 10115
{marine,ymarton,habash}@ccls.columbia.edu

Abstract

We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.

1 Introduction

Modern Standard Arabic (MSA) is a morpho-syntactically complex language, with different phenomena from English, a fact that raises many interesting issues for natural language processing and Arabic-to-English statistical machine translation (SMT). While comprehensive Arabic preprocessing schemes have been widely adopted for handling Arabic morphology in SMT (e.g., Sadat and Habash, 2006; Zollmann et al., 2006; Lee, 2004), syntactic issues have not received as much attention by comparison (Green et al., 2009; Crego and Habash, 2008; Habash, 2007).

Arabic verbal constructions are particularly challenging, since subjects can occur in pre-verbal (SV), post-verbal (VS), or pro-dropped ("null subject") constructions. As a result, training data for learning verbal construction translations is split between the different constructions and their patterns, and complex reordering schemas are needed in order to translate them into primarily pre-verbal subject languages (SVO) such as English. These issues are particularly problematic in ...
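
The abstract's strategy, reordering VS constructions into SV order for word alignment only, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names `reorder_vs_to_sv` and `restore_alignment` are hypothetical, the verb position and subject span are taken as given (in practice they would come from a dependency parser), and the step that maps alignment links back to the original word order is one plausible way to keep the reordering confined to alignment.

```python
from typing import List, Tuple


def reorder_vs_to_sv(
    tokens: List[str], verb_idx: int, subj_start: int, subj_end: int
) -> Tuple[List[str], List[int]]:
    """Move a post-verbal subject span [subj_start, subj_end) in front of the verb.

    Returns the reordered tokens and a list `order` such that
    order[new_position] == original_position.
    (Hypothetical helper; span boundaries are assumed to come from a parser.)
    """
    if not (0 <= verb_idx < subj_start < subj_end <= len(tokens)):
        raise ValueError("expected verb before a non-empty subject span")
    order = (
        list(range(verb_idx))                 # material before the verb
        + list(range(subj_start, subj_end))   # subject, moved forward
        + list(range(verb_idx, subj_start))   # verb and anything between
        + list(range(subj_end, len(tokens)))  # rest of the sentence
    )
    return [tokens[i] for i in order], order


def restore_alignment(
    links: List[Tuple[int, int]], order: List[int]
) -> List[Tuple[int, int]]:
    """Map (source, target) links computed on the reordered source
    back to positions in the original source sentence."""
    return sorted((order[src], tgt) for src, tgt in links)


# Toy example with placeholder tokens: verb, two-word subject, object.
source = ["V", "S1", "S2", "O"]
reordered, order = reorder_vs_to_sv(source, verb_idx=0, subj_start=1, subj_end=3)

# Pretend an aligner produced a monotone alignment on the reordered source.
links_on_reordered = [(0, 0), (1, 1), (2, 2), (3, 3)]
links_on_original = restore_alignment(links_on_reordered, order)
```

In this sketch the aligner sees the SV-ordered source, where the toy alignment is monotone, while the restored links refer to the untouched VS-ordered sentence, so any later processing could still operate on the original text.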
