Scientific report: "A Comparison of Merging Strategies for Translation of German Compounds"

Sara Stymne
Department of Computer and Information Science
Linköping University, Sweden
sarst@ida.liu.se

Abstract

In this article, compound processing for translation into German in a factored statistical MT system is investigated. Compounds are handled by splitting them prior to training, and merging the parts after translation. I have explored eight merging strategies using different combinations of external knowledge sources, such as word lists, and internal sources that are carried through the translation process, such as symbols or parts-of-speech. I show that for merging to be successful, some internal knowledge source is needed. I also show that an extra sequence model for part-of-speech is useful in order to improve the order of compound parts in the output. The best merging results are achieved by a matching scheme for part-of-speech tags.

1 Introduction

In German, as in many other languages, compounds are normally written as single words, without spaces or other word boundaries. Compounds can be binary, i.e., made up of two parts (1a), or have more parts (1b). There are also coordinated compound constructions (1c). In a few cases compounds are written with a hyphen (1d), often when one of the parts is a proper name or an abbreviation.

(1) a. Regierungskonferenz 'intergovernmental conference'
    b. Fremdsprachenkenntnisse 'knowledge of foreign languages'
    c. See- und Binnenhäfen 'sea and inland ports'
    d. Kosovo-Konflikt 'Kosovo conflict'
    e. Völkermord 'genocide'

German compounds can have English translations that are compounds written as separate words (1a), other constructions, possibly with inserted function words and reordering (1b), or single words (1e). Compound parts sometimes have special compound forms, formed by addition or truncation of letters, by umlaut, or by a combination of these, as in (1a), where the letter -s is added to the first part, Regierung. For an overview of German compound forms, see Langer (1998). Compounds are