Scientific report: "Interactive Word Alignment for Language Engineering"
Interactive Word Alignment for Language Engineering
Lars Ahrenberg, Magnus Merkel, Michael Petterstedt
Department of Computer and Information Science, Linköping University
lah magme g-micpe @

Abstract
In this paper we report ongoing work on developing an interactive word alignment environment that will assist a user to quickly produce accurate full-coverage word alignment in bitexts for different language engineering tasks, such as MT lexicons and gold standards for evaluation. The system uses a graphical interface, static and dynamic resources as well as machine learning techniques. We also sketch how the system is being integrated with an automatic word aligner.

1 Introduction
Automatic word alignment systems have proved to be useful tools for various language and NLP tasks, such as bilingual lexicon extraction for lexicography, bilingual terminology, and machine translation. Although performance is improving (precision in the range from 80 to 95 per cent, recall slightly above 50), there are applications, such as translation and the creation of gold standards, where these figures are not good enough. Furthermore, since most automatic systems rely on co-occurrence, rare correspondences go unnoticed, even though they may be relevant for applications such as terminology or lexicography. This means that even for these applications, higher recall and precision would give better results. For machine translation, errors in alignment are likely to cause errors in translation.
Thus, either there will have to be a reviewing process when a generated bilingual dictionary is to be part of an MT system, or else the reviewing could be done in the underlying files, i.e. by interactive reviewing. When the application area is extended to more linguistic fields, such as translation studies or any form of corpus-based linguistics, errors are of course a curse. Also, observations and generalisations would be better grounded if they are complete, i.e. all instances of the phenomena of interest have been found.