Combining Source and Target Language Information for Name Tagging of Machine Translation Output
Shasha Liao
New York University
715 Broadway, 7th floor
New York, NY 10003, USA
liaoss@

Abstract
A Named Entity Recognizer (NER) generally performs worse on machine-translated text because of the poor syntax of the MT output and other errors in the translation. Some tagging distinctions are clearer in the source and others in the target, so we tried to integrate the tag information from both source and target to improve target-language tagging performance, especially recall. In our experiments with Chinese-to-English MT output, we first used a simple merge of the outputs from an ET (Entity Translation) system and an English NER system, getting an absolute gain of … in F-measure (from … to …). We then trained an MEMM (maximum entropy Markov model) module to integrate them more discriminatively and got a further average gain of … in F-measure (from … to …).

1 Introduction
Because of the growing multilingual environment for NLP, there is an increasing need to annotate and analyze the output of machine translation (MT) systems. Treating this task as ordinary text processing, however, can lead to poor results. We examine this problem with respect to name tagging of English text. An NER trained on an English corpus does not perform as well when applied to machine-translated text. In our experiments on the NIST 05 Chinese-to-English MT evaluation data, when we used the same English NER to tag both the reference translation and the MT output, the F-measure was … for the reference but only … for the MT output.
There are two primary reasons for this. First, current translation systems do not perform very well, so their output differs considerably from standard English text: the translated text is not fluent, and the context of a named entity may be unusual. Second, the translated text contains some foreign names which are hard …
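The abstract reports its gains in F-measure and emphasizes recall. The sketch below is not taken from the paper. It shows how exact-match precision, recall, and F-measure are usually computed over entity spans, and how one possible "simple merge" of NER and ET output can raise recall. The span representation `(start, end, type)`, the functions `prf` and `simple_merge`, and the rule that NER spans win when the two systems overlap are all hypothetical choices made for illustration.

```python
from typing import Set, Tuple

# An entity mention: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F-measure (F1) over entity spans."""
    correct = len(gold & predicted)  # a span counts only if boundaries and type match
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def overlaps(a: Span, b: Span) -> bool:
    """True if the two token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def simple_merge(ner: Set[Span], et: Set[Span]) -> Set[Span]:
    """Hypothetical merge rule (an assumption, not necessarily the paper's):
    keep every NER span, then add each ET span that overlaps no NER span."""
    merged = set(ner)
    for span in et:
        if not any(overlaps(span, kept) for kept in ner):
            merged.add(span)
    return merged


# Toy data, invented for illustration only.
gold = {(0, 2, "PER"), (5, 6, "GPE"), (8, 10, "ORG")}
ner = {(0, 2, "PER"), (5, 6, "ORG")}  # misses one name, mistypes another
et = {(5, 6, "GPE"), (8, 10, "ORG")}  # finds the name NER missed

print("NER only:", prf(gold, ner))
print("merged:  ", prf(gold, simple_merge(ner, et)))
```

On this toy data the merge adds the ORG span that only the ET system found, so recall rises from 1/3 to 2/3. The ET span at tokens 5 to 6 is dropped because it overlaps an NER span, even though ET has the correct type there. This is the kind of conflict that a learned combiner, such as the MEMM module described in the abstract, could resolve more discriminatively than a fixed rule.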