Scientific report: "On Jointly Recognizing and Aligning Bilingual Named Entities"

Yufeng Chen, Chengqing Zong
Institute of Automation, Chinese Academy of Sciences
Beijing, China
chenyf cqzong @

Keh-Yih Su
Behavior Design Corporation
Hsinchu, Taiwan

Abstract

We observe that (1) how a given named entity (NE) is translated (i.e., either semantically or phonetically) depends greatly on its associated entity type, and (2) entities within an aligned pair should share the same type. Also, (3) those initially detected NEs are anchors, whose information should be used to give certainty scores when selecting candidates. On this basis, an integrated model is proposed in this paper to jointly identify and align bilingual named entities between Chinese and English. It adopts a new mapping type ratio feature (the proportion of NE internal tokens that are semantically translated), enforces an entity type consistency constraint, and utilizes additional monolingual candidate certainty factors (based on those NE anchors). The experiments show that this novel approach has substantially raised the type-sensitive F-score of identified NE-pairs from [...] to [...] ([...] F-score imperfection reduction) in our Chinese-English NE alignment task.

1 Introduction

In trans-lingual language processing tasks, such as machine translation and cross-lingual information retrieval, named entity (NE) translation is essential. Bilingual NE alignment, which links source NEs and target NEs, is the first step to train the NE translation model.
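The mapping type ratio and the entity type consistency constraint named in the abstract can be illustrated with a minimal Python sketch. It assumes each NE-internal token has already been labeled as semantically or phonetically translated; the label strings, function names, and entity type tags (`PER`, `LOC`) are hypothetical, and this excerpt does not show how the paper derives the labels or combines these quantities in its model.

```python
from typing import Sequence

# Hypothetical labels for how each NE-internal token was translated.
SEMANTIC = "semantic"
PHONETIC = "phonetic"


def mapping_type_ratio(token_mapping_types: Sequence[str]) -> float:
    """Return the proportion of NE-internal tokens that are semantically translated."""
    if not token_mapping_types:
        raise ValueError("an NE must contain at least one token")
    semantic_count = sum(1 for t in token_mapping_types if t == SEMANTIC)
    return semantic_count / len(token_mapping_types)


def types_consistent(source_type: str, target_type: str) -> bool:
    """Check the constraint that both NEs in an aligned pair share one entity type."""
    return source_type == target_type


# Illustrative input: a four-token NE with one transliterated token.
ratio = mapping_type_ratio([PHONETIC, SEMANTIC, SEMANTIC, SEMANTIC])
```

Under these assumptions, a candidate pair whose two sides carry different types (for example `PER` against `LOC`) would fail `types_consistent`, while the ratio gives one number describing how an NE was translated.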
Since NE alignment can only be conducted after its associated NEs have first been identified, the including-rate of the first recognition stage significantly limits the final alignment performance. To alleviate the above error accumulation problem, two strategies have been proposed in the literature. The first strategy (Al-Onaizan and Knight, 2002; Moore, 2003; Feng et al., 2004; Lee et al., 2006) identifies NEs only on the source side and then finds their corresponding NEs on
