Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents

Yashar Mehdad, Matteo Negri, Marcello Federico
Fondazione Bruno Kessler, FBK-irst, Trento, Italy
{mehdad,negri,federico}@

Abstract

We address a core aspect of the multilingual content synchronization task: the identification of novel, more informative, or semantically equivalent pieces of information in two documents about the same topic. This can be seen as an application-oriented variant of textual entailment recognition where (i) the text T and the hypothesis H are in different languages, and (ii) entailment relations between T and H have to be checked in both directions. Using a combination of lexical, syntactic, and semantic features to train a cross-lingual textual entailment system, we report promising results on different datasets.

1 Introduction

Given two documents about the same topic written in different languages (e.g. Wiki pages), content synchronization deals with the problem of automatically detecting and resolving differences in the information they provide, in order to produce aligned, mutually enriched versions. A roadmap towards the solution of this problem has to take into account, among many sub-tasks, the identification of information in one page that is semantically equivalent, novel, or more informative with respect to the content of the other page. In this paper we frame this problem as an application-oriented, cross-lingual variant of the Textual Entailment (TE) recognition task (Dagan and Glickman, 2004).
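Checking entailment in both directions is what separates the three cases above. The following is a minimal sketch, assuming each direction has already been decided by some uni-directional cross-lingual entailment classifier and is supplied as a boolean. The label names and the `combine_directions` helper are hypothetical, chosen for illustration only; they are not taken from the paper.

```python
from enum import Enum


class CLTELabel(Enum):
    """Hypothetical names for the four possible outcomes of a two-way check."""
    BIDIRECTIONAL = "bidirectional"  # T and H entail each other: semantically equivalent
    FORWARD = "forward"              # only T entails H: T is more informative
    BACKWARD = "backward"            # only H entails T: H is more informative
    NO_ENTAILMENT = "no_entailment"  # neither entails the other: each holds novel content


def combine_directions(t_entails_h: bool, h_entails_t: bool) -> CLTELabel:
    """Combine two uni-directional entailment judgments into a single label."""
    if t_entails_h and h_entails_t:
        return CLTELabel.BIDIRECTIONAL
    if t_entails_h:
        return CLTELabel.FORWARD
    if h_entails_t:
        return CLTELabel.BACKWARD
    return CLTELabel.NO_ENTAILMENT


if __name__ == "__main__":
    # In practice each boolean would come from a directional classifier
    # (e.g. one trained on lexical, syntactic and semantic features);
    # here the judgments are supplied by hand.
    print(combine_directions(True, False).value)  # forward
```

The design point is that a single T-to-H check cannot tell "equivalent" apart from "T is more informative". Only the pair of judgments separates all four cases.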
Along this direction, we make two main contributions. (a) Experiments with multi-directional cross-lingual textual entailment. So far, cross-lingual textual entailment (CLTE) has only been applied to (i) available TE datasets, in which uni-directional relations between monolingual pairs are transformed into their cross-lingual counterparts by translating the hypotheses into other languages (Negri and Mehdad, 2010), and (ii) machine translation (MT) evaluation datasets (Mehdad et al., 2012). Instead, we experiment with the only corpus

TỪ KHÓA LIÊN QUAN