tailieunhanh - Scientific report: "Compiling a Massive, Multilingual Dictionary via Probabilistic Inference"

Mausam, Stephen Soderland, Oren Etzioni, Daniel S. Weld, Michael Skinner, Jeff Bilmes
University of Washington, Seattle; Google, Seattle
{mausam, soderlan, etzioni, weld, bilmes}@ mskinner@

Abstract

Can we automatically compose a large set of Wiktionaries and translation dictionaries to yield a massive, multilingual dictionary whose coverage is substantially greater than that of any of its constituent dictionaries?

The composition of multiple translation dictionaries leads to a transitive inference problem: if word A translates to word B, which in turn translates to word C, what is the probability that C is a translation of A? The paper introduces a novel algorithm that solves this problem for 10,000,000 words in more than 1,000 languages. The algorithm yields PanDictionary, a novel multilingual dictionary. PanDictionary contains more than four times as many translations as the largest Wiktionary at precision [...], and over 200,000,000 pairwise translations in over 200,000 language pairs at precision [...].

1 Introduction and Motivation

In the era of globalization, inter-lingual communication is becoming increasingly important.
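To make the transitive inference problem from the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: the toy entries in `EDGES`, the helper names `build_graph`, `pivots`, and `naive_score`, and the constant `p_path` are all assumptions introduced for illustration. The sketch treats each intermediate word B linking A and C as weak, independent evidence and combines the paths with a simple noisy-or.

```python
from collections import defaultdict

# Made-up toy entries, not data from the paper. Each edge says that two
# (language, word) pairs are listed as translations in some dictionary.
EDGES = [
    (("en", "spring"), ("fr", "printemps")),    # season sense
    (("en", "spring"), ("es", "primavera")),    # season sense
    (("en", "spring"), ("fr", "ressort")),      # coil sense
    (("de", "Frühling"), ("fr", "printemps")),
    (("de", "Frühling"), ("es", "primavera")),
]


def build_graph(edges):
    """Build an undirected adjacency map from translation pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def pivots(graph, a, c):
    """Words B such that A - B and B - C are both dictionary entries."""
    return graph.get(a, set()) & graph.get(c, set())


def naive_score(graph, a, c, p_path=0.5):
    """Crude noisy-or estimate that C is a translation of A.

    p_path is an assumed constant chance that a single two-hop path
    preserves the word sense; it is not a value from the paper.
    """
    if c in graph.get(a, set()):
        return 1.0  # direct dictionary entry, taken at face value
    return 1.0 - (1.0 - p_path) ** len(pivots(graph, a, c))


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(naive_score(g, ("es", "primavera"), ("fr", "printemps")))
    print(naive_score(g, ("es", "primavera"), ("fr", "ressort")))
```

On this toy data, the pair primavera/printemps is reachable through two pivots and scores higher than primavera/ressort, which is reachable only through the polysemous English word spring. That gap is the reason a single two-hop path cannot be trusted on its own.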
Although nearly 7,000 languages are in use today (Gordon, 2005), most language resources are mono-lingual or [...] This paper investigates whether Wiktionaries and other translation dictionaries available over the Web can be automatically composed to yield a massive, multilingual dictionary with superior coverage at comparable precision. We describe the automatic construction of a massive multilingual translation dictionary, called [...]

Footnote 1: The English Wiktionary, a lexical resource developed by volunteers over the Internet, is one notable exception that contains translations of English words in about 500 languages.

Figure 1: A fragment of the translation graph for two senses of the English word spring. Edges labeled 1 and 3 are for spring in the sense of a season, and 2