Scientific report: "Exploiting Aggregate Properties of Bilingual Dictionaries For Distinguishing Senses of English Words and Inducing English Sense Clusters"
Charles SCHAFER and David YAROWSKY
Department of Computer Science and Center for Language and Speech Processing
Johns Hopkins University
Baltimore, MD 21218, USA
cschafer yarowsky @

Abstract

We propose a novel method for inducing monolingual semantic hierarchies and sense clusters from numerous foreign-language-to-English bilingual dictionaries. The method exploits patterns of non-transitivity in translations across multiple languages. No complex or hierarchical structure is assumed or used in the input dictionaries: each is initially parsed into the “lowest common denominator” form, which is to say, a list of pairs of the form (foreign word, English word). We then propose a monolingual synonymy measure derived from this aggregate resource, which is used to derive multilingually-motivated sense hierarchies for monolingual English words, with potential applications in word sense classification, lexicography and statistical machine translation.

1 Introduction

In this work, we consider a learning resource comprising over 80 foreign-language-to-English bilingual dictionaries, collected by downloading electronic dictionaries from the Internet and also by scanning and running optical character recognition (OCR) software on paper dictionaries.
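To make the pair-list form concrete, the following is a minimal sketch, not the authors' actual synonymy measure. It assumes each dictionary has already been flattened to (foreign word, English word) pairs and simply counts, for each pair of English words, the number of languages in which a single foreign word translates to both. The toy entries, the name `DICTIONARIES`, and the function `shared_translation_counts` are all hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: language code -> list of (foreign word, English word) pairs.
DICTIONARIES = {
    "es": [("banco", "bank"), ("banco", "bench"),
           ("orilla", "bank"), ("orilla", "shore")],
    "fr": [("banc", "bench"), ("rive", "bank"), ("rive", "shore")],
    "de": [("Ufer", "bank"), ("Ufer", "shore"),
           ("Bank", "bank"), ("Bank", "bench")],
}


def shared_translation_counts(dictionaries):
    """For each pair of English words, count the languages in which
    at least one foreign word translates to both of them.

    This is an illustrative assumption, not the measure from the paper.
    """
    counts = defaultdict(int)
    for pairs in dictionaries.values():
        # Group the English translations of each foreign word.
        english_by_foreign = defaultdict(set)
        for foreign, english in pairs:
            english_by_foreign[foreign].add(english)
        # Collect the English word pairs linked within this language,
        # so that each language contributes at most once per pair.
        linked = set()
        for english_words in english_by_foreign.values():
            linked.update(combinations(sorted(english_words), 2))
        for pair in linked:
            counts[pair] += 1
    return dict(counts)


counts = shared_translation_counts(DICTIONARIES)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

In this toy data, "bank" is linked to both "bench" and "shore", while "bench" and "shore" are never linked to each other. A gap of that kind is one simple instance of the non-transitivity pattern mentioned in the abstract, and it hints that "bank" has at least two senses.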
Such a diverse parallel lexical data set has not, to our knowledge, previously been assembled and examined in its aggregate form as a lexical semantics training resource. We show that this aggregate data set admits of some surprising applications, including discovery of synonymy relationships between words and automatic induction of high-quality hierarchical word sense clusterings for English. We perform and describe several experiments deriving synonyms and sense groupings from the aggregate bilingual dictionary, and subsequently suggest some possible applications for the results. Finally, we propose that sense