Paraphrasing and Translation - part 2
Chapter 1. Introduction

English: I do not believe in mutilating dead bodies
Spanish: no soy partidaria de mutilar cadáveres

Spanish: El mar arroja tantos cadáveres de inmigrantes ilegales ahogados a la playa.
English: So many corpses of drowned illegals get washed up on beaches.

Figure: The Spanish word cadáveres can be used to discover that the English phrase "dead bodies" can be paraphrased as "corpses".

different encyclopedias' articles about the same topic. Since they are written by different authors, items in these corpora represent a natural source for paraphrases: they express the same ideas but are written using different words.

Plain monolingual corpora are not a ready source of paraphrases in the same way that multiple translations and comparable corpora are. Instead, they serve to show the distributional similarity of words. One approach for extracting paraphrases from monolingual corpora involves parsing the corpus and drawing relationships between words which share the same syntactic contexts (for instance, words which can be modified by the same adjectives and which appear as the objects of the same verbs).

We argue that previous paraphrasing techniques are limited, since their training data are either relatively rare or must have linguistic markup that requires language-specific tools, such as syntactic parsers. Since parallel corpora are comparatively common, we can generate a large number of paraphrases for a wider variety of phrases than past methods. Moreover, our paraphrasing technique can be applied to more languages, because it relies on language-independent techniques from statistical machine translation rather than on language-specific tools.
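For contrast with the parallel-corpus method, the monolingual approach described above (relating words that share syntactic contexts) can be sketched roughly as follows. This is only an illustration under stated assumptions: the dependency triples are invented stand-ins for parser output, the relation names are hypothetical, and Jaccard overlap is one simple similarity measure chosen here, not necessarily the one used in prior work.

```python
from collections import defaultdict

# Hypothetical (word, relation, context word) triples, standing in for the
# output of a syntactic parser run over a monolingual corpus.
TRIPLES = [
    ("corpse", "object_of", "bury"),
    ("corpse", "object_of", "find"),
    ("corpse", "modified_by", "mutilated"),
    ("body", "object_of", "bury"),
    ("body", "object_of", "find"),
    ("body", "modified_by", "mutilated"),
    ("body", "modified_by", "healthy"),
    ("beach", "object_of", "find"),
    ("beach", "modified_by", "sandy"),
]


def context_sets(triples):
    """Map each word to the set of (relation, context word) pairs it occurs in."""
    contexts = defaultdict(set)
    for word, relation, other in triples:
        contexts[word].add((relation, other))
    return dict(contexts)


def jaccard(a, b):
    """Overlap of two context sets: |a & b| / |a | b| (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(word, contexts):
    """Rank all other words by how many syntactic contexts they share with `word`."""
    target = contexts.get(word, set())
    scores = [
        (other, jaccard(target, ctx))
        for other, ctx in contexts.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("corpse", context_sets(TRIPLES)):
        print(f"{other}\t{score:.2f}")
```

Note that the input to this sketch already presupposes a parser, which is exactly the language-specific dependency the argument above objects to.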
Word and phrase alignment techniques from statistical machine translation serve as the basis of our data-driven paraphrasing technique. The figure illustrates how they are used to extract an English paraphrase from a bilingual parallel corpus by pivoting through foreign language phrases. An English phrase that we want to paraphrase, such as
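The pivoting idea shown in the figure, where "dead bodies" and "corpses" are linked through the shared Spanish translation cadáveres, can be sketched as below. This is a minimal illustration under stated assumptions: the aligned phrase pairs and their counts are invented, the function name is hypothetical, and scoring a candidate by summing p(f | e1) * p(e2 | f) over pivot phrases f, with relative-frequency estimates, is one plausible choice rather than a statement of the exact method used.

```python
from collections import Counter, defaultdict

# Hypothetical (English phrase, Spanish phrase) pairs, standing in for phrase
# pairs extracted from a word-aligned bilingual parallel corpus.
PHRASE_PAIRS = [
    ("dead bodies", "cadáveres"),
    ("dead bodies", "cadáveres"),
    ("corpses", "cadáveres"),
    ("corpses", "cadáveres"),
    ("corpses", "cadáveres"),
    ("bodies", "cadáveres"),
    ("bodies", "cuerpos"),
    ("dead bodies", "cuerpos sin vida"),
]


def paraphrase_scores(phrase, pairs):
    """Score English paraphrases of `phrase` by pivoting through foreign phrases.

    Each candidate e2 receives the sum over pivots f of p(f | phrase) * p(e2 | f),
    with both probabilities estimated by relative frequency (an assumption).
    """
    pair_counts = Counter(pairs)
    english_counts = Counter(e for e, _ in pairs)
    foreign_counts = Counter(f for _, f in pairs)

    scores = defaultdict(float)
    for (e1, f), count_e1_f in pair_counts.items():
        if e1 != phrase:
            continue
        p_f_given_e1 = count_e1_f / english_counts[e1]
        # Translate the pivot phrase f back into English.
        for (e2, f2), count_e2_f in pair_counts.items():
            if f2 != f or e2 == phrase:
                continue
            p_e2_given_f = count_e2_f / foreign_counts[f]
            scores[e2] += p_f_given_e1 * p_e2_given_f
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for candidate, score in paraphrase_scores("dead bodies", PHRASE_PAIRS):
        print(f"{candidate}\t{score:.3f}")
```

With these toy counts, "corpses" ranks above "bodies" as a paraphrase of "dead bodies", because it is the more frequent back-translation of the shared pivot cadáveres. Nothing in the sketch depends on the language pair, only on aligned phrase pairs.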