Scientific report: "Unsupervised Analysis for Decipherment Problems"

Kevin Knight, Anish Nair, Nishit Rathod
Information Sciences Institute and Computer Science Department, University of Southern California
knight@ anair nrathod @

Kenji Yamada
Language Weaver, Inc., 4640 Admiralty Way, Suite 1210, Marina del Rey, CA 90292
kyamada@

Abstract

We study a number of natural language decipherment problems using unsupervised learning. These include letter substitution ciphers, character code conversion, phonetic decipherment, and word-based ciphers with relevance to machine translation. Straightforward unsupervised learning techniques most often fail on the first try, so we describe techniques for understanding errors and significantly increasing performance.

1 Introduction

Unsupervised learning holds great promise for breakthroughs in natural language processing. In cases like (Yarowsky, 1995), unsupervised methods offer accuracy results that rival supervised methods (Yarowsky, 1994) while requiring only a fraction of the data preparation effort. Such methods have also been a key driver of progress in statistical machine translation, which depends heavily on unsupervised word alignments (Brown et al., 1993).

There are also interesting problems for which supervised learning is not an option. These include deciphering unknown writing systems, such as the Easter Island rongorongo script and the 20,000-word Voynich manuscript. Deciphering animal language is another case. Machine translation of human languages is another, when we consider language pairs where little or no parallel text is available.
Ultimately, unsupervised learning also holds promise for scientific discovery in linguistics. At some point, our programs will begin finding novel, publishable regularities in vast amounts of linguistic data.

2 Decipherment

In this paper, we look at a particular type of unsupervised analysis problem in which we face a ciphertext stream and try to uncover the plaintext that lies behind it. We will .
