Scientific report: "An Exact A* Method for Deciphering Letter-Substitution Ciphers"

Eric Corlett and Gerald Penn
Department of Computer Science, University of Toronto
ecorlett gpenn @

Abstract

Letter-substitution ciphers encode a document from a known or hypothesized language into an unknown writing system or an unknown encoding of a known writing system. Deciphering them is a problem that can occur in a number of practical applications, such as determining the encodings of electronic documents in which the language is known but the encoding standard is not. It has also been used in relation to OCR applications. In this paper, we introduce an exact method for deciphering messages using a generalization of the Viterbi algorithm. We test this model on a set of ciphers developed from various web sites and find that our algorithm has the potential to be a viable practical method for efficiently solving decipherment problems.

1 Introduction

Letter-substitution ciphers encode a document from a known language into an unknown writing system or an unknown encoding of a known writing system. This problem has practical significance in a number of areas, such as reading electronic documents that may use one of many different standards to encode text.
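
To make the definition concrete, a letter-substitution cipher can be modeled as a one-to-one mapping from plaintext letters to cipher symbols. The following is a minimal sketch for illustration only, not the method of the paper: the function names make_key, encipher and invert_key, the lowercase Latin alphabet, the sample sentence and the fixed random seed are all assumptions introduced here.

```python
import random
import string
from typing import Dict


def make_key(alphabet: str, seed: int) -> Dict[str, str]:
    """Build a random one-to-one mapping from plaintext letters to cipher symbols.

    Hypothetical helper: here the cipher symbols are just a permutation of the
    same alphabet, standing in for an unknown encoding or writing system.
    """
    rng = random.Random(seed)
    symbols = list(alphabet)
    rng.shuffle(symbols)
    return dict(zip(alphabet, symbols))


def encipher(text: str, key: Dict[str, str]) -> str:
    """Replace every character covered by the key; leave all others unchanged."""
    return "".join(key.get(ch, ch) for ch in text)


def invert_key(key: Dict[str, str]) -> Dict[str, str]:
    """Swap the mapping direction, turning an encipherment key into a decipherment key."""
    return {cipher: plain for plain, cipher in key.items()}


plaintext = "the language is known but the encoding standard is not"
key = make_key(string.ascii_lowercase, seed=0)
ciphertext = encipher(plaintext, key)
recovered = encipher(ciphertext, invert_key(key))
```

In this sketch the key is known, so inverting it is trivial. The decipherment problem is the harder inverse task of recovering such a key from the ciphertext alone, given only knowledge of the underlying language.
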
While this is not a problem in languages like English and Chinese, which have a small set of well-known standard encodings such as ASCII, Big5, and Unicode, there are other languages, such as Hindi, in which there is no dominant encoding standard for the writing system. In these languages, we would like to be able to automatically retrieve and display the information in electronic documents which use unknown encodings when we find them. We also want to use these documents for information retrieval and data mining, in which case it is important to be able to read through them automatically, without resorting to a human annotator. The holy grail in this area would be an application to archaeological decipherment in which the underlying language s
