Text Algorithms
Maxime Crochemore, Wojciech Rytter

Chapter 1. Introduction

Texts and their processing

One of the simplest and most natural ways of representing information is by means of written texts. Data to be processed often does not decompose into independent records. This type of data is characterized by the fact that it can be written down as a long sequence of characters. Such a linear sequence is called a text.

Texts are central to word-processing systems, which provide facilities for manipulating texts. Such systems usually process objects that are quite large. For example, this book probably contains more than a million characters.

Text algorithms occur in many areas of science and information processing. Many text editors and programming languages have facilities for processing texts. In biology, text algorithms arise in the study of molecular sequences. The complexity of text algorithms is also one of the central and most studied questions in theoretical computer science. It could be said that this is a domain where theory and practice are very close together.

The basic textual problem is pattern matching. It is used to access information, and at this very moment many computers are probably solving it as a frequently used operation in some application system. In this sense, pattern matching is comparable to sorting or to basic arithmetic operations. Consider a reader of the French dictionary Grand Larousse who wants all entries related to the word Marie-Curie-Sklodowska. This is an example of a pattern-matching (or string-matching) problem.
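To make the problem concrete, here is a minimal sketch of the simplest (brute-force) way to solve it, shown only as a baseline and not as any particular algorithm from this text. The helper name `naive_search` is hypothetical. It tries every alignment of a pattern of length m against a text of length n and compares character by character, which takes O(m·n) time in the worst case.

```python
from typing import List


def naive_search(pattern: str, text: str) -> List[int]:
    """Hypothetical helper: return every start index i with text[i:i+m] == pattern.

    Brute-force pattern matching: try each of the n - m + 1 possible
    alignments of the pattern against the text. Worst-case time O(m * n).
    """
    m, n = len(pattern), len(text)
    occurrences = []
    for i in range(n - m + 1):  # empty range when the pattern is longer than the text
        j = 0
        # Extend the match while the characters agree.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:  # all m characters matched at shift i
            occurrences.append(i)
    return occurrences


if __name__ == "__main__":
    # Occurrences may overlap: "aba" starts at positions 0, 2 and 4.
    print(naive_search("aba", "abababa"))  # prints [0, 2, 4]
```

Because every shift is checked independently, overlapping occurrences are reported naturally. The price is that information gathered during one comparison is thrown away before the next shift.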
In the dictionary example, the word Marie-Curie-Sklodowska is the pattern. More generally, we may want to find a string, called the pattern, of length m inside a text of length n, where n is greater than m. The pattern can also be described in a more complex way, to denote a set of strings rather than a single word. In many cases m is very large. In genetics, the pattern can correspond