A Concise Introduction to Data Compression - P7

The first enhancement improves compression in small alphabets. In Unicode, most small alphabets start on a 128-byte boundary, although the alphabet size may be more than 128 symbols. This suggests that a difference be computed not between the current and previous code values, but between the current code value and the value in the middle of the 128-byte segment where the previous code value is located. Specifically, the difference is computed by subtracting a base value from the current code point. The base value is obtained from the previous code point as follows. If the previous code value is in the interval xxxx00 to xxxx7F (i.e., its low byte is 0 to 127), the base value is set to xxxx40 (the low byte is 64), and if the previous code point is in the range xxxx80 to xxxxFF (i.e., its low byte is 128 to 255), the base value is set to xxxxC0 (the low byte is 192). This way, if the current code point is within 128 positions of the base value, the difference is in the range [-128, +127], which makes it fit in one byte.

The second enhancement has to do with remote symbols. A document in a non-Latin alphabet (where the code points are very different from the ASCII codes) may use spaces between words. The code point for a space is the ASCII code 20₁₆, so any pair of code points that includes a space results in a large difference. BOCU therefore computes a difference by first computing the base values of the three previous code points and then subtracting the smallest base value from the current code point.

BOCU-1 is the version of BOCU that's commonly used in practice [BOCU-1 02]. It differs from the original BOCU method by using a different set of byte value ranges and by encoding the ASCII control characters U+0000 through U+0020 with byte values 0 through 20₁₆, respectively.
These features make BOCU-1 suitable for compressing input files that are MIME text media types.

Il faut avoir beaucoup étudié pour savoir peu (it is necessary to study much in
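
The base-value rule and the three-code-point variant described above can be sketched in Python as follows. This is only an illustration of the text, not the BOCU-1 specification: the function names, the starting value 0x40 for the "previous" code point, and the fixed three-element history are assumptions made for this sketch, and the mapping of differences to output bytes is not shown.

```python
from collections import deque


def base_value(code_point: int) -> int:
    """Return the middle of the 128-code-point segment containing code_point.

    xxxx00..xxxx7F maps to xxxx40, and xxxx80..xxxxFF maps to xxxxC0.
    """
    # Clear the seven low bits (segment start), then add 0x40 (segment middle).
    return (code_point & ~0x7F) | 0x40


def fits_in_one_byte(difference: int) -> bool:
    """True if the difference lies in the range [-128, +127]."""
    return -128 <= difference <= 127


def nearest_base_differences(text: str, start: int = 0x40) -> list[int]:
    """First enhancement: subtract the base value of the previous code point.

    `start` is a hypothetical initial "previous" code point chosen for this sketch.
    """
    previous = start
    differences = []
    for ch in text:
        current = ord(ch)
        differences.append(current - base_value(previous))
        previous = current
    return differences


def smallest_base_differences(text: str, start: int = 0x40) -> list[int]:
    """Second enhancement, read literally from the description above:
    compute the base values of the three previous code points and
    subtract the smallest one from the current code point.
    """
    # Assumption: the history is pre-filled with `start` three times.
    history = deque([start] * 3, maxlen=3)
    differences = []
    for ch in text:
        current = ord(ch)
        smallest = min(base_value(p) for p in history)
        differences.append(current - smallest)
        history.append(current)
    return differences
```

For the Greek string "αβγ" (U+03B1 to U+03B3), `nearest_base_differences` returns `[881, -14, -13]`: the first code point is far from the assumed starting base, but once the previous code point lies in the Greek block, its base value is 0x3C0 and the following differences fit in one byte.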
