A High-Speed Large-Capacity Dictionary System

Mechanical Translation, November 1961

by Sydney M. Lamb and William H. Jacobsen, Jr., University of California, Berkeley

A system of dictionary organization is described which makes it possible for a computer with 32,000 words of core storage to accommodate a vocabulary of hundreds of thousands of words, with a look-up speed of over a hundred words per second. The central part of the look-up process involves using the first few letters of each word as addresses, one after another.

Introductory

This paper describes a method of adapting dictionaries for use by a computer in such a way that comprehensiveness of vocabulary coverage can be maximized while look-up time is minimized. Although the programming of the system has not yet been completed, it is estimated at the time of writing that it will allow for a dictionary of 20,000 entries or more, with a total look-up time of about 8 milliseconds (.008 seconds) per word when used on an IBM 704 computer with 32,000 words of core storage. With a proper system of segmentation, a dictionary of 20,000 entries can handle several hundred thousand different words, thus providing ample coverage for a single fairly broad field of science. Although the system has been designed specifically for purposes of machine translation of Russian, it is applicable to other areas of linguistic data processing in which dictionaries are needed.

Preliminary Definitions

An entity for which there is, or should be, a dictionary entry is a lexical item, or lex. A text is made up of a sequence of lexes, for each of which we hope to find a dictionary entry if we are translating or analyzing it. It is also made up of a sequence of words, but if any segmentation of words is incorporated in the system, many of the words will consist of more than one lex. In the system used at the University of California there are about two lexes per word on the average. A word on the graphemic level is a sequence of graphemes which can .
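The abstract's central idea, using the first few letters of each word as addresses one after another, can be illustrated with a small sketch. This is not the authors' IBM 704 implementation, whose details are not given in this excerpt. It is a minimal Python illustration under stated assumptions: the names build_tables and look_up, the constant PREFIX_DEPTH, and the sample entries are all hypothetical, and nested Python dicts stand in for address tables in core storage.

```python
# Hypothetical sketch: successive leading letters used as addresses into
# nested tables. All names and the depth of 3 are illustrative assumptions,
# not details taken from the paper.

PREFIX_DEPTH = 3  # assumed number of leading letters used as addresses


def build_tables(entries, depth=PREFIX_DEPTH):
    """Build nested tables keyed by each of the first `depth` letters.

    `entries` maps a lexical item (a string) to its dictionary entry.
    Items sharing the same leading letters land in one small bucket,
    stored under the reserved key "" in the table where addressing stops.
    """
    root = {}
    for item, entry in entries.items():
        table = root
        for letter in item[:depth]:
            table = table.setdefault(letter, {})
        # The remainder of the item (possibly empty) identifies it in the bucket.
        table.setdefault("", {})[item[depth:]] = entry
    return root


def look_up(root, item, depth=PREFIX_DEPTH):
    """Follow the leading letters one after another, then check the bucket."""
    table = root
    for letter in item[:depth]:
        table = table.get(letter)
        if table is None:
            return None  # no entry begins with these letters
    return table.get("", {}).get(item[depth:])


# Made-up sample entries for illustration only.
tables = build_tables({"atom": "noun-1", "at": "prep-1", "ate": "verb-1"})
print(look_up(tables, "atom"))
print(look_up(tables, "atlas"))
```

The point of the design is that each leading letter selects the next table directly, so most of the search is replaced by a few addressing steps, and only the short bucket of items sharing those letters remains to be checked.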
