Scientific report: "BULK PROCESSING OF TEXT ON A MASSIVELY PARALLEL COMPUTER"

Gary W. Sabot
Thinking Machines Corporation
245 First St., Cambridge, MA 02142

Abstract

Dictionary lookup is a computational activity that can be greatly accelerated when performed on large amounts of text by a parallel computer such as the Connection Machine™ Computer (CM). Several algorithms for parallel dictionary lookup are discussed, including one that allows the CM to look up words at a rate 450 times that of lookup on a Symbolics 3600 Lisp Machine.

1 An Overview of the Dictionary Problem

This paper discusses one of the text processing problems encountered during the implementation of the CM-Indexer, a natural language processing program that runs on the Connection Machine (CM). The problem is that of parallel dictionary lookup: given both a dictionary and a text consisting of many thousands of words, how can the appropriate definitions be distributed to the words in the text as rapidly as possible? A parallel dictionary lookup algorithm that makes efficient use of the CM hardware was discovered and is described in this paper.

It is clear that many natural language processing applications require such a dictionary algorithm. Indexing and searching databases of unformatted natural language text is one such application. The proliferation of personal computers, the widespread use of electronic memos and electronic mail in large corporations, and the CD-ROM are all contributing to an explosion in the amount of useful unformatted text in computer-readable form. Parallel computers and algorithms provide one way of dealing with this explosion.

2 The CM: Machine Description

The CM consists of a large number of processor/memory cells. These cells are used to store data structures. In accordance with a stream of instructions broadcast from a single conventional host computer, the many processors can manipulate the data in the nodes of the data structure in .
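
To make the problem statement and the machine model concrete, the following is a minimal Python sketch. It is not the algorithm this paper goes on to describe. It assumes a naive scheme in which each simulated cell holds one word of the text and the host broadcasts one dictionary entry at a time. The names `Cell`, `broadcast_lookup`, and `serial_lookup` are hypothetical, and the inner loop only simulates what the cells would do simultaneously on real hardware.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cell:
    """One simulated processor/memory cell holding a single word of the text."""
    word: str
    definition: Optional[str] = None


def broadcast_lookup(text_words: List[str], dictionary: Dict[str, str]) -> List[Cell]:
    """Naive broadcast scheme (an assumption, not the paper's method).

    The host steps through the dictionary; each entry is one broadcast
    instruction that every cell executes on its own local word.
    """
    cells = [Cell(word) for word in text_words]
    for headword, definition in dictionary.items():
        # On a real SIMD machine this comparison would happen in all cells
        # at once; the Python loop below only simulates that lock-step.
        for cell in cells:
            if cell.word == headword:
                cell.definition = definition
    return cells


def serial_lookup(text_words: List[str], dictionary: Dict[str, str]) -> List[Optional[str]]:
    """Serial baseline: look up one word at a time in a hash table."""
    return [dictionary.get(word) for word in text_words]
```

In this sketch the number of broadcast steps grows with the size of the dictionary rather than the size of the text, which is why a scheme of this shape only pays off when the text is very large relative to the dictionary.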
