Scientific report: "Named Entity Recognition without Gazetteers"

Proceedings of EACL '99

Named Entity Recognition without Gazetteers

Andrei Mikheev, Marc Moens and Claire Grover
HCRC Language Technology Group, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, UK
mikheev@

Abstract

It is often claimed that Named Entity recognition systems need extensive gazetteers: lists of names of people, organisations, locations, and other named entities. Indeed, the compilation of such gazetteers is sometimes mentioned as a bottleneck in the design of Named Entity recognition systems. We report on a Named Entity recognition system which combines rule-based grammars with statistical (maximum entropy) models. We report on the system's performance with gazetteers of different types and different sizes, using test material from the MUC-7 competition. We show that, for the text type and task of this competition, it is sufficient to use relatively small gazetteers of well-known names, rather than large gazetteers of low-frequency names. We conclude with observations about the domain independence of the competition and of our experiments.

1 Introduction

Named Entity recognition involves processing a text and identifying certain occurrences of words or expressions as belonging to particular categories of Named Entities (NE). NE recognition software serves as an important preprocessing tool for tasks such as information extraction, information retrieval and other text processing applications.
What counts as a Named Entity depends on the application that makes use of the annotations. One such application is document retrieval or automated document forwarding: documents annotated with NE information can be searched more accurately than raw text. For example, NE annotation allows you to search for all texts that mention the company "Philip Morris", ignoring documents about a possibly unrelated person by the same name. Or you can have all documents forwarded to you about a ...

Footnote: Now also at Harlequin Ltd., Edinburgh office.
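The "Philip Morris" retrieval example can be made concrete with a minimal Python sketch. It is not the system described in the paper: the `Entity` and `Document` classes, the two sample documents, and the helper functions are hypothetical, and the `ORGANIZATION` and `PERSON` labels are only modelled on MUC-7 style categories. The sketch assumes the documents have already been annotated by some NE recogniser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Entity:
    text: str   # surface string of the named entity
    label: str  # assumed category label, e.g. "ORGANIZATION" or "PERSON"


@dataclass
class Document:
    doc_id: str
    text: str
    entities: List[Entity] = field(default_factory=list)


def search_raw(docs: List[Document], name: str) -> List[str]:
    """Plain string search: cannot tell a company from a person."""
    return [d.doc_id for d in docs if name in d.text]


def search_by_entity(docs: List[Document], name: str, label: str) -> List[str]:
    """Search on NE annotations: match both the name and its category."""
    return [
        d.doc_id
        for d in docs
        if any(e.text == name and e.label == label for e in d.entities)
    ]


# Hypothetical, hand-annotated sample documents.
docs = [
    Document(
        "d1",
        "Philip Morris reported higher quarterly earnings.",
        [Entity("Philip Morris", "ORGANIZATION")],
    ),
    Document(
        "d2",
        "Philip Morris, a local teacher, retired on Friday.",
        [Entity("Philip Morris", "PERSON")],
    ),
]

raw_hits = search_raw(docs, "Philip Morris")
company_hits = search_by_entity(docs, "Philip Morris", "ORGANIZATION")
```

Here the raw text search returns both documents, while the annotation-based search returns only the one in which "Philip Morris" is tagged as an organisation.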
