Japanese Named Entity Recognition based on a Simple Rule Generator and Decision Tree Learning
Hideki Isozaki
NTT Communication Science Laboratories
2-4 Hikaridai, Seika-cho, Souraku-gun, Kyoto 619-0237, Japan
isozaki@

Abstract
Named entity (NE) recognition is a task in which proper nouns and numerical information in a document are detected and classified into categories such as person, organization, location, and date. NE recognition plays an essential role in information extraction systems and question answering systems. It is well known that hand-crafted systems with a large set of heuristic rules are difficult to maintain, and corpus-based statistical approaches are expected to be more robust and to require less human intervention. Several statistical approaches have been reported in the literature. In a recent Japanese NE workshop, a maximum entropy (ME) system outperformed decision tree systems and most hand-crafted systems. Here we propose an alternative method based on a simple rule generator and decision tree learning. Our experiments show that its performance is comparable to that of the ME approach. We also found that it can be trained more efficiently on a large set of training data and that it improves readability.

1 Introduction
Named entity (NE) recognition is a task in which proper nouns and numerical information in a document are detected and classified into categories such as person, organization, location, and date.
NE recognition plays an essential role in information extraction systems (see MUC documents, 1996) and question answering systems (see TREC-QA documents, http …). When you want to know the location of the Taj Mahal, traditional IR techniques direct you to relevant documents but do not directly answer your question. NE recognition is essential for finding possible answers in documents. Although it is easy to build an NE recognition system with mediocre performance, it is difficult to make it reliable because of the large number of …
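To make the Taj Mahal example concrete, here is a toy sketch of how NE tags narrow a retrieved document down to candidate answers. It is not the method proposed in this paper. The tagged spans are written by hand, and the function name `candidate_answers`, the mapping of a "Where" question to the LOCATION category, and the rule of skipping entities already named in the question are all illustrative assumptions.

```python
from typing import List, Set, Tuple

# (surface string, NE category) pairs, as an NE recognizer might output them.
# Hypothetical data: these tags are written by hand for illustration only.
Span = Tuple[str, str]


def candidate_answers(spans: List[Span], answer_type: str, exclude: Set[str]) -> List[str]:
    """Keep spans of the expected NE category, skipping entities named in the question."""
    return [text for text, cat in spans if cat == answer_type and text not in exclude]


# Hand-written tags for the sentence
# "The Taj Mahal in Agra, India, was commissioned by Shah Jahan."
spans: List[Span] = [
    ("Taj Mahal", "LOCATION"),
    ("Agra", "LOCATION"),
    ("India", "LOCATION"),
    ("Shah Jahan", "PERSON"),
]

# Question: "Where is the Taj Mahal?"; assumed expected answer type: LOCATION.
print(candidate_answers(spans, "LOCATION", exclude={"Taj Mahal"}))
# prints ['Agra', 'India']
```

A plain IR system would stop at returning the whole sentence; once each span carries a category, the question's expected answer type filters it down to a short list of possible answers, which is why the quality of the NE tags matters so much.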