Inducing Gazetteers for Named Entity Recognition by Large-scale Clustering of Dependency Relations

Jun'ichi Kazama
Japan Advanced Institute of Science and Technology (JAIST), Asahidai 1-1, Nomi, Ishikawa, 923-1292 Japan
kazama@

Kentaro Torisawa
National Institute of Information and Communications Technology (NICT), 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289 Japan
torisawa@

Abstract

We propose using large-scale clustering of dependency relations between verbs and multiword nouns (MNs) to construct a gazetteer for named entity recognition (NER). Since dependency relations capture the semantics of MNs well, the MN clusters constructed by using dependency relations should serve as a good gazetteer. However, the high computational cost has prevented the use of clustering for constructing gazetteers. We parallelized a clustering algorithm based on expectation-maximization (EM) and thus enabled the construction of large-scale MN clusters. We demonstrated with the IREX dataset for Japanese NER that using the constructed clusters as a gazetteer (cluster gazetteer) is an effective way of improving the accuracy of NER. Moreover, we demonstrate that the combination of the cluster gazetteer and a gazetteer extracted from Wikipedia, which is also useful for NER, can further improve the accuracy in several cases.

1 Introduction

Gazetteers, or entity dictionaries, are important for performing named entity recognition (NER) accurately.
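To make the idea of a cluster gazetteer concrete, the following is a minimal, non-parallel sketch of EM-based soft clustering over (MN, verb) dependency pairs. It assumes a hidden-class model p(n, v) = Σ_c p(c) p(n|c) p(v|c); the text above only says that the clustering algorithm is based on EM, so this model, the function name `em_cluster`, and the toy data `TOY_PAIRS` are illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import defaultdict


def em_cluster(pairs, num_clusters, iterations=50, seed=0):
    """Cluster nouns from {(noun, verb): count} data; returns {noun: cluster_id}.

    Assumed model (not confirmed by the source): p(n, v) = sum_c p(c) p(n|c) p(v|c).
    """
    if not pairs:
        return {}
    rng = random.Random(seed)
    nouns = sorted({n for n, _ in pairs})
    verbs = sorted({v for _, v in pairs})

    def random_dist(keys):
        # Random positive weights, normalized to sum to 1.
        weights = {k: rng.random() + 0.1 for k in keys}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}

    p_c = [1.0 / num_clusters] * num_clusters
    p_n = [random_dist(nouns) for _ in range(num_clusters)]
    p_v = [random_dist(verbs) for _ in range(num_clusters)]

    for _ in range(iterations):
        c_count = [0.0] * num_clusters
        n_count = [defaultdict(float) for _ in range(num_clusters)]
        v_count = [defaultdict(float) for _ in range(num_clusters)]

        # E-step: distribute each observed pair over clusters by posterior p(c | n, v).
        for (n, v), freq in pairs.items():
            joint = [p_c[c] * p_n[c][n] * p_v[c][v] for c in range(num_clusters)]
            z = sum(joint)
            if z == 0.0:
                continue
            for c in range(num_clusters):
                r = freq * joint[c] / z
                c_count[c] += r
                n_count[c][n] += r
                v_count[c][v] += r

        # M-step: re-estimate parameters from the expected counts.
        total = sum(c_count)
        for c in range(num_clusters):
            p_c[c] = c_count[c] / total
            if c_count[c] > 0.0:
                p_n[c] = {n: n_count[c][n] / c_count[c] for n in nouns}
                p_v[c] = {v: v_count[c][v] / c_count[c] for v in verbs}

    # Hard assignment: the cluster maximizing p(c) p(n|c), proportional to p(c | n).
    return {
        n: max(range(num_clusters), key=lambda c: p_c[c] * p_n[c][n])
        for n in nouns
    }


# Hypothetical toy data: counts of (multiword noun, governing verb) dependencies.
TOY_PAIRS = {
    ("Tokyo Stock Exchange", "list_on"): 3,
    ("Tokyo Stock Exchange", "regulate"): 2,
    ("Osaka Securities Exchange", "list_on"): 2,
    ("Osaka Securities Exchange", "regulate"): 1,
    ("Mount Fuji", "climb"): 4,
    ("Mount Fuji", "visit"): 2,
    ("Lake Biwa", "visit"): 3,
    ("Lake Biwa", "climb"): 1,
}

cluster_gazetteer = em_cluster(TOY_PAIRS, num_clusters=2)
for noun, cluster_id in cluster_gazetteer.items():
    print(f"{noun} -> cluster {cluster_id}")
```

The resulting noun-to-cluster mapping could then be used like a gazetteer entry, with the cluster ID as a feature in an NER tagger. EM only finds a local optimum, so the grouping may depend on the random seed, and a corpus-scale version would need the kind of parallelization the abstract mentions, which this sketch does not attempt.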
Since building and maintaining high-quality gazetteers by hand is very expensive, many methods have been proposed for automatic extraction of gazetteers from texts (Riloff and Jones, 1999; Thelen and Riloff, 2002; Etzioni et al., 2005; Shinzato et al., 2006; Talukdar et al., 2006; Nadeau et al., 2006). Most studies using gazetteers for NER are based on the assumption that a gazetteer is a mapping from a multi-word noun (MN) to named entity categories, such as "Tokyo Stock Exchange" → ORGANIZATION. However, since the correspondence between the labels and
