Scientific report: "Clique-Based Clustering for improving Named Entity Recognition systems"
Julien Ah-Pine
Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240 Meylan, France

Guillaume Jacquet
Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240 Meylan, France

Abstract

We propose a system that builds, in a semi-supervised manner, a resource intended to help a NER system annotate corpus-specific named entities. The system is based on a distributional approach that uses syntactic dependencies to measure similarities between named entities. The specificity of the presented method, however, is that it combines a clique-based approach with a clustering technique, which amounts to a soft clustering method. Our experiments show that the resource constructed with this clique-based clustering system improves different NER systems.

1 Introduction

In the Information Extraction domain, named entities (NEs) are among the most important textual units, as they express an important part of the meaning of a document. Named entity recognition (NER) is not a new domain (see the MUC¹ and ACE² conferences), but new needs have appeared concerning NE processing. For instance, the NE Oxford illustrates the different types of ambiguity that are interesting to address:

- intra-annotation ambiguity: Wikipedia lists more than 25 cities named Oxford in the world;
- systematic inter-annotation ambiguity: the name of a city can be used to refer to the university or the football club of that city. This is the case for Oxford or Newcastle;
- non-systematic inter-annotation ambiguity: Oxford is also a company, unlike Newcastle.

The main goal of our system is to act in a complementary way with an existing NER system in order to enhance its results. We address two kinds of issues: first, we want to detect and correctly annotate corpus-specific NEs³ that the NER system could .

¹ http related_projects muc
² http speech tests ace