tailieunhanh - Scientific report: "Clustering Technique in Multi-Document Personal Name Disambiguation"

Chen Chen, Key Laboratory of Computational Linguistics, Peking University, Ministry of Education, China, chenchen@
Hu Junfeng, Key Laboratory of Computational Linguistics, Peking University, Ministry of Education, China, hujf@
Wang Houfeng, Key Laboratory of Computational Linguistics, Peking University, Ministry of Education, China, wanghf@

Abstract

Focusing on multi-document personal name disambiguation, this paper develops an agglomerative clustering approach to resolving this problem. We start from an analysis of the pointwise mutual information between a feature and the ambiguous name, which leads to a novel method for computing feature weights in clustering. Then a trade-off measure between within-cluster compactness and among-cluster separation is proposed as the criterion for stopping clustering. After that, we apply a labeling method to find representative features for each cluster. Finally, experiments are conducted on word-based clustering on a Chinese dataset, and the results show that the approach is effective.

1 Introduction

Multi-document named entity co-reference resolution is the process of determining whether an identical name occurring in different texts refers to the same entity in the real world. With the rapid development of multi-document applications such as multi-document summarization and information fusion, there is an increasing need for multi-document named entity co-reference resolution.
This paper focuses on multi-document personal name disambiguation, which seeks to determine whether the same name in different documents refers to the same person. It develops an agglomerative clustering approach to this task. To represent texts better, a novel weight computing method for clustering features is presented, based on the pointwise mutual information between the ambiguous name and the features. This paper also develops a trade-off point based .
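To make the pointwise mutual information (PMI) between a feature and the ambiguous name concrete, here is a minimal sketch using the standard definition PMI(f, n) = log(P(f, n) / (P(f) P(n))). The excerpt does not give the paper's actual weighting formula, so this shows only the underlying quantity. The function name `pmi_weights`, the tokenized-document input, and the estimation of probabilities from document-level co-occurrence counts are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi_weights(docs, name):
    """Estimate PMI(feature, name) for every feature co-occurring with `name`.

    docs: list of token lists, one per document (hypothetical input format).
    name: the ambiguous personal name, as a single token.
    Probabilities are estimated as fractions of documents (an assumption).
    """
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]

    # Document frequency of every token.
    df = Counter(tok for s in doc_sets for tok in s)
    name_df = df.get(name, 0)
    if n_docs == 0 or name_df == 0:
        return {}

    # Number of documents in which a feature appears together with the name.
    co = Counter(
        tok for s in doc_sets if name in s for tok in s if tok != name
    )

    weights = {}
    for feat, joint in co.items():
        # log( (joint/N) / ((df_f/N) * (df_n/N)) ) = log( joint*N / (df_f*df_n) )
        weights[feat] = math.log((joint * n_docs) / (df[feat] * name_df))
    return weights


if __name__ == "__main__":
    toy_docs = [
        ["Li", "professor", "Peking"],
        ["Li", "athlete"],
        ["Wang", "athlete"],
        ["Wang", "teacher"],
    ]
    for feature, weight in sorted(pmi_weights(toy_docs, "Li").items()):
        print(feature, round(weight, 3))
```

In this toy corpus, a feature that appears only alongside the name ("professor") receives a positive weight, while a feature that is equally common with other names ("athlete") receives a weight of zero. This matches the intuition that strongly associated features are more useful for telling apart the people who share a name.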
