Scientific report: "Unsupervised Discrimination and Labeling of Ambiguous Names"

Anagha K. Kulkarni
Department of Computer Science
University of Minnesota
Duluth, MN 55812
kulka020@ http

Abstract

This paper describes adaptations of unsupervised word sense discrimination techniques to the problem of name discrimination. These methods cluster the contexts containing an ambiguous name, such that each cluster refers to a unique underlying person or place. We also present new techniques to assign meaningful labels to the discovered clusters.

1 Introduction

A name assigned to an entity is often thought to be a unique identifier. However, this is not always true. We frequently come across multiple people sharing the same name, or cities and towns that have identical names. For example, the top ten results for a Google search of John Gilbert return six different individuals: a famous actor from the silent film era, a British painter, a professor of Computer Science, etc. Name ambiguity is relatively common, and makes searching for people, places, or organizations potentially very confusing.

However, in many cases a human can distinguish between the underlying entities associated with an ambiguous name with the help of surrounding context. For example, a human can easily recognize that a document that mentions Silent Era, Silver Screen, and The Big Parade refers to John Gilbert the actor, and not the professor. Thus the neighborhood of the ambiguous name reveals distinguishing features about the underlying entity.

Our approach is based on unsupervised learning from raw text, adapting methods originally proposed by Purandare and Pedersen (2004).
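To make the idea of discriminating names by their surrounding context concrete, the following is a minimal sketch, not the method of this paper or of Purandare and Pedersen (2004). It assumes a plain bag-of-words representation, cosine similarity, and a greedy single-pass clustering, and it labels each cluster with its most frequent content words. The stop-word list, the similarity threshold, the function names, and the example sentences are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, used only for this illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "at", "is", "was",
              "his", "for", "on", "to", "with"}


def bag_of_words(context, name):
    """Count content words in a context, ignoring stop words and the name itself."""
    name_tokens = set(name.lower().split())
    tokens = re.findall(r"[a-z]+", context.lower())
    return Counter(t for t in tokens
                   if t not in STOP_WORDS and t not in name_tokens)


def cosine(u, v):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def cluster_contexts(contexts, name, threshold=0.2):
    """Greedy single-pass clustering (an assumed stand-in technique).

    Each context joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for i, context in enumerate(contexts):
        vec = bag_of_words(context, name)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # centroid is the summed word counts
            best["members"].append(i)
    return clusters


def label(cluster, k=3):
    """Label a cluster with its k most frequent content words."""
    return [w for w, _ in cluster["centroid"].most_common(k)]


# Invented example contexts for the ambiguous name "John Gilbert".
contexts = [
    "John Gilbert starred in The Big Parade, a silent era film.",
    "Professor John Gilbert teaches computer science and publishes research on algorithms.",
    "The silent film star John Gilbert lit up the silver screen.",
    "John Gilbert, a computer science professor, leads a research group.",
]

for c in cluster_contexts(contexts, "John Gilbert"):
    print(c["members"], label(c))
```

On these four invented sentences, the two contexts about the actor fall into one cluster and the two about the professor into another, because each pair shares content words such as "silent" and "film" or "computer" and "science". A greedy pass like this depends on the order of the input and on the threshold, so it should be read only as an illustration of the intuition.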
We do not utilize any manually created examples, knowledge bases, dictionaries, or ontologies in formulating our solution. Our goal is to discriminate among multiple contexts that mention a particular name strictly on the basis of the surrounding contents, and assign meaningful labels to the resulting clusters that identify the underlying