Scientific report: "A Generative Entity-Mention Model for Linking Entities with Knowledge Base"

Xianpei Han, Le Sun
Institute of Software, Chinese Academy of Sciences, HaiDian District, Beijing, China
xianpei sunle @

Abstract
Linking entities with a knowledge base (entity linking) is a key issue in bridging textual data with structured knowledge bases. Because of the name variation problem and the name ambiguity problem, entity linking decisions depend critically on heterogeneous knowledge about entities. In this paper, we propose a generative probabilistic model, called the entity-mention model, which can leverage heterogeneous entity knowledge (including popularity knowledge, name knowledge, and context knowledge) for the entity linking task. In our model, each name mention to be linked is modeled as a sample generated through a three-step generative story, and the entity knowledge is encoded in three distributions: the distribution of entities in a document, P(e); the distribution of possible names of a specific entity, P(s|e); and the distribution of possible contexts of a specific entity, P(c|e). To find the referent entity of a name mention, our method combines the evidence from all three distributions P(e), P(s|e), and P(c|e). Experimental results show that our method can significantly outperform the traditional methods.
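The way the three distributions can be combined is easiest to see on a toy example. The following is a minimal sketch, not the authors' implementation: it assumes the evidence is combined as the product P(e) P(s|e) P(c|e), that the context distribution factorizes over words as a unigram model, and that unseen names or words receive a small floor probability. The entities, probabilities, and the names `log_score` and `link` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy knowledge; every number below is illustrative only.
# P(e): popularity knowledge.
P_E = {
    "Michael Jordan (basketball)": 0.8,
    "Michael Jordan (professor)": 0.2,
}

# P(s|e): name knowledge, i.e. how likely each entity is to be called s.
P_S_GIVEN_E = {
    "Michael Jordan (basketball)": {"Jordan": 0.5, "Michael Jordan": 0.4, "MJ": 0.1},
    "Michael Jordan (professor)": {"Michael Jordan": 0.6, "Jordan": 0.3, "Michael I. Jordan": 0.1},
}

# P(w|e): context knowledge, assumed here to be a unigram word model,
# so that P(c|e) is the product of P(w|e) over the context words.
P_W_GIVEN_E = {
    "Michael Jordan (basketball)": {"nba": 0.3, "bulls": 0.3, "game": 0.2, "player": 0.2},
    "Michael Jordan (professor)": {"machine": 0.3, "learning": 0.3, "berkeley": 0.2, "statistics": 0.2},
}

FLOOR = 1e-6  # assumed floor probability for unseen names and words


def log_score(entity, name, context):
    """Return log P(e) + log P(s|e) + sum of log P(w|e) over context words."""
    score = math.log(P_E[entity])
    score += math.log(P_S_GIVEN_E[entity].get(name, FLOOR))
    for word in context:
        score += math.log(P_W_GIVEN_E[entity].get(word, FLOOR))
    return score


def link(name, context):
    """Pick the candidate entity with the highest combined log score."""
    return max(P_E, key=lambda entity: log_score(entity, name, context))


if __name__ == "__main__":
    print(link("Jordan", []))
    print(link("Jordan", ["machine", "learning"]))
```

With an empty context the decision rests on popularity and name knowledge alone, so the more popular candidate is chosen; adding context words that fit the less popular candidate can overturn that choice. Working in log space avoids numerical underflow when the context is long.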
1 Introduction
In recent years, due to the proliferation of knowledge-sharing communities like Wikipedia and the many research efforts on automated knowledge base population from the Web, such as the Read the Web project, more and more large-scale knowledge bases have become available. These knowledge bases contain rich knowledge about the world's entities, their semantic properties, and the semantic relations between them. One of the best-known examples is Wikipedia: its 2010 English version contains more than 3 million entities and 20 million semantic relations. Bridging these knowledge bases with the textual data can facilitate many .