Arabic Cross-Document Coreference Detection

Asad Sayeed (1,2), Tamer Elsayed (1,2), Nikesh Garera (1,6), David Alexander (1,3), Tan Xu (1,4), Douglas W. Oard (1,4,5), David Yarowsky (1,6), Christine Piatko (1)

(1) Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, MD, USA
(2) Dept. of Computer Science, University of Maryland, College Park, MD, USA
(3) BBN Technologies, Cambridge, MA, USA
(4) College of Information Studies, University of Maryland, College Park, MD, USA
(5) UMIACS, University of Maryland, College Park, MD, USA
(6) Dept. of Computer Science, Johns Hopkins University, Baltimore, MD, USA

{asayeed,telsayed}@cs.umd.edu, ngarera@cs.jhu.edu, dalexand@bbn.com, {tanx,oard}@umd.edu, yarowsky@cs.jhu.edu, Christine.Piatko@jhuapl.edu

Abstract

We describe a set of techniques for Arabic cross-document coreference resolution. We compare a baseline system of exact mention string-matching to ones that include local mention context information as well as information from an existing machine translation system. It turns out that the machine translation-based technique outperforms the baseline, but local entity context similarity does not. This helps to point the way for future cross-document coreference work in languages with few existing resources for the task.
1 Introduction

Our world contains at least two noteworthy George Bushes: President George H. W. Bush and President George W. Bush. They are both frequently referred to as "George Bush." If we wish to use a search engine to find documents about one of them, we are likely also to find documents about the other. Improving our ability to find all documents referring to one and none referring to the other in a targeted search is a goal of cross-document entity coreference detection. Here we describe some results from a system we built to perform this task on Arabic documents. We base our work partly on previous work done by Bagga and Baldwin (Bagga and Baldwin, 1998), which has also been used in later work (Chen and Martin, 2007). Other work such as Lloyd et al. .