Scientific report: "A Modular Toolkit for Coreference Resolution"

BART: A Modular Toolkit for Coreference Resolution

Yannick Versley, University of Tübingen, versley@
Simone Paolo Ponzetto, EML Research gGmbH, ponzetto@
Massimo Poesio, University of Essex, poesio@
Vladimir Eidelman, Columbia University, vae2101@
Alan Jern, UCLA, ajern@
Jason Smith, Johns Hopkins University, jsmith@
Xiaofeng Yang, Inst. for Infocomm Research, xiaofengy@
Alessandro Moschitti, University of Trento, moschitti@

Abstract

Developing a full coreference system able to run all the way from raw text to semantic interpretation is a considerable engineering effort, yet there is very limited availability of off-the-shelf tools for researchers whose interests are not in coreference, or for researchers who want to concentrate on a specific aspect of the problem. We present BART, a highly modular toolkit for developing coreference applications. In the Johns Hopkins workshop on using lexical and encyclopedic knowledge for entity disambiguation, the toolkit was used to extend a reimplementation of the Soon et al. (2001) proposal with a variety of additional syntactic and knowledge-based features, and experiment with alternative resolution processes, preprocessing tools, and classifiers.

1 Introduction

Coreference resolution refers to the task of identifying noun phrases that refer to the same extralinguistic entity in a text.
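To make the task definition concrete, the following minimal Python sketch groups noun-phrase mentions into coreference chains, given pairwise "these two corefer" decisions. The example sentence, the `build_chains` helper, and the links are hypothetical illustrations and are not part of BART; a real system would identify mentions by text span rather than by string.

```python
# Toy illustration of the task: a coreference chain is the set of mentions
# (noun phrases) that refer to the same entity.

def build_chains(mentions, links):
    """Group mentions into chains from pairwise coreference links (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the chain representative (with path halving).
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for antecedent, anaphor in links:
        parent[find(anaphor)] = find(antecedent)

    chains = {}
    for m in mentions:  # iterate in document order so chains stay ordered
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())


# Hypothetical input: "Mary bought a car. She drives it to work."
mentions = ["Mary", "a car", "She", "it", "work"]
links = [("Mary", "She"), ("a car", "it")]  # assumed pairwise decisions

chains = build_chains(mentions, links)
for chain in chains:
    print(chain)
```

Here "Mary" and "She" end up in one chain, "a car" and "it" in another, and "work" remains a singleton because nothing links to it.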
Using coreference information has been shown to be beneficial in a number of other tasks, including information extraction (McCarthy and Lehnert, 1995), question answering (Morton, 2000), and summarization (Steinberger et al., 2007). Developing a full coreference system, however, is a considerable engineering effort, which is why a large body of research concerned with feature engineering or learning methods (e.g. Culotta et al., 2007; Denis and Baldridge, 2007) uses a simpler but non-realistic setting, using pre-identified mentions, and the use of coreference information in summa .