Scientific paper: "Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations"

Patrick Pantel
Information Sciences Institute, University of Southern California
4676 Admiralty Way, Marina del Rey, CA 90292
pantel@

Marco Pennacchiotti
ART Group - DISP, University of Rome "Tor Vergata"
Viale del Politecnico 1, Rome, Italy
pennacchiotti@

Abstract

In this paper, we present Espresso, a weakly-supervised, general-purpose, and accurate algorithm for harvesting semantic relations. The main contributions are: i) a method for exploiting generic patterns by filtering incorrect instances using the Web; and ii) a principled measure of pattern and instance reliability enabling the filtering algorithm. We present an empirical comparison of Espresso with various state-of-the-art systems, on corpora of different sizes and genres, for extracting various general and specific relations. Experimental results show that our exploitation of generic patterns substantially increases system recall with little effect on overall precision.

1 Introduction

Recent attention to knowledge-rich problems such as question answering (Pasca and Harabagiu, 2001) and textual entailment (Geffet and Dagan, 2005) has encouraged natural language processing researchers to develop algorithms for automatically harvesting shallow semantic resources. With seemingly endless amounts of textual data at our disposal, we have a tremendous opportunity to automatically grow semantic term banks and ontological resources.

To date, researchers have harvested, with varying success, several resources, including concept lists (Lin and Pantel, 2002), topic signatures (Lin and Hovy, 2000), facts (Etzioni et al., 2005), and word similarity lists (Hindle, 1990). Many recent efforts have also focused on extracting semantic relations between entities, such as entailments (Szpektor et al., 2004), is-a (Ravichandran and Hovy, 2002), part-of (Girju et al., 2006), and other relations.

The following desiderata outline the properties of an ideal relation harvesting .