Scientific report: "A Declarative Information Extraction System"

SystemT: A Declarative Information Extraction System

Yunyao Li, IBM Research - Almaden, 650 Harry Road, San Jose, CA 95120, yunyaoli@
Frederick R. Reiss, IBM Research - Almaden, 650 Harry Road, San Jose, CA 95120, frreiss@
Laura Chiticariu, IBM Research - Almaden, 650 Harry Road, San Jose, CA 95120, chiti@

Abstract

Emerging text-intensive enterprise applications such as social analytics and semantic search pose new challenges of scalability and usability to Information Extraction (IE) systems. This paper presents SystemT, a declarative IE system that addresses these challenges and has been deployed in a wide range of enterprise applications. SystemT facilitates the development of high-quality, complex annotators by providing a highly expressive language and an advanced development environment. It also includes a cost-based optimizer and a high-performance, flexible runtime with minimum memory footprint. We present SystemT as a useful resource that is freely available, and as an opportunity to promote research in building scalable and usable IE systems.

1 Introduction

Information extraction (IE) refers to the extraction of structured information from text documents. In recent years, text analytics have become the driving force for many emerging enterprise applications such as compliance and data redaction. In addition, the inclusion of text has also been increasingly important for many traditional enterprise applications such as business intelligence.
Not surprisingly, the use of information extraction has dramatically increased within the enterprise over the years. While the traditional requirement of extraction quality remains critical, enterprise applications pose two challenges to IE systems:

1. Scalability: Enterprise applications operate over large volumes of data, often orders of magnitude larger than classical IE corpora. An IE system should be able to operate at those scales without compromising its execution efficiency or memory consumption.

2.
