tailieunhanh - Scientific report: "Poliqarp: An open source corpus indexer and search engine with syntactic extensions"

This paper presents recent extensions to Poliqarp, an open source tool for indexing and searching morphosyntactically annotated corpora, which turn it into a tool for indexing and searching certain kinds of treebanks, complementary to existing treebank search engines. In particular, the paper discusses the motivation for such a new tool, the extended query syntax of Poliqarp, and implementation and efficiency issues.

Daniel Janus
Sentivision Polska Sp. z .
Marynarska 19a, 02-674 Warsaw, Poland
nathell@

Adam Przepiórkowski
Institute of Computer Science, Polish Academy of Sciences
Ordona 21, 01-237 Warsaw, Poland
adamp@

1 Introduction

The aim of this paper is to present extensions to Poliqarp,¹ an efficient open source indexer and search tool for morphosyntactically annotated XCES-encoded (Ide et al., 2000) corpora, with query syntax based on that of CQP (Christ, 1994), but extending it in interesting ways. Poliqarp has been in constant development since 2003 (Przepiórkowski et al., 2004), and it is currently employed as the search engine of the IPI PAN Corpus of Polish (Przepiórkowski, 2004) and the Lisbon corpus of Portuguese (Barreto et al., 2006), as well as in other projects. Poliqarp has a typical server-client architecture, with various Poliqarp clients developed so far, including GUI clients for a variety of operating systems (Linux, Windows, MacOS, Solaris) and architectures (big-endian and little-endian), as well as a PHP client.
Since March 2006, the first stable version of Poliqarp (Janus and Przepiórkowski, 2006) has been available under [...]. A version of Poliqarp that implements various statistical extensions is at the beta-testing stage. Although Poliqarp was designed as a tool for corpora linguistically annotated at word-level only, the extensions described in this paper turn it into an indexing and search tool for certain ...

¹ Polyinterpretation Indexing Query And Retrieval Processor