Scientific report: "PDT 2.0 Requirements on a Query Language"

PDT 2.0 Requirements on a Query Language

Jiří Mírovský
Institute of Formal and Applied Linguistics, Charles University in Prague
Malostranské nám. 25, 118 00 Prague 1, Czech Republic
mirovsky@

Abstract

Linguistically annotated treebanks play an essential part in modern computational linguistics. The more complex the treebanks become, the more sophisticated the tools required for using them, namely for searching in the data. We study linguistic phenomena annotated in the Prague Dependency Treebank and create a list of requirements that these phenomena set on a search tool, especially on its query language.

1 Introduction

Searching in a linguistically annotated treebank is a principal task in modern computational linguistics. A search tool helps extract useful information from the treebank in order to study the language or the annotation system, or even to search for errors in the annotation. The more complex the treebank is, the more sophisticated the search tool and its query language need to be.

The Prague Dependency Treebank (Hajič et al. 2006) is one of the most advanced manually annotated treebanks. We study mainly the tectogrammatical layer of the Prague Dependency Treebank (PDT), which is by far the most advanced and complex layer in the treebank, and show what requirements on a query language the annotated linguistic phenomena bring. We also add requirements set by the lower layers of annotation.

In section 1, after this introduction, we mention related work on search languages for various types of corpora. Afterwards, we very briefly introduce PDT, just to give a general picture of the principles and complexity of the annotation scheme. In section 2 we study the annotation manual for the tectogrammatical layer of PDT (t-manual, Mikulová et al. 2006) and collect linguistic phenomena that bring special requirements on the query language. We also study the lower layers of annotation and add their requirements. In section 3 we summarize the ...
