HOW DO WE COUNT? THE PROBLEM OF TAGGING PHRASAL VERBS IN PARTS

Nava A. Shaked
The Graduate School and University Center
The City University of New York
33 West 42nd Street, New York, NY 10036
nava@nynexst.com

ABSTRACT

This paper examines the current performance of the stochastic tagger PARTS (Church 88) in handling phrasal verbs, describes a problem that arises from the statistical model used, and suggests a way to improve the tagger's performance. The solution involves a change in the definition of what counts as a word for the purpose of tagging phrasal verbs.

1. INTRODUCTION

Statistical taggers are commonly used to preprocess natural language. Operations such as parsing, information retrieval, and machine translation are facilitated by having as input a text in which each lexical item carries a part-of-speech label. To be useful, a tagger must be accurate as well as efficient. Researchers advocating the use of statistics for NLP (e.g. Marcus et al. 92) claim that taggers are routinely correct about 95% of the time. The 5% error rate is not perceived as a problem, mainly because human taggers disagree or make mistakes at approximately the same rate. On the other hand, even a 5% error rate can cause a much higher rate of mistakes later in processing if the error falls on a key element that is crucial to the correct analysis of the whole sentence. One example is the phrasal verb construction (e.g. gun down, back off). An error in tagging this two-element sequence will cause the analysis of the entire sentence to be faulty.
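As a rough back-of-the-envelope illustration of how a small per-token error rate can become a much larger rate of faulty sentence analyses, the sketch below computes the share of sentences in which every token is tagged correctly. It assumes, purely for illustration and not as a claim made in the paper, that tagging errors are independent across tokens, so a sentence of n tokens is error-free with probability p^n. The function name `sentence_accuracy` is hypothetical.

```python
def sentence_accuracy(token_accuracy: float, n_tokens: int) -> float:
    """Probability that all n_tokens in a sentence are tagged correctly,
    under the simplifying (illustrative) assumption that tagging errors
    occur independently at each token."""
    if not 0.0 <= token_accuracy <= 1.0:
        raise ValueError("token_accuracy must be in [0, 1]")
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    # Independence means the per-token probabilities simply multiply.
    return token_accuracy ** n_tokens


if __name__ == "__main__":
    # Compare a few sentence lengths at the ~95% per-token accuracy
    # figure cited in the text.
    for n in (10, 20, 30):
        p = sentence_accuracy(0.95, n)
        print(f"{n:2d} tokens: {p:.1%} of sentences tagged with no errors")
```

At 95% per-token accuracy, a 20-token sentence is tagged without any error only about 36% of the time (0.95^20 ≈ 0.358). Real errors are correlated, and not every error derails an analysis, so this is only an intuition pump; the paper's concern is specifically errors that land on key elements such as phrasal verbs.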
An analysis of the errors made by the stochastic tagger PARTS (Church 88) reveals that phrasal verbs do indeed constitute a problem for the model.

2. PHRASAL VERBS

The basic assumption underlying the stochastic process is the notion of independence. Words are defined as units separated by spaces and then undergo statistical approximations. As a result, the elements of a phrasal verb are treated as two individual words, each with its own ...
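To make the word-counting issue concrete, here is a minimal sketch contrasting space-separated word units with one hypothetical alternative in which a known verb + particle pair is merged into a single unit before tagging. The lexicon `PHRASAL_VERBS` and the helper names are invented for illustration; this is not the PARTS implementation, nor necessarily the solution the paper goes on to propose.

```python
# Hypothetical mini-lexicon of phrasal verbs, listed as surface-form pairs.
# Invented for illustration only.
PHRASAL_VERBS = {
    ("gun", "down"), ("gunned", "down"),
    ("back", "off"), ("backed", "off"),
}


def whitespace_tokens(text: str) -> list[str]:
    """Baseline: every whitespace-separated unit counts as one word,
    so a phrasal verb yields two independent tokens."""
    return text.split()


def merge_phrasal_verbs(tokens: list[str],
                        lexicon: set[tuple[str, str]] = PHRASAL_VERBS) -> list[str]:
    """Alternative: join an adjacent (verb, particle) pair found in the
    lexicon into a single token, so the pair receives one tag decision."""
    merged: list[str] = []
    i = 0
    while i < len(tokens):
        pair = (tokens[i].lower(), tokens[i + 1].lower()) if i + 1 < len(tokens) else None
        if pair in lexicon:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # consume both elements of the phrasal verb
        else:
            merged.append(tokens[i])
            i += 1
    return merged


if __name__ == "__main__":
    sentence = "The police gunned down the suspect"
    print(whitespace_tokens(sentence))
    print(merge_phrasal_verbs(whitespace_tokens(sentence)))
```

The baseline yields six units, with "gunned" and "down" tagged separately; the merged version yields five, with `gunned_down` as one unit. This naive merge only fires on adjacent pairs listed in the lexicon, so it misses separated particles ("gunned the suspect down") and would also fire on non-phrasal uses of the same word pair; deciding when a pair really is a phrasal verb remains the hard part.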