Shallow parsing on the basis of words only: A case study

Antal van den Bosch and Sabine Buchholz
ILK / Computational Linguistics and AI
Tilburg University, Tilburg, The Netherlands
{Antal.vdnBosch,S.Buchholz}@kub.nl

Abstract

We describe a case study in which a memory-based learning algorithm is trained to simultaneously chunk sentences and assign grammatical function tags to these chunks. We compare the algorithm's performance on this parsing task with varying training set sizes (yielding learning curves) and different input representations. In particular we compare input consisting of words only, a variant that includes word form information for low-frequency words, gold-standard POS only, and combinations of these. The word-based shallow parser displays an apparently log-linear increase in performance, and surpasses the flatter POS-based curve at about 50,000 sentences of training data. The low-frequency variant performs even better, and the combination is best. Comparative experiments with a real POS tagger produce lower results. We argue that we might not need an explicit intermediate POS-tagging step for parsing when a sufficient amount of training material is available and word form information is used for low-frequency words.

1 Introduction

It is common in parsing to assign part-of-speech (POS) tags to words as a first analysis step, providing information for further steps. In many early parsers, the POS sequences formed the only input to the parser, i.e.
the actual words were not used except in POS tagging. Later, with feature-based grammars, information on POS had a more central place in the lexical entry of a word than the identity of the word itself (e.g., MAJOR and other HEAD features in (Pollard and Sag, 1987)). In the early days of statistical parsers, POS were explicitly and often exclusively used as symbols to base probabilities on; these probabilities are generally more reliable than lexical probabilities, due to the inherent sparseness of words. In modern lexicalized parsers …
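To make the combined task from the abstract concrete, the sketch below shows one common way to fold chunk boundaries and grammatical function labels into a single per-word tag, so that one sequence learner predicts both at once. The IOB-style tag scheme and the example labels here are illustrative assumptions, not necessarily the exact encoding used in the paper:

```python
# Illustrative sketch: encoding chunking + grammatical function tagging
# as one per-word tagging task. Tags like "B-NP" / "I-NP" mark chunk
# boundaries (IOB style); a function label such as "SBJ" or "OBJ" is
# appended to the tags of the chunk it applies to. This scheme is an
# assumption for illustration only.

def combine_tags(chunk_tags, func_tags):
    """Merge IOB chunk tags with per-word function labels into single tags.

    chunk_tags: IOB chunk tag per word, e.g. ["B-NP", "I-NP", "B-VP"]
    func_tags:  function label of the word's chunk, "-" if none.
    """
    combined = []
    for chunk, func in zip(chunk_tags, func_tags):
        combined.append(chunk if func == "-" else f"{chunk}-{func}")
    return combined

words      = ["The",  "cat",  "chased", "mice"]
chunk_tags = ["B-NP", "I-NP", "B-VP",   "B-NP"]
func_tags  = ["SBJ",  "SBJ",  "-",      "OBJ"]

print(combine_tags(chunk_tags, func_tags))
# -> ['B-NP-SBJ', 'I-NP-SBJ', 'B-VP', 'B-NP-OBJ']
```

With such an encoding, a single classifier trained on word (or POS) context windows can output both analyses in one pass, which is the kind of simultaneous chunking and function assignment the study evaluates.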