Scientific report: "Semi-supervised Dependency Parsing using Lexical Affinities"
Seyed Abolghasem Mirroshandel, Alexis Nasr, Joseph Le Roux
Laboratoire d'Informatique Fondamentale de Marseille, CNRS - UMR 7279, Université Aix-Marseille, Marseille, France
LIPN, Université Paris Nord, CNRS, Villetaneuse, France
Computer Engineering Department, Sharif University of Technology, Tehran, Iran
leroux@

Abstract
Treebanks are not large enough to reliably model precise lexical phenomena. This deficiency causes attachment errors in parsers trained on such data. In this paper, we propose to compute lexical affinities on large corpora for specific lexico-syntactic configurations that are hard to disambiguate, and to introduce this new information into a parser. Experiments on the French Treebank showed a relative decrease of the error rate in Labeled Accuracy Score, yielding the best parsing results on this treebank.

1 Introduction
Probabilistic parsers are usually trained on treebanks composed of a few thousand sentences. While this amount of data seems reasonable for learning syntactic phenomena and, to some extent, very frequent lexical phenomena involving closed parts of speech (POS), it proves inadequate for modeling lexical dependencies between open POS such as nouns, verbs and adjectives. This fact was first recognized by Bikel (2004), who showed that bilexical dependencies were barely used in Michael Collins' parser.
The work reported in this paper aims at a better modeling of such phenomena by using a raw corpus that is several orders of magnitude larger than the treebank used for training the parser. The raw corpus is first parsed, and the lexical affinities computed between lemmas in specific lexico-syntactic configurations are then injected back into the parser. Two outcomes are expected from this procedure: the first is, as mentioned above, a better modeling of bilexical dependencies, and the second is a method to adapt a .
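The counting step described above can be illustrated with a minimal sketch. It assumes the automatically parsed corpus has already been reduced to (configuration, governor lemma, dependent lemma) triples. The function name `lexical_affinities`, the configuration labels and the scoring formula (a plain relative frequency) are all hypothetical choices made for illustration; this excerpt does not give the authors' actual affinity measure.

```python
from collections import Counter


def lexical_affinities(triples):
    """Score each (config, governor, dependent) triple by relative frequency.

    Assumption: the score is count(config, governor, dependent) divided by
    count(config, dependent). This is one simple choice, not necessarily
    the measure used in the paper.
    """
    pair_counts = Counter()
    dependent_counts = Counter()
    for config, governor, dependent in triples:
        pair_counts[(config, governor, dependent)] += 1
        dependent_counts[(config, dependent)] += 1
    return {
        (config, governor, dependent): count / dependent_counts[(config, dependent)]
        for (config, governor, dependent), count in pair_counts.items()
    }


# Toy attachments, standing in for the output of a parser run on a raw corpus.
# The configuration labels "V-N_obj" and "N-de-N" are made up for this example.
parsed = [
    ("V-N_obj", "manger", "pomme"),
    ("V-N_obj", "manger", "pomme"),
    ("V-N_obj", "acheter", "pomme"),
    ("N-de-N", "jus", "pomme"),
]
scores = lexical_affinities(parsed)
```

Counts are kept per configuration, so the same lemma pair can receive different scores in different syntactic contexts. How such scores are then injected back into the parser is not covered by this sketch.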