Scientific report: "Semi-supervised Training for the Averaged Perceptron POS Tagger"

Drahomira "johanka" Spoustova, Jan Hajic, Jan Raab, Miroslav Spousta
Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
johanka, hajic, raab, spousta @

Abstract

This paper describes POS tagging experiments with semi-supervised training as an extension to the (supervised) averaged perceptron algorithm, first introduced for this task by Collins (2002). Experiments with iterative training on a standard-sized supervised (manually annotated) dataset (10^6 tokens), combined with a relatively modest amount (on the order of 10^8 tokens) of unsupervised (plain) data in a bagging-like fashion, showed significant improvement on the POS classification task for typologically different languages, yielding better than state-of-the-art results for English and Czech ( % and % relative error reduction, respectively; absolute accuracies being % and %).

1 Introduction

Since 2002, we have seen a renewed interest in improving POS tagging results for English, and an inflow of results (initial or improved) for many other languages. For English, after a relatively big jump achieved by Collins (2002), we have seen two significant improvements: Toutanova et al. (2003) and Shen et al. (2007) each pushed the results by a significant amount. [1] Most recently, Suzuki and Isozaki (2008) published their semi-supervised sequential labelling method, whose results on POS tagging appear better at first glance than those of Shen et al. (2007), but no significance tests were given and the tool is not available for download for repeating the results and significance testing. Thus, we compare our results only to the tools listed above. Even though an improvement in POS tagging might be a ...

[1] In our final comparison, we have also included the results of Gimenez and Marquez (2004), because their tagger has surpassed Collins (2002) as well and we have used it in the data preparation phase. See more details below.
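As background for the supervised averaged perceptron that the abstract builds on, the following is a minimal sketch of the weight-averaging idea only. It is a simplification under stated assumptions: it classifies each token independently, whereas Collins (2002) trains on whole sentences with Viterbi decoding, and it does not attempt the semi-supervised, bagging-like procedure of this paper, whose details are not given in this excerpt. The function names, the feature strings, and the toy data are hypothetical.

```python
from collections import defaultdict


def train_averaged_perceptron(examples, tags, epochs=5):
    """Per-token averaged perceptron (illustrative, not the paper's tagger).

    examples: list of (features, gold_tag); features is a list of strings.
    Returns the averaged weights as a dict {(feature, tag): float}.
    """
    weights = defaultdict(float)      # current weight of each (feature, tag)
    totals = defaultdict(float)       # sum of each weight over all steps so far
    last_update = defaultdict(int)    # step at which each weight last changed
    step = 0                          # number of examples processed so far

    def update(key, delta):
        # Lazily account for the steps during which this weight was unchanged.
        totals[key] += (step - last_update[key]) * weights[key]
        last_update[key] = step
        weights[key] += delta

    for _ in range(epochs):
        for features, gold in examples:
            guess = predict(weights, features, tags)
            if guess != gold:
                # Standard perceptron update: reward gold, penalise the guess.
                for f in features:
                    update((f, gold), 1.0)
                    update((f, guess), -1.0)
            step += 1

    if step == 0:
        return {}
    # Flush pending contributions and divide by the number of steps.
    return {
        key: (totals[key] + (step - last_update[key]) * value) / step
        for key, value in weights.items()
    }


def predict(weights, features, tags):
    """Return the highest-scoring tag; ties go to the earliest tag in `tags`."""
    def score(tag):
        return sum(weights.get((f, tag), 0.0) for f in features)
    return max(tags, key=score)


# Toy usage with hypothetical features.
toy_tags = ["DT", "NN", "VB"]
toy_examples = [
    (["w=the"], "DT"),
    (["w=dog", "suf=og"], "NN"),
    (["w=runs", "suf=ns"], "VB"),
]
toy_weights = train_averaged_perceptron(toy_examples, toy_tags, epochs=5)
```

Averaging the weight vector over all training steps, rather than keeping only the final one, is what distinguishes the averaged perceptron from the plain perceptron; the lazy bookkeeping in `update` avoids touching every weight at every step.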
