Scientific report: "Creating a manually error-tagged and shallow-parsed learner corpus"


Ryo Nagata
Konan University, 8-9-1 Okamoto, Kobe 658-0072, Japan
rnagata@konan-u.ac.jp

Edward Whittaker, Vera Sheinman
The Japan Institute for Educational Measurement Inc., 3-2-4 Kita-Aoyama, Tokyo 107-0061, Japan
{whittaker, sheinman}@jiem.co.jp

Abstract

The availability of learner corpora, especially those which have been manually error-tagged or shallow-parsed, is still limited. This means that researchers do not have a common development and test set for natural language processing of learner English, such as for grammatical error detection. Given this background, we created a novel learner corpus that was manually error-tagged and shallow-parsed. This corpus is available for research and educational purposes on the web. In this paper, we describe it in detail together with its data-collection method and annotation schemes. Another contribution of this paper is that we take the first step toward evaluating the performance of existing POS-tagging/chunking techniques on learner corpora using the created corpus. These contributions will facilitate further research in related areas such as grammatical error detection and automated essay scoring.

1 Introduction

The availability of learner corpora is still somewhat limited, despite the obvious usefulness of such data in conducting research on natural language processing of learner English in recent years. In particular, learner corpora tagged with grammatical errors are rare because of the difficulties inherent in learner corpus creation, as will be described in Sect. 2. As shown in Table 1, error-tagged learner corpora are very few among existing learner corpora (see Leacock et al. (2010) for a more detailed discussion of learner corpora). Even if data is error-tagged, it is often not available to the public, or its access is severely restricted. For example, the Cambridge Learner Corpus, which is one of the largest error-tagged learner corpora, can only be used by ...
