Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech

Miao Chen, School of Information Studies, Syracuse University, Syracuse, NY, USA, mchen14@syr.edu
Klaus Zechner, NLP Speech Group, Educational Testing Service, Princeton, NJ, USA, kzechner@ets.org

Abstract

This paper focuses on identifying, extracting, and evaluating features related to the syntactic complexity of spontaneous spoken responses, as part of an effort to expand the current feature set of an automated speech scoring system in order to cover additional aspects considered important in the construct of communicative competence. Our goal is to find effective features, selected from a large set of features proposed previously as well as some new features designed in analogous ways from a syntactic complexity perspective, that correlate well with human ratings of the same spoken responses, and to build automatic scoring models based on the most promising features by using machine learning methods. On human transcriptions with manually annotated clause and sentence boundaries, our best scoring model achieves an overall Pearson correlation with human rater scores of r = 0.49 on an unseen test set, whereas correlations of models using sentence or clause boundaries from automated classifiers are around r = 0.2.
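The evaluation criterion used throughout the abstract is the Pearson correlation between a candidate feature and the human rater scores for the same responses. A minimal sketch of that computation follows; the data values and variable names here are purely hypothetical, not taken from the paper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: one syntactic-complexity feature value per spoken
# response, paired with the human rater score for the same response.
feature = [2.1, 3.4, 1.8, 4.0, 2.9]
human_score = [2, 3, 2, 4, 3]
r = pearson(feature, human_score)
```

A feature with r near 1 tracks the human scores closely; the paper's feature selection keeps only features whose correlation with human ratings is strong enough to be useful in a scoring model.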
1 Introduction

Past efforts directed at automated scoring of speech have mainly used features related to fluency (e.g., speaking rate, length and distribution of pauses), pronunciation (e.g., using log-likelihood scores from the acoustic model of an Automatic Speech Recognition (ASR) system), or prosody (e.g., information related to pitch contours or syllable stress) (e.g., Bernstein, 1999; Bernstein et al., 2000; Bernstein et al., 2010; Cucchiarini et al., 1997; Cucchiarini et al., 2000; Franco et al., 2000a; Franco et al., 2000b; Zechner et al., 2007; Zechner et al., 2009). While this approach is a good match to most of the important properties related to low-entropy speech, i.e., speech which is highly ...
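To make the fluency features mentioned above concrete, here is an illustrative sketch (not the actual ETS scoring system) that derives two common fluency measures, speaking rate and mean silent-pause duration, from a hypothetical list of time-stamped word tokens such as ASR output. The token format and the pause threshold are assumptions for illustration.

```python
def fluency_features(tokens, pause_threshold=0.15):
    """Return (words per second, mean silent-pause duration in seconds)
    from a list of (word, start_sec, end_sec) tuples."""
    total_time = tokens[-1][2] - tokens[0][1]
    rate = len(tokens) / total_time
    # A "pause" is a gap between adjacent words at least pause_threshold long.
    pauses = [
        nxt[1] - cur[2]
        for cur, nxt in zip(tokens, tokens[1:])
        if nxt[1] - cur[2] >= pause_threshold
    ]
    mean_pause = sum(pauses) / len(pauses) if pauses else 0.0
    return rate, mean_pause

# Hypothetical time-stamped tokens for a short response.
tokens = [("the", 0.0, 0.3), ("cat", 0.5, 0.8), ("sat", 0.85, 1.1)]
rate, mean_pause = fluency_features(tokens)
```

Real systems use richer variants of these measures (e.g., pause distributions rather than a single mean), but the basic shape of the computation is the same.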