Scientific report: "Discrete vs. Continuous Rating Scales for Language Evaluation in NLP"

Anja Belz, Eric Kow
School of Computing, Engineering and Mathematics
University of Brighton
Brighton BN2 4GJ, UK
{A.S.Belz,E.Y.Kow}@brighton.ac.uk

Abstract

Studies assessing rating scales are very common in psychology and related fields, but are rare in NLP. In this paper we assess discrete and continuous scales used for measuring quality assessments of computer-generated language. We conducted six separate experiments designed to investigate the validity, reliability, stability, interchangeability and sensitivity of discrete vs. continuous scales. We show that continuous scales are viable for use in language evaluation, and offer distinct advantages over discrete scales.

1 Background and Introduction

Rating scales have been used for measuring human perception of various stimuli for a long time, at least since the early 20th century (Freyd, 1923). First used in psychology and psychophysics, they are now also common in a variety of other disciplines, including NLP. Discrete scales are the only type of scale commonly used for qualitative assessments of computer-generated language in NLP (e.g. in the DUC/TAC evaluation competitions). Continuous scales are commonly used in psychology and related fields, but are virtually unknown in NLP.
While studies assessing the quality of individual scales and comparing different types of rating scales are common in psychology and related fields, such studies hardly exist in NLP, and so at present little is known about whether discrete scales are a suitable rating tool for NLP evaluation tasks, or whether continuous scales might provide a better alternative. A range of studies from sociology, psychophysiology, biometrics and other fields have compared discrete and continuous scales. Results tend to differ for different types of data. E.g. results from pain measurement show a continuous scale to outperform a discrete scale (ten Klooster et al., 2006). Other results (Svensson, 2000) from .