tailieunhanh - Scientific report: "A Simple Measure to Assess Non-response"

Anselmo Peñas and Álvaro Rodrigo
UNED NLP & IR Group
Juan del Rosal 16, 28040 Madrid, Spain
anselmo alvarory@

Abstract

There are several tasks where it is preferable not to respond than to respond incorrectly. This idea is not new, but despite several previous attempts there is no commonly accepted measure to assess non-response. We study here an extension of the accuracy measure that has this feature and an interpretation that is very easy to understand. The proposed measure (c@1) has a good balance of discrimination power, stability and sensitivity. We also show how this measure is able to reward systems that maintain the same number of correct answers and at the same time decrease the number of incorrect ones by leaving some questions unanswered. This measure is well suited for tasks such as Reading Comprehension tests, where multiple choices per question are given but only one is correct.

1 Introduction

There is some tendency to consider that an incorrect result is simply the absence of a correct one. This is particularly true in the evaluation of Information Retrieval systems where, in fact, the absence of results is sometimes the worst output. However, there are scenarios where we should consider the possibility of not responding, because this behavior has more value than responding incorrectly.
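The reward for leaving questions unanswered that the abstract describes can be made concrete with a small sketch. This excerpt does not state the formula for c@1, so the code below assumes the definition commonly given for it: unanswered questions are credited at the system's overall accuracy, c@1 = (n_R + n_U * n_R / n) / n, where n_R is the number of correct answers, n_U the number of unanswered questions and n the total. The function and parameter names are illustrative, not taken from the paper.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """Sketch of c@1 under the assumed definition (not stated in this excerpt).

    Each unanswered question is credited with the accuracy the system
    achieves over all questions, n_correct / n_total.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct < 0 or n_unanswered < 0 or n_correct + n_unanswered > n_total:
        raise ValueError("counts must be non-negative and sum to at most n_total")
    accuracy = n_correct / n_total
    return (n_correct + n_unanswered * accuracy) / n_total


# Hypothetical systems on 100 questions, both with 50 correct answers.
# The first answers everything (50 incorrect); the second leaves 20
# questions unanswered and so gives only 30 incorrect answers.
answers_everything = c_at_1(n_correct=50, n_unanswered=0, n_total=100)
abstains_on_some = c_at_1(n_correct=50, n_unanswered=20, n_total=100)
print(answers_everything, abstains_on_some)
```

Under this assumed definition the first system scores 0.5, which equals its plain accuracy, while the second scores 0.6: the number of correct answers is unchanged, but replacing incorrect answers with non-responses raises the score. When no question is left unanswered the sketch reduces to accuracy.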
For example, during the process of introducing new features in a search engine it is important to preserve users' confidence in the system. Thus, a system must decide whether to give a result in the new fashion or to keep the old kind of output. A similar example is the decision about whether or not to show ads related to the query. Showing wrong ads harms the business model more than showing nothing. A third example, more related to Natural Language Processing, is Machine Reading evaluation through reading comprehension tests. In this case, where multiple choices for a question are offered, choosing a wrong option should be
