tailieunhanh - manning schuetze statisticalnlp part 9
..., but this time we find the most probable one instead of summing over all such rules, and record which one it was (the variable whose value is a list of three integers recording the form of the rule application that had the highest probability). 1. Start

15 Topics in Information Retrieval
Some Background on Information Retrieval

Figure: Two examples of precision-recall curves. The two curves are for ranking 3 in the table, uninterpolated (above) and interpolated (below).

Any of the measures discussed above can be used to compare the performance of information retrieval systems. One common approach is to run the systems on a corpus and a set of queries and average the performance measure over queries. If the average of system 1 is better than the average of system 2, then that is evidence that system 1 is better than system 2.

Unfortunately, there are several problems with this experimental design. The difference in averages could be due to chance. Or it could be due to one query on which system 1 outperforms system 2 by a large margin, with performance on all other queries being about the same. It is therefore advisable to use a statistical test like the t test for system comparison (as shown in another section).

The probability ranking principle (PRP)

Ranking documents is intuitively plausible since it gives the user some control over the tradeoff between precision and recall. If recall for the first page of results is low and the desired information is not found, then the user can look at the next page, which in most cases trades higher recall for lower precision.

The following principle is a guideline which is one way to make explicit the assumptions that underlie the design of retrieval by ranking. We present it in a form simplified from van Rijsbergen (1979: 113):

Probability Ranking Principle (PRP). Ranking documents in order of decreasing probability of relevance is optimal.
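Returning to the system comparison discussed above, the following is a minimal sketch of a paired t statistic computed over per-query scores. The function name `paired_t_statistic` and the score values are assumptions made for illustration; they do not come from the text, and the sketch assumes both systems were scored on the same queries in the same order.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-query scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same queries")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two queries")
    # Work on per-query differences, so one query cannot hide behind the average.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-query scores (for example average precision), invented here.
system_1 = [0.60, 0.50, 0.70, 0.40]
system_2 = [0.50, 0.45, 0.55, 0.40]
t, df = paired_t_statistic(system_1, system_2)
print(f"t = {t:.3f}, df = {df}")
```

For these invented scores t is about 2.32 with 3 degrees of freedom, which is below the two-tailed 5% critical value of 3.182. So although system 1 has the higher average, the difference would not count as significant on this sample.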
The basic idea is that we view retrieval as a greedy search that aims to identify the most valuable document at any given time. The document that is most likely to be valuable is the one with the highest estimated probability of relevance (where we consider all documents that have not been retrieved yet), that is, with a maximum value for
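The greedy view described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `rank_by_prp`, the document identifiers, and the probability estimates are all hypothetical, and how the probabilities of relevance are estimated is outside the scope of the sketch.

```python
def rank_by_prp(relevance_prob):
    """Greedy ranking: repeatedly pick the not-yet-retrieved document
    with the highest estimated probability of relevance."""
    remaining = dict(relevance_prob)  # copy, so the caller's dict is untouched
    ranking = []
    while remaining:
        best = max(remaining, key=remaining.get)
        ranking.append(best)
        del remaining[best]  # this document now counts as retrieved
    return ranking


# Hypothetical estimates of the probability of relevance for four documents.
estimates = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.7}
ranking = rank_by_prp(estimates)
print(ranking)
```

Because each step takes the maximum of what is left, the result is the same ordering that a single sort by decreasing probability would give, for instance `sorted(estimates, key=estimates.get, reverse=True)`. The loop is written out only to mirror the greedy description.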