tailieunhanh - Scientific report: "Using Error-Correcting Output Codes with Model-Refinement to Boost Centroid Text Classifier"

Songbo Tan
Information Security Center, ICT . Box 2704, Beijing 100080, China
tansongbo@software . tansongbo@

Abstract

In this work, we investigate the use of error-correcting output codes (ECOC) for boosting the centroid text classifier. The implementation framework is to decompose one multi-class problem into multiple binary problems and then learn each binary classification problem with a centroid classifier. However, this kind of decomposition incurs considerable bias for the centroid classifier, which noticeably degrades its performance. To address this issue, we use Model-Refinement to adjust this so-called bias. The basic idea is to take advantage of misclassified examples in the training data to iteratively refine and adjust the centroids of the text data. The experimental results reveal that Model-Refinement can dramatically decrease the bias introduced by ECOC, and that the combined classifier is comparable to or even better than the SVM classifier in performance.

1. Introduction

In recent years, ECOC has been applied to boost the naive Bayes, decision tree, and SVM classifiers for text data (Berger 1999; Ghani 2000; Ghani 2002; Rennie et al. 2001). Following this research direction, in this work we explore the use of ECOC to enhance the performance of the centroid classifier (Han et al. 2000). To the best of our knowledge, no previous work has been conducted on exactly this problem.

The framework we adopted is to decompose one multi-class problem into multiple binary problems and then use the centroid classifier to learn the individual binary classification problems. However, this kind of decomposition incurs considerable bias (Liu et al. 2002) for the centroid classifier. In substance, the centroid classifier (Han et al. 2000) relies on a simple decision rule: a given document should be assigned a particular class if the similarity or distance of .
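
The framework described above (one centroid pair per code bit, then iterative adjustment from misclassified training examples) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the code matrix CODE, the toy vectors, cosine similarity as the measure, Hamming decoding, and in particular the update rule in refine (add the example to the correct centroid, subtract it from the wrong one, with step size rate) are all hypothetical choices, since the excerpt does not specify them.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train_ecoc_centroid(X, y, code):
    """For each code bit, build one centroid for the classes coded 0 and one
    for the classes coded 1. Assumes every bit column contains both values."""
    n_bits = len(next(iter(code.values())))
    models = []
    for b in range(n_bits):
        neg = [x for x, label in zip(X, y) if code[label][b] == 0]
        pos = [x for x, label in zip(X, y) if code[label][b] == 1]
        models.append((centroid(neg), centroid(pos)))
    return models

def predict_bit(model, x):
    """Binary centroid rule: pick the side whose centroid is more similar."""
    c0, c1 = model
    return 1 if cosine(x, c1) > cosine(x, c0) else 0

def predict_bits(models, x):
    return tuple(predict_bit(m, x) for m in models)

def decode(code, bits):
    """Return the class whose code word is nearest in Hamming distance."""
    return min(code, key=lambda c: sum(a != b for a, b in zip(code[c], bits)))

def refine(models, X, y, code, rate=0.5, epochs=5):
    """Assumed refinement step: for every training example that a binary
    classifier gets wrong, move the correct centroid toward the example
    and the wrong centroid away from it (centroids are updated in place)."""
    for _ in range(epochs):
        for x, label in zip(X, y):
            for b, model in enumerate(models):
                target = code[label][b]
                if predict_bit(model, x) != target:
                    right, wrong = model[target], model[1 - target]
                    for i, xi in enumerate(x):
                        right[i] += rate * xi
                        wrong[i] -= rate * xi
    return models

# Hypothetical 3-class problem over a 3-term vocabulary.
X = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
     [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
     [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
y = ["A", "A", "B", "B", "C", "C"]
# Hypothetical code matrix: one 3-bit code word per class.
CODE = {"A": (1, 1, 0), "B": (1, 0, 1), "C": (0, 1, 1)}

models = refine(train_ecoc_centroid(X, y, CODE), X, y, CODE)
predictions = [decode(CODE, predict_bits(models, x)) for x in X]
```

Each binary problem merges several original classes into one side, so a side's centroid can sit far from some of the documents it is supposed to represent; this is one way to picture the bias the abstract refers to. On the toy data above every binary classifier is already correct, so the refinement loop makes no changes; it only acts when a training example falls on the wrong side of a bit.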
