Supervised Ranking in Open-Domain Text Summarization

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 465-472.

Tadashi Nomoto
National Institute of Japanese Literature
1-16-10 Yutaka, Shinagawa, Tokyo 142-8585, Japan
nomoto@

Yuji Matsumoto
Nara Institute of Science and Technology
8916-5 Takayama, Ikoma, Nara 630-0101, Japan
matsu@

Abstract

The paper proposes and empirically motivates an integration of supervised learning with unsupervised learning to deal with human biases in summarization. In particular, we explore the use of probabilistic decision trees within the clustering framework to account for both the variation and the regularity in human-created summaries. A corpus of human-created extracts is built from a newspaper corpus and used as a test set. We build probabilistic decision trees of different flavors and integrate each of them with the clustering framework. Experiments with the corpus demonstrate that mixing the two paradigms generally gives a significant boost in performance over using either of them alone.

1 Introduction

Nomoto and Matsumoto (2001b) recently made the interesting observation that an unsupervised method based on clustering sometimes approximates human-created extracts better than a supervised approach does.
That appears somewhat contradictory, given that a supervised approach should be able to exploit human-supplied information about which sentences to include in an extract and which to leave out, whereas an unsupervised approach blindly chooses sentences according to some selection scheme. An interesting question is why this should be the case. The reason may have to do with variation in human judgments on sentence selection for a summary. In a study described later, we asked students to select the 10% of a text that they found most important for making a summary. If they agree perfectly on their judgments, then we
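The abstract describes integrating a probabilistic decision tree with a clustering framework but gives no procedural details at this point. One simple way such an integration could work, assumed here purely for illustration and not necessarily the procedure used in the paper, is to cluster the sentences of a text, let a supervised model assign each sentence a probability of belonging in the extract, and take the highest-probability sentence from each cluster until a target extraction rate is reached. In the sketch below, the cluster labels and probabilities are hypothetical inputs, and `Sentence`, `select_extract` and `EXAMPLE` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    index: int    # position of the sentence in the source text
    cluster: int  # label from some unsupervised clustering step (assumed given)
    prob: float   # hypothetical P(in extract) from a supervised model, e.g. a probabilistic decision tree


def select_extract(sentences: list[Sentence], rate: float) -> list[int]:
    """Return indices (in text order) of the sentences chosen for the extract.

    Each cluster contributes at most its highest-probability sentence, and at most
    round(rate * len(sentences)) sentences are kept overall (at least one). With
    fewer clusters than that, fewer sentences are returned.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    if not sentences:
        return []
    k = max(1, round(rate * len(sentences)))

    # Supervised part: within each cluster, keep the sentence the model ranks highest.
    best: dict[int, Sentence] = {}
    for s in sentences:
        if s.cluster not in best or s.prob > best[s.cluster].prob:
            best[s.cluster] = s

    # Rank the cluster representatives by probability and keep the top k.
    ranked = sorted(best.values(), key=lambda s: s.prob, reverse=True)[:k]
    return sorted(s.index for s in ranked)


# Hypothetical ten-sentence text in three clusters.
EXAMPLE = [
    Sentence(0, 0, 0.90), Sentence(1, 0, 0.40), Sentence(2, 1, 0.70),
    Sentence(3, 1, 0.80), Sentence(4, 2, 0.20), Sentence(5, 2, 0.60),
    Sentence(6, 0, 0.10), Sentence(7, 1, 0.30), Sentence(8, 2, 0.50),
    Sentence(9, 0, 0.85),
]

if __name__ == "__main__":
    print(select_extract(EXAMPLE, 0.2))  # → [0, 3]
```

With `rate=0.2` on ten sentences the sketch keeps two sentences, 0 and 3. Sentence 9 is skipped even though its probability (0.85) exceeds that of sentence 3 (0.80), because it shares a cluster with sentence 0. In this sketch the clustering step trades some individual score for coverage of different parts of the text, while the supervised probabilities decide which sentence represents each cluster.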
