Optimizing Question Answering Accuracy by Maximizing Log-Likelihood

Matthias H. Heie, Edward W. D. Whittaker and Sadaoki Furui
Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan
{heie,edw,furui}@furui.cs.titech.ac.jp

Abstract

In this paper we demonstrate that there is a strong correlation between the Question Answering (QA) accuracy and the log-likelihood of the answer typing component of our statistical QA model. We exploit this observation in a clustering algorithm which optimizes QA accuracy by maximizing the log-likelihood of a set of question-and-answer pairs. Experimental results show that we achieve better QA accuracy using the resulting clusters than by using manually derived clusters.

1 Introduction

Question Answering (QA) distinguishes itself from other information retrieval tasks in that the system tries to return accurate answers to queries posed in natural language. Factoid QA limits itself to questions that can usually be answered with a few words. Typically, factoid QA systems employ some form of question type analysis, so that a question such as "What is the capital of Japan?" will be answered with a geographical term. While many QA systems use hand-crafted rules for this task, such an approach is time-consuming and doesn't generalize well to other languages.

Machine learning methods have been proposed, such as question classification using support vector machines (Zhang and Lee, 2003) and language modeling (Merkel and Klakow, 2007). In these approaches, question categories are predefined and a classifier is trained on manually labeled data. This is an example of supervised learning.

In this paper we present an unsupervised method where we attempt to cluster question-and-answer (q-a) pairs without any predefined question categories; hence no manually class-labeled questions are used. We use a statistical QA framework, described in Section 2, where the system is trained with clusters of q-a pairs. This framework was used in several TREC .