Organizing Encyclopedic Knowledge based on the Web and its Application to Question Answering

Atsushi Fujii
University of Library and Information Science
1-2 Kasuga, Tsukuba 305-8550, Japan
CREST, Japan Science and Technology Corporation
fujii@

Tetsuya Ishikawa
University of Library and Information Science
1-2 Kasuga, Tsukuba 305-8550, Japan
ishikawa@

Abstract

We propose a method to generate large-scale encyclopedic knowledge, which is valuable for much NLP research, based on the Web. We first search the Web for pages containing a term in question. Then we use linguistic patterns and HTML structures to extract text fragments describing the term. Finally, we organize extracted term descriptions based on word senses and domains. In addition, we apply an automatically generated encyclopedia to a question answering system targeting the Japanese Information Technology Engineers Examination.

1 Introduction

Reflecting the growth in utilization of the World Wide Web, a number of Web-based language processing methods have been proposed within the natural language processing (NLP), information retrieval (IR) and artificial intelligence (AI) communities. A sample of these includes methods to extract linguistic resources (Fujii and Ishikawa, 2000; Resnik, 1999; Soderland, 1997), retrieve useful information in response to user queries (Etzioni, 1997; McCallum et al., 1999), and mine (discover) knowledge latent in the Web (Inokuchi et al., 1999).
In this paper, mainly from an NLP point of view, we explore a method to produce linguistic resources. Specifically, we enhance the method proposed by Fujii and Ishikawa (2000), which extracts encyclopedic knowledge (i.e., term descriptions) from the Web. In brief, their method searches the Web for pages containing a term in question and uses linguistic expressions and HTML layouts to extract fragments describing the term. They also use a language model to discard non-linguistic fragments. In addition, a clustering method is used to divide descriptions into a .
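
The extraction step described above (linguistic expressions plus HTML layouts) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the English patterns ("X is ...", "X refers to ..."), the use of <dt>/<dd> definition lists as the HTML cue, and the names DescriptionParser and extract_descriptions are all assumptions made for illustration. The Web search, the language-model filter and the clustering step are left out.

```python
import re
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collects <dt>/<dd> pairs and <p> text from one HTML page (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.current = None    # tag whose text is currently being collected
        self.buffer = []
        self.last_dt = None    # most recent <dt> text seen
        self.definitions = []  # (term, description) pairs from <dt>/<dd>
        self.paragraphs = []   # plain text of <p> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd", "p"):
            self._flush()
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self._flush()

    def handle_data(self, data):
        if self.current:
            self.buffer.append(data)

    def close(self):
        super().close()
        self._flush()

    def _flush(self):
        # Normalize whitespace, then file the text under the element it came from.
        text = " ".join("".join(self.buffer).split())
        if self.current == "dt":
            self.last_dt = text
        elif self.current == "dd" and self.last_dt and text:
            self.definitions.append((self.last_dt, text))
        elif self.current == "p" and text:
            self.paragraphs.append(text)
        self.current, self.buffer = None, []


def extract_descriptions(term, html):
    """Return text fragments in `html` that appear to describe `term`."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    found = []
    # HTML layout cue: a definition-list entry whose <dt> is the term itself.
    for dt, dd in parser.definitions:
        if dt.lower() == term.lower():
            found.append(dd)
    # Linguistic cue (assumed pattern): a sentence starting "<term> is/are/refers to/means".
    pattern = re.compile(
        r"^" + re.escape(term) + r"\s+(?:is|are|refers to|means)\b",
        re.IGNORECASE,
    )
    for paragraph in parser.paragraphs:
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if pattern.match(sentence):
                found.append(sentence)
    return found


page = (
    "<p>Welcome. XML is a markup language. Contact us.</p>"
    "<dl><dt>XML</dt><dd>Extensible Markup Language</dd></dl>"
)
print(extract_descriptions("XML", page))
```

In this sketch the two cues are applied independently and their results are concatenated. A fuller system along the lines described above would first retrieve candidate pages with a search engine and then filter and group the extracted fragments.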
