Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 96
Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. Discovering and extracting knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. A great deal of hidden knowledge is waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

48 A Review of Web Document Clustering Approaches

Nora Oikonomakou (1) and Michalis Vazirgiannis (2)
(1) Department of Informatics, Athens University of Economics and Business (AUEB), Patision 76, 10434, Greece, oikonomn@
(2) Department of Informatics, Athens University of Economics and Business (AUEB), Patision 76, 10434, Greece, mvazirg@

Summary. The Internet has become the largest data repository and now faces the problem of information overload. The web search environment, however, is not ideal. The abundance of information, combined with the dynamic and heterogeneous nature of the Web, makes information retrieval a difficult process for the average user. There is therefore a real need for techniques that help users effectively organize and browse the available information, with the ultimate goal of satisfying their information need. Cluster analysis, which deals with the organization of a collection of objects into cohesive groups, can play a very important role in achieving this objective. In this chapter we present an exhaustive survey of the web document clustering approaches available in the literature, classified into three main categories: text-based, link-based and hybrid.
Furthermore, we present a thorough comparison of the algorithms based on the various facets of their features and functionality. Finally, based on the review of the different approaches, we conclude that although clustering has been a topic for the scientific community for three decades, there are still many open issues that call for more research.

Key words: Clustering, World Wide Web, Web-Mining, Text-Mining

Introduction

The Internet has become the largest data repository and now faces the problem of information overload. At the same time, more and more people use the World Wide Web as their main source of information. The abundance of information, combined with the dynamic and heterogeneous nature of the Web, makes information ...