
ISSN:2249-5789
Rekha V R et al, International Journal of Computer Science & Communication Networks, Vol 6(2), 32-36

A Survey on Web Page De-duplication using Web Mining Techniques

Rekha V R¹, Resmy V R²
¹ Assistant Professor in IT, Department of IT, College of Engineering, Kidangoor, Kottayam, India.
² Associate Professor, Dept. of Computer Science, Sarabhai Institute of Science and Technology, Vellanad, Thiruvananthapuram, India.

Abstract: Web page duplication on the Internet adversely affects crawling and ranking, and thereby severely degrades search engine speed. In this paper, we present a survey based on a number of recent research papers published in the area of Web Search and Mining. The presence of duplicate web pages affects the speed of searching and the retrieval of relevant documents, and thereby overall search engine performance. Web mining is the application of data mining techniques to discover patterns from the World Wide Web. Web mining can be divided into three types: web content mining, web structure mining and web usage mining.

Keywords: Search Engine, Data De-duplication, Web Mining, Crawling, Information Retrieval

I. INTRODUCTION

Internet commerce is one of the fastest-growing industries today. The web's wide range of capabilities makes it easier and more cost-efficient for businesses to transact with other businesses. Search engines are one factor that allows businesses to find each other, and they are part of the reason the web is growing so rapidly. Their capabilities range from keyword or phrase queries that locate specific content to general queries used to browse the web. But what exactly is a search engine?
Search engines are huge databases of web page files that have been assembled automatically by machine. There are two types of search engines. The first type is the individual search engine, which compiles its information into its own database, making it accessible when you .
IJCSCN | April-May 2016 Available online@