Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 97

Knowledge discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end product of information technology. Discovering and extracting knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. A great deal of hidden knowledge is waiting to be discovered; this is the challenge created by today's abundance of data. The Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Nora Oikonomakou and Michalis Vazirgiannis

... take into account information extracted from the link structure of the collection. The underlying idea is that when two documents are connected via a link, there exists a semantic relationship between them, which can serve as the basis for partitioning the collection into clusters. The use of the link structure for clustering a collection is based on citation analysis from the field of bibliometrics (White and McCain, 1989). Citation analysis assumes that if a person creating a document cites two other documents, then these documents must be somehow related in the mind of that person. In this way, the clustering algorithm tries to incorporate human judgement when characterizing the documents.

Two widely used measures of similarity between two documents p and q based on citation analysis are co-citation, which is the number of documents that cite both p and q, and bibliographic coupling, which is the number of documents that are cited by both p and q. The greater the value of these measures, the stronger the relationship between the documents p and q. The length of the path that connects two documents is also sometimes considered when calculating document similarity.
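The two citation-based measures can be illustrated with a minimal sketch. The function names, the dictionary representation of the collection and the toy data below are assumptions made for illustration only; they do not come from the chapter.

```python
def cocitation(citations, p, q):
    """Co-citation: number of documents that cite both p and q."""
    return sum(1 for cited in citations.values() if p in cited and q in cited)


def bibliographic_coupling(citations, p, q):
    """Bibliographic coupling: number of documents cited by both p and q."""
    return len(citations.get(p, set()) & citations.get(q, set()))


# Hypothetical toy collection: each document maps to the set of documents it cites.
citations = {
    "a": {"p", "q"},
    "b": {"p", "q", "r"},
    "c": {"p"},
    "p": {"r", "s"},
    "q": {"r", "s", "t"},
}

print("co-citation(p, q):", cocitation(citations, "p", "q"))
print("bibliographic coupling(p, q):", bibliographic_coupling(citations, "p", "q"))
```

In this toy collection, documents a and b both cite p and q, while p and q share the cited documents r and s. Both measures are symmetric in p and q, so either can be used directly as a similarity value in a clustering algorithm.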
There are many uses of the link structure of a web page collection in web IR. Croft's Inference Network Model (Croft, 1993) uses the links that connect two web pages to enhance the word representation of a web page with the words contained in the pages linked to it. Frei and Stieger (1995) characterise a hyperlink by the common words contained in the documents that it connects. This method is proposed for ranking the results returned for a user's query. Page et al. (1998) also proposed an algorithm for ranking search results. Their approach, PageRank, assigns to each web page a score that denotes the importance of that page and depends on the number and importance of the pages that point to it. Finally, Kleinberg proposed the HITS algorithm (Kleinberg ...).
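The idea behind PageRank described above, that a page's score depends on the number and importance of the pages pointing to it, can be sketched as a simple iterative computation. This is a minimal illustration and not the authors' implementation: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links and the toy graph are all assumptions introduced here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively score pages; `links` maps a page to the pages it points to.

    Assumptions (not from the text): damping factor 0.85, a fixed number of
    iterations, and pages without out-links spread their score evenly.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, ())
            if targets:
                # A page passes its score on, split evenly over its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web graph: a, b and d all point to c; c points back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this toy graph, page c receives the highest score because three pages point to it, and page a scores above b and d because its single incoming link comes from the important page c.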
