Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 84
Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Moty Ben-Dov and Ronen Feldman

Extracting relevant information from a document: extracting the features (entities) from a document by using NL, IR and association metrics algorithms (Feldman et al. 1998) or pattern matching (Averbuch et al. 2004).
Finding trends or relations between people, places, organizations, etc. by aggregating and comparing information extracted from the documents.
Classifying and organizing documents according to their content (Tkach 1998).
Retrieving documents based on the various sorts of information about the document content.
Clustering documents according to their content (Wai-chiu and Fu 2000).

A Text Mining system is composed of three major components (see Figure):

Information Feeders: enable the connection between various textual collections and the tagging modules. This component connects to any web site, streamed source (such as a news feed), internal document collections and any other types of textual collections.
Intelligent Tagging: a component responsible for reading the text and distilling (tagging) the relevant information. This component can perform any type of tagging on the documents, such as statistical tagging (categorization and term extraction), semantic tagging (information extraction) and structural tagging (extraction from the visual layout of documents).
Business Intelligence Suite: a component responsible for consolidating the information from disparate sources, allowing for simultaneous analysis of the entire information landscape.

TM tasks can be separated into two major categories, according to their task and according to the algorithms and formal frameworks that they use. The first is the task-oriented preprocessing approaches, which envision the process of creating a structured document representation in terms of tasks and sub-tasks and usually involve some sort of preparatory goal or problem that needs to be solved. The second is the preprocessing approaches that rely on techniques that derive from formal methods for analyzing complex
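
To make the three-component architecture described earlier more concrete (information feeders, intelligent tagging, and a consolidation layer), here is a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' system: the names Document, feed, tag and consolidate, the stopword list and the sample texts are all hypothetical, and the capitalised-word pattern is a crude stand-in for real information extraction.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    source: str                      # which collection the text came from
    text: str
    tags: dict = field(default_factory=dict)


def feed(collections):
    """Information feeder: yield documents from several textual collections."""
    for source, texts in collections.items():
        for text in texts:
            yield Document(source=source, text=text)


# Hypothetical, deliberately tiny stopword list.
STOPWORDS = frozenset({"the", "a", "of", "and", "in", "to"})


def tag(doc):
    """Intelligent tagging (toy version).

    Statistical tagging: term frequencies after stopword removal.
    Semantic tagging: capitalised words, a crude pattern-matching
    stand-in for entity extraction.
    """
    words = re.findall(r"[A-Za-z]+", doc.text)
    doc.tags["terms"] = Counter(
        w.lower() for w in words if w.lower() not in STOPWORDS
    )
    doc.tags["entities"] = set(re.findall(r"\b[A-Z][a-z]+\b", doc.text))
    return doc


def consolidate(docs):
    """Consolidation layer: merge tags from all sources for joint analysis."""
    totals = Counter()
    entity_sources = {}
    for doc in docs:
        totals.update(doc.tags["terms"])
        for entity in doc.tags["entities"]:
            entity_sources.setdefault(entity, set()).add(doc.source)
    return totals, entity_sources


# Hypothetical sample collections.
collections = {
    "news": ["Acme acquires Globex in a merger"],
    "intranet": ["Globex merger approved"],
}
docs = [tag(d) for d in feed(collections)]
totals, entity_sources = consolidate(docs)
```

Keeping the three stages as separate functions mirrors the separation in the text: new sources only touch feed, richer taggers only touch tag, and the consolidation step sees a uniform set of tagged documents regardless of where they came from.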