tailieunhanh - Data Preparation for Data Mining - P8

Data Preparation for Data Mining - P8: Ever since the Sumerian and Elam peoples living in the Tigris and Euphrates River basin some 5500 years ago invented data collection using dried mud tablets marked with tax records, people have been trying to understand the meaning of, and get use from, collected data. More directly, they have been trying to determine how to use the information in that data to improve their lives and achieve their objectives.

        H    M    L    Total
T       6    0    0      6
A       3    8    3     14
S       0    0    6      6
Total   9    8    9     26

Figure: Bivariate histogram showing the joint distributions of the categories for weight and height of the Canadiens.

Notice that some of the categories overlap each other. It is these overlaps that allow an appropriate ordering for the categories to be discovered. In this example, since the meaning of the labels is known, the ordering may appear intuitive. However, since the labels are arbitrary, and applied meaningfully only for ease in the example, they can be validly restated. The restated cross-tabulation below shows the same information as the table above, but with different labels and reordered. Is it now intuitively easy to see what the ordering should be?

TABLE: Restated cross-tabulation.

        A    B    C    Total
X       3    3    8     14
Y       0    6    0      6
Z       6    0    0      6
Total   9    9    8     26

The restated table contains exactly the same information as the original table, but has made intuitive ordering difficult or impossible. It is possible to use this information to reconstruct an appropriate ordering, albeit not intuitively. For ease of understanding, the previous labeling system is used, although the actual labels used, so long as they are consistently applied, are not important to recovering an ordering. Restating the original cross-tabulation in a different form shows how this recovery begins. The category count tabulation below lists the number of players in each of the possible categories.

TABLE: Category count tabulation.

Weight  Height  Count
H       T       6
H       A       3
H       S       0
M       T       0
M       A       8
M       S       0
L       T       0
L       A       3
L       S       6

The information in the category count tabulation represents a sort of jigsaw puzzle. Although in this example the categories in all of the tables are shown appropriately ordered to clarify explanation, the real situation is that the ordering is unknown and needs to be discovered. What is known are the various frequencies for each of the category couplings, which are pairings here as ...
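
The restatement of the cross-tabulation as a list of category counts, and the claim that relabeling leaves the information unchanged, can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's own procedure: the function names (category_counts, margins, relabel) are hypothetical, and the label mapping used for the restated table is an assumption, since the text does not say which new label replaces which old one (the counts admit more than one consistent mapping).

```python
# Cross-tabulation from the text: rows are height (T, A, S),
# columns are weight (H, M, L), values are player counts.
crosstab = {
    "T": {"H": 6, "M": 0, "L": 0},
    "A": {"H": 3, "M": 8, "L": 3},
    "S": {"H": 0, "M": 0, "L": 6},
}


def category_counts(table):
    """Flatten a cross-tabulation into (weight, height, count) rows."""
    weights = list(next(iter(table.values())))
    return [(w, h, table[h][w]) for w in weights for h in table]


def margins(table):
    """Return (row totals, column totals, grand total)."""
    row_totals = {h: sum(cols.values()) for h, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for w, n in cols.items():
            col_totals[w] = col_totals.get(w, 0) + n
    return row_totals, col_totals, sum(row_totals.values())


def relabel(table, row_map, col_map):
    """Apply arbitrary new labels; the counts themselves are untouched."""
    return {
        row_map[h]: {col_map[w]: n for w, n in cols.items()}
        for h, cols in table.items()
    }


# ASSUMPTION: one mapping that is consistent with the restated table.
restated = relabel(
    crosstab,
    row_map={"T": "Z", "A": "X", "S": "Y"},
    col_map={"H": "A", "M": "C", "L": "B"},
)

if __name__ == "__main__":
    for weight, height, count in category_counts(crosstab):
        print(weight, height, count)
    print(margins(crosstab))
    print(restated)
```

The flattened rows carry no ordering of their own, which is the point of the jigsaw-puzzle remark: only the pairing frequencies are known, and any ordering has to be recovered from them.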
