Data Preparation for Data Mining- P7

Ever since the Sumerian and Elam peoples living in the Tigris and Euphrates River basin some 5500 years ago invented data collection using dried mud tablets marked with tax records, people have been trying to understand the meaning of, and get use from, collected data. More directly, they have been trying to determine how to use the information in that data to improve their lives and achieve their objectives.

While the naive one-of-n remapping (one state to one variable) may cause difficulties, domain knowledge can indicate very useful remappings that significantly enhance the information content in alpha variables. Since these depend on domain knowledge, they are necessarily situation specific. However, useful remappings for state may include such features as creating a pseudo-variable for North, one for South, another for East, one for West, and perhaps others for other features of interest, such as population density or number of cities in the state. This m-of-n remapping is an advantage if either of two conditions is met. First, if the total number of additional variables is less than the number of labels, then m-of-n remapping increases dimensionality less than one-of-n, potentially a big advantage. Second, if the m-of-n remapping actually adds useful information, either in fact by explicating domain knowledge or by making existing information more accessible, once again this is an advantage over one-of-n.

This useful remapping technique has more than one of the pseudo-variables "on" for a single input. In one-of-n, one state switched on one variable. In m-of-n, several variables may be on. For instance, a densely populated U.S. state in the northeast activates several of the pseudo-variables: those for North, East, and Dense Population would be on. So for this example, one input label maps to three "on" input pseudo-variables. There could, of course, be many more than three possible inputs.
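The contrast between the two codings can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `one_of_n` and `m_of_n`, the list `FEATURES`, and the table `STATE_FEATURES` are hypothetical, and the feature assignments given to each state are rough illustrative guesses standing in for real domain knowledge.

```python
# Pseudo-variable names (the "n" in m-of-n); chosen for illustration only.
FEATURES = ["North", "South", "East", "West", "Dense Population"]

# Hypothetical domain-knowledge table: the pseudo-variables each label turns on.
# The assignments are illustrative assumptions, not authoritative geography.
STATE_FEATURES = {
    "New Jersey": {"North", "East", "Dense Population"},
    "Florida": {"South", "East"},
    "Montana": {"North", "West"},
    "Arizona": {"South", "West"},
    "Oregon": {"North", "West"},
    "Georgia": {"South", "East"},
}


def one_of_n(label, labels):
    """Return a vector with exactly one 1, at the position of `label`."""
    return [1 if label == other else 0 for other in labels]


def m_of_n(label, table, features):
    """Return a vector with a 1 for every feature that `label` activates."""
    active = table[label]
    return [1 if feature in active else 0 for feature in features]


labels = sorted(STATE_FEATURES)
print(one_of_n("New Jersey", labels))                  # six positions, one set
print(m_of_n("New Jersey", STATE_FEATURES, FEATURES))  # [1, 0, 1, 0, 1]
```

In this toy table, six labels need six one-of-n variables but only five m-of-n pseudo-variables, which matches the first condition above. Note that Montana and Oregon receive the same m-of-n code here, so the sketch is only appropriate when the chosen features capture every distinction that matters to the model.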
In general, m of the possible n would be on, so it's called an m-of-n mapping. Another example of this remapping technique usefully groups common characteristics. Such character aggregation codings can be very useful. For instance, instead of listing the entire content of a grocery store's produce section using individual alpha labels in a naive one-of-n coding, it may be better to create m-of-n pseudo-variables for Fruit, Vegetable, Root Crop, Leafy, Short Shelf Life, and so on. Naturally the ...