Data Mining Association Rules: Advanced Concepts and Algorithms. Lecture Notes for Chapter 7 of Introduction to Data Mining, by Tan, Steinbach, Kumar.


Continuous and Categorical Attributes

Example of an association rule:
  {Number of Pages ∈ [5,10), Browser = Mozilla} → {Buy = No}

How do we apply the association analysis formulation to non-asymmetric binary variables?

Handling Categorical Attributes

Transform each categorical attribute into asymmetric binary variables by introducing a new "item" for each distinct attribute-value pair. Example: replace the Browser Type attribute with the items Browser Type = Internet Explorer and Browser Type = Mozilla.
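This transformation can be sketched as follows; the record layout and the helper name `binarize` are illustrative assumptions, not part of the lecture material:

```python
# Sketch: map a categorical attribute to asymmetric binary "items",
# one item per distinct attribute-value pair.
def binarize(records, attribute):
    values = sorted({r[attribute] for r in records})
    return [
        {f"{attribute}={v}": int(r[attribute] == v) for v in values}
        for r in records
    ]

records = [{"Browser Type": "Internet Explorer"},
           {"Browser Type": "Mozilla"}]
items = binarize(records, "Browser Type")
# each transformed record has exactly one item set to 1
```

Because the items are asymmetric, only the 1-values (attribute-value pairs that are present) matter when mining frequent itemsets.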
Handling Categorical Attributes: Potential Issues

What if the attribute has many possible values?
  Example: the country attribute has more than 200 possible values, so many of the attribute values may have very low support.
  Potential solution: aggregate the low-support attribute values.

What if the distribution of attribute values is highly skewed?
  Example: 95% of the visitors have Buy = No, so most items will be associated with the (Buy = No) item.
  Potential solution: drop the highly frequent items.

Handling Continuous Attributes

Different kinds of rules:
  Age ∈ [21,35) ∧ Salary ∈ [70k,120k) → Buy
  Salary ∈ [70k,120k) ∧ Buy → Age: μ = 28, σ = 4

Different methods:
  Discretization-based
  Statistics-based
  Non-discretization-based (minApriori)

Use discretization. Unsupervised approaches: equal-width binning, equal-depth binning, clustering. Supervised approaches use class labels to place the bin boundaries, e.g. given the class counts for each attribute value v:

  Class       v1   v2   v3   v4   v5   v6   v7   v8   v9
  Anomalous    0    0   20   10   20    0    0    0    0
  Normal     150  100    0    0    0  100  100  150  100

a supervised scheme would isolate the anomalous region: bin1 = {v1, v2}, bin2 = {v3, v4, v5}, bin3 = {v6, v7, v8, v9}.

Discretization Issues

The size of the discretized intervals affects support and confidence:
  If the intervals are too small, rules may not have enough support.
  If the intervals are too large, rules may not have enough confidence.
Potential solution: use all possible intervals, e.g.
  {Refund = No, (Income = $51,250)} → {Cheat = No}
  {Refund = No, (60K ≤ Income ≤ 80K)} → {Cheat = No}
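The two unsupervised binning schemes above can be sketched as follows; the function names, the number of bins k, and the sample values are illustrative assumptions:

```python
# Equal-width binning: split the value range [min, max] into k
# intervals of equal width (assumes the values are not all identical).
def equal_width_bins(values, k):
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # clamp the maximum value into the last bin
    return [min(int((v - lo) / width), k - 1) for v in values]

# Equal-depth binning: assign roughly n/k values to each bin
# by rank order, so every bin has about the same support.
def equal_depth_bins(values, k):
    order = sorted(range(len(values)), key=lambda i: values[i])
    n, bins = len(values), [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * k // n, k - 1)
    return bins
```

Equal-width bins can end up with very uneven support under skewed data, which is exactly the support problem noted above; equal-depth binning trades that for intervals of irregular width.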