Data Mining and Knowledge Discovery with Evolutionary Algorithms

Front Cover
Springer Science & Business Media, Aug 21, 2002 - Computers - 264 pages
2 Reviews
This book addresses the integration of two areas of computer science, namely data mining and evolutionary algorithms. Both these areas have become increas ingly popular in the last few years, and their integration is currently an area of active research. In essence, data mining consists of extracting valid, comprehensible, and in teresting knowledge from data. Data mining is actually an interdisciplinary field, since there are many kinds of methods that can be used to extract knowledge from data. Arguably, data mining mainly uses methods from machine learning (a branch of artificial intelligence) and statistics (including statistical pattern recog nition). Our discussion of data mining and evolutionary algorithms is primarily based on machine learning concepts and principles. In particular, in this book we emphasize the importance of discovering comprehensible, interesting knowledge, which the user can potentially use to make intelligent decisions. In a nutshell, the motivation for applying evolutionary algorithms to data mining is that evolutionary algorithms are robust search methods which perform a global search in the space of candidate solutions (rules or another form of knowl edge representation). In contrast, most rule induction methods perform a local, greedy search in the space of candidate rules. Intuitively, the global search of evolutionary algorithms can discover interesting rules and patterns that would be missed by the greedy search.
  

What people are saying - Write a review

We haven't found any reviews in the usual places.

Contents

1 Introduction
1
111 Desirable Properties of Discovered Knowledge
2
12 Knowledge Representation
3
121 Prediction Rules
4
13 An Overview of Data Mining Paradigms
5
131 Rule Induction
7
132 Evolutionary Algorithms
9
References
10
7 Genetic Programming for Rule Discovery
139
72 Booleanizing All Terminals
140
721 Methods for Booleanizing Attributes
142
722 Examples of GP Systems Booleanizing Terminals
145
73 ConstrainedSyntax and StronglyTyped GP
146
74 GrammarBased GP for Rule Discovery
150
75 GP for DecisionTree Building
153
751 Evolving Decision Trees with FirstOrderLogic Conditions
154

2 Data Mining Tasks and Concepts
13
211 The Nondeterminism of the Classification Task
17
213 Estimating Accuracy via Training and Testing
19
22 Dependence Modeling
22
221 Dependence Modeling vs AssociationRule Discovery
23
23 The Challenge of Measuring PredictionRule Quality
24
232 Measuring Rule Comprehensibility
26
233 Measuring Rule Interestingness
27
24 Clustering
31
241 Hierarchical Agglomerative Clustering
33
242 The KMeans Algorithm
34
243 The Challenge of Measuring Clustering Quality
35
25 Inductive Bias
36
251 The DomainDependent Effectiveness of a Bias
37
252 The Simplicity Bias of Occams Razor
38
253 The Minimum Description Length MDL Principle
39
References
40
3 Data Mining Paradigms
45
311 DecisionTree Building
47
312 DecisionTree Pruning
49
313 Pros and Cons of DecisionTree Algorithms
50
314 The Inductive Bias of DecisionTree Algorithms
52
315 Linear Oblique Decision Trees
53
32 Rule Induction Algorithms
54
321 The Pitfall of Attribute Interaction for Greedy Algorithms
55
33 InstanceBased Learning Nearest Neighbor Algorithms
58
References
60
4 Data Preparation
65
411 The Motivation for Attribute Selection
66
413 Attribute Selection as a Particular Case of Attribute Weighting
71
421 An Overview of Discretization Methods
72
422 The Pros and Cons of Discretization
73
43 Attribute Construction
74
431 Categorizing Attribute Construction Methods
75
References
76
5 Basic Concepts of Evolutionary Algorithms
79
511 Key Issues in the Design of EAs
81
52 Selection Methods
83
53 Genetic Algorithms GA
84
532 Crossover Operators
86
533 Mutation Operators
88
54 Genetic Programming GP
89
541 Function Set
90
542 Crossover Operators
92
543 Mutation Operators
95
544 The Standard GP Approach for Classification
99
545 Code Bloat
100
55 Niching
101
References
103
6 Genetic Algorithms for Rule Discovery
107
612 Encoding a Rule Antecedent
110
613 Encoding a Rule Consequent
116
621 GeneralizingSpecializing Crossover
119
622 GeneralizingSpecializing Mutation
122
623 Condition RemovalInsertion
123
624 Rule InsertionRemoval
124
63 TaskSpecific Population Initialization and Seeding
126
64 TaskSpecific RuleSelection Methods
127
65 Fitness Evaluation
129
References
134
752 Evolving Linear Decision Trees
155
753 Hybrid GP Cellular Automaton and Simulated Annealing
157
76 On the Quality of Rules Discovered by GP
158
References
161
8 Evolutionary Algorithms for Clustering
165
811 Individual Representation
166
812 Genetic Operators
167
82 CentroidMedoidBased Individual Representation
169
822 MedoidBased Individual Representation
171
83 InstanceBased Individual Representation
172
832 GraphBased Individual Representation
173
84 Fitness Evaluation
174
841 Coping with Degenerate Partitions
176
85 EAs vs Conventional Clustering Techniques
177
9 Evolutionary Algorithms for Data Preparation
179
911 Individual Encoding and Genetic Operators
180
912 Fitness Evaluation
182
913 Attribute Selection via Search for Partially Defined Instances
187
915 Attribute Selection for Clustering
189
916 Discussion
190
92 EAs for Attribute Weighting
191
921 Attribute Weighting
192
922 Towards Constructive Induction
193
93 Combining Attribute Selection and Attribute Weighting
194
94 GP for Attribute Construction
196
941 Constructing Combinations of Boolean Attributes
197
942 Discovering Specialized Functions
199
and Construction with a Hybrid GAGP
200
References
201
10 Evolutionary Algorithms for Discovering Fuzzy Rules
205
1012 Operations on Fuzzy Sets
207
1013 Membership Functions
209
102 Fuzzy Prediction Rules vs Crisp Prediction Rules
213
103 A Simple Taxonomy of EAs for FuzzyRule Discovery
215
1041 Individual Encoding
216
1042 Determining the Degree of Matching Between a Fuzzy Rule Antecedent and a Data Instance
220
1043 Using Fuzzy Rules to Classify a Data Instance
221
1044 Specifying the Shape and the Number of Membership Functions
222
105 Using EAs for Tuning Membership Functions
224
106 Using EAs for Both Generating Fuzzy Rules and Tuning Membership Functions
225
107 Fuzzy Fitness Evaluation
228
References
230
11 Scaling up Evolutionary Algorithms for Large Data Sets
233
1112 Adaptive TrainingSubset Selection
235
112 An Overview of Parallel Processing
237
1122 Load Balancing
239
1123 Data Parallelism vs Control Parallelism
240
1124 Speed up and Efficiency Measures
242
113 Parallel EAs for Data Mining
243
1131 Exploiting Control Parallelism
246
1132 Exploiting Data Parallelism
248
1133 Exploiting Hybrid ControlDataParallelism
250
References
253
12 Conclusions and Research Directions
255
1212 On Knowledge Comprehensibility
256
1221 Developing EAs for Data Preparation
257
1222 Developing Multiobjective EAs for Data Mining
258
References
260
Index
263
Copyright

Common terms and phrases

References to this book

All Book Search results »

Bibliographic information