C4.5: Programs for Machine Learning

Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.
Contents
CHAPTER 3 Unknown Attribute Values | 27 |
CHAPTER 4 Pruning Decision Trees | 35 |
CHAPTER 5 From Trees to Rules | 45 |
CHAPTER 6 Windowing | 57 |
CHAPTER 7 Grouping Attribute Values | 63 |
CHAPTER 8 Interacting with Classification Models | 71 |
CHAPTER 9 Guide to Using the System | 81 |
CHAPTER 10 Limitations | 95 |
CHAPTER 11 Desirable Additions | 103 |
Program Listings | 109 |


