A comparison of genetic algorithms and other machine learning systems on a complex classification task from common disease research
Abstract: "The thesis project is an investigation of some well-known machine learning systems and evaluates their utility when applied to a classification task from the field of human genetics. This common-disease research task, an inquiry into genetic and biochemical factors and their association with a family history of coronary artery disease (CAD), is more complex than many pursued in machine learning research, due to interactions and the inherent noise in the dataset. The task also differs from most pursued in machine learning research because there is a desire to explain the dataset with a small number of rules, even at the expense of accuracy, so that they will be more accessible to medical researchers who are unaccustomed to dealing with disjunctive explanations of data. Furthermore, there is asymmetry in the task in that good explanations of the positive examples are of more importance than good explanations of the negative examples. The primary machine learning approach investigated in this research is genetic algorithms (GA's); decision trees, Autoclass, and Cobweb are also included. The GA performed the best in terms of descriptive ability with the common-disease research task, although decision trees also demonstrated certain strengths. Autoclass and Cobweb were recognized from the onset as being inappropriate for the needs of common-disease researchers (because both systems are unsupervised learners that create probabilistic structures), but were included for their interest in the machine learning community; these systems did not perform as well as GA's and decision trees in terms of their ability to describe the data. In terms of predictive accuracy, all systems performed poorly, and the differences between any two of the three best systems are not significant.
When positive and negative examples are considered separately, the GA does significantly better than the other systems in predicting positive examples and significantly worse in predicting negative examples. The thesis illustrates that the investigation of 'real' problems from researchers in other fields can lead machine learning researchers to challenge their systems in ways they may not otherwise have considered, and may lead these researchers to a symbiotic relationship that benefits multiple research communities."
CHALLENGES FOR MACHINE LEARNING
DESIGNING NEW METRICS FOR THE EVALUATION
THE GENETIC ALGORITHMS IMPLEMENTATION
APOAIV 11 APOE 33 APOE not 32 APOH 22 apolipoprotein approach attribute-value pairs Autoclass average average rule best rule sets best string binary Chapter CHOL CIII class attribute classification task Cobweb common-disease research complex confidence intervals crossover and mutation crossover rates dataset decision tree systems described examples correct experiments family history fitness function five best rules GA's genes genetic algorithms genotypes GID3 graphs hillclimbing history of CAD history of disease illustrated in Figure immigration interactions machine learning research machine learning systems metric mutation rates negative examples number of examples number of positive number of rules odds ratio paired t-test paradigm possible values prediction accuracy prediction task raw fitness risk factors rule set found rules that explain SMOKE specific string represents thesis training set traits unsupervised learning variation