The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose "queries," usually in the form of unlabeled data instances to be labeled by an "oracle" (e.g., a human annotator) who already understands the nature of the problem. This approach is well motivated in many modern machine learning and data mining applications, where unlabeled data may be abundant or easy to come by, but training labels are difficult, time-consuming, or expensive to obtain. This book is a general introduction to active learning. It outlines several scenarios in which queries might be formulated, and details many query selection algorithms, organized into four broad categories, or "query selection frameworks." We also touch on some of the theoretical foundations of active learning, and conclude with an overview of the strengths and weaknesses of these approaches in practice, including a summary of ongoing work to address open challenges and opportunities.

Table of Contents: Automating Inquiry / Uncertainty Sampling / Searching Through the Hypothesis Space / Minimizing Expected Error and Variance / Exploiting Structure in Data / Theory / Practical Considerations
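To make the query-selection idea concrete, here is a minimal sketch of pool-based uncertainty sampling, one of the frameworks the book covers: the learner scores each unlabeled instance by the entropy of its predicted label distribution and queries the oracle with the most uncertain one. The function names (`select_query`, `predict_proba`) are illustrative, not from the book.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_query(pool, predict_proba):
    """Pool-based uncertainty sampling: return the unlabeled instance
    whose predicted label distribution has the highest entropy,
    i.e., the one the current model is least certain about."""
    return max(pool, key=lambda x: entropy(predict_proba(x)))

# Toy example: three unlabeled instances with hypothetical model
# predictions; the 50/50 instance is the most informative query.
pool = ["x1", "x2", "x3"]
preds = {"x1": [0.9, 0.1], "x2": [0.5, 0.5], "x3": [0.7, 0.3]}
query = select_query(pool, lambda x: preds[x])
```

In a real system, `predict_proba` would come from a probabilistic model (e.g., logistic regression) re-trained after each labeled query, and the loop of query, label, and re-train repeats until the labeling budget is exhausted.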