Machine Learning for Email: Spam Filtering and Priority Inbox

"O'Reilly Media, Inc.", Oct 25, 2011 - Computers - 148 pages

If you’re an experienced programmer willing to crunch data, this concise guide will show you how to use machine learning to work with email. You’ll learn how to write algorithms that automatically sort and redirect email based on statistical patterns. Authors Drew Conway and John Myles White take a practical, case-study-driven approach rather than a traditional math-heavy presentation.

This book also includes a short tutorial on using the popular R language to manipulate and analyze data. You’ll get clear examples for analyzing sample data and writing machine learning programs with R.

  • Mine email content with R functions, using a collection of sample files
  • Analyze the data and use the results to write a Bayesian spam classifier
  • Rank email by importance, using factors such as thread activity
  • Use your email ranking analysis to write a priority inbox program
  • Test your classifier and priority inbox with a separate email sample set
 

Contents

  • Chapter 1 Using R
  • Chapter 2 Data Exploration
  • Spam Filtering
  • Priority Inbox
  • Works Cited
  • Copyright

About the author (2011)

Drew Conway is a PhD candidate in Politics at NYU. He studies international relations, conflict, and terrorism using the tools of mathematics, statistics, and computer science in an attempt to gain a deeper understanding of these phenomena. His academic curiosity is informed by his years as an analyst in the U.S. intelligence and defense communities.

John Myles White is a PhD student in the Princeton Psychology Department, where he studies how humans make decisions both theoretically and experimentally. Outside of academia, John has been heavily involved in the data science movement, which has pushed for an open source software approach to data analysis. He is also the lead maintainer for several popular R packages, including ProjectTemplate and log4r.
