Ending spam: Bayesian content filtering and the art of statistical language classification

Front Cover
No Starch Press, 2005 - Computers - 287 pages
1 Review
Join author John Zdziarski for a look inside the brilliant minds that have conceived clever new ways to fight spam in all its nefarious forms. This landmark title describes, in-depth, how statistical filtering is being used by next-generation spam filters to identify and filter unwanted messages, how spam filtering works and how language classification and machine learning combine to produce remarkably accurate spam filters. After reading Ending Spam, you'll have a complete understanding of the mathematical approaches used by today's spam filters as well as decoding, tokenization, various algorithms (including Bayesian analysis and Markovian discrimination) and the benefits of using open-source solutions to end spam. Zdziarski interviewed creators of many of the best spam filters and has included their insights in this revealing examination of the anti-spam crusade. If you're a programmer designing a new spam filter, a network admin implementing a spam-filtering solution, or just someone who's curious about how spam filters work and the tactics spammers use to evade them, Ending Spamwill serve as an informative analysis of the war against spammers. TOC Introduction PART I: An Introduction to Spam Filtering Chapter 1: The History of Spam Chapter 2: Historical Approaches to Fighting Spam Chapter 3: Language Classification Concepts Chapter 4: Statistical Filtering Fundamentals PART II: Fundamentals of Statistical Filtering Chapter 5: Decoding: Uncombobulating Messages Chapter 6: Tokenization: The Building Blocks of Spam Chapter 7: The Low-Down Dirty Tricks of Spammers Chapter 8: Data Storage for a Zillion Records Chapter 9: Scaling in Large Environments PART III: Advanced Concepts of Statistical Filtering Chapter 10: Testing Theory Chapter 11: Concept Identification: Advanced Tokenization Chapter 12: Fifth-Order Markovian Discrimination Chapter 13: Intelligent Feature Set Reduction Chapter 14: Collaborative Algorithms Appendix: Shining Examples of Filtering Index

From inside the book

What people are saying - Write a review

Review: Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification

User Review  - James Luedke - Goodreads

Well written covers advanced concepts of statistical filtering in depth. A good read for software developers or sys admins who deal with spam. Read full review

Contents

War Waged on Spam
17
Final Thoughts
23
Historical Approaches to Fighting Spam
25
Copyright

36 other sections not shown

Common terms and phrases

About the author (2005)

Jonathan Zdziarski is better known as the hacker "NerveGas" in the iPhone development community. His work in cracking the iPhone helped lead the effort to port the first open source applications, and his book, iPhone Open Application Development, taught developers how to write applications for the popular device long before Apple introduced its own SDK. Prior to the release of iPhone Forensics, Jonathan wrote and supported an iPhone forensics manual distributed exclusively to law enforcement. Jonathan frequently consults law enforcement agencies and assists forensic examiners in their investigations. He teaches an iPhone forensics workshop in his spare time to train forensic examiners and corporate security personnel. Jonathan is also a full-time research scientist specializing in machine learning technology to combat online fraud and spam, an effort that led him to develop networking products capable of learning how to protect customers. He is founder of the DSPAM project, a high-profile, next-generation spam filter that was acquired in 2006 by Sensory Networks, Inc. He lectures widely on the topic of spam and is a foremost researcher in the fields of machine-learning and algorithmic theory. Jonathan's website is http://zdziarski.com.