Book review: Ending Spam—Bayesian Content Filtering and the Art of Statistical Language Classification


For a lot of people, thoughts about spam are limited to a burst of bad language and perhaps a brief marvel at the sheer number of organisations that want to help fix aspects of other people’s genitalia. However, there is more to spam than expletives. Spam doesn’t just magically appear in your mailbox: it has a history, and so does the battle against it. There are some pretty interesting and innovative weapons available to combat the evil that is spam, and some of those weapons are examined in Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification.

The book’s cover

This book, published by No Starch Press, is an overview of spam and how it can be combated effectively using statistical filtering. Jonathan Zdziarski is extremely well qualified to take on the job of explaining such things; he maintains the spam filter DSPAM and has had lots of exposure in the press for his long fight against spam.

The contents

Ending Spam weighs in at an unimposing two hundred and eighty-seven pages, so it’s the sort of book you can carry around comfortably. It is divided into three parts, each of which is divided into chapters. In Part 1, “An Introduction to Spam Filtering”, Zdziarski runs through the history of spam (including examples) and the historical approaches to fighting it, and introduces language classification concepts and the fundamentals of statistical filtering. This part gives the reader a nice background in anti-spam terminology and concepts.

In Part 2, “Fundamentals of Statistical Filtering”, concepts that were introduced in Part 1 (specifically tokenization and decoding) are covered in a bit more detail, and environmental and space considerations for people thinking about using statistical filtering are discussed. It also contains the very entertaining chapter 7, “The Low-Down Dirty Tricks of Spammers”, which is great because it gives an extra feeling of relevance and balance to the information being imparted.

Part 3 is “Advanced Concepts of Statistical Filtering”. Its chapters, as may be evident from the title, build further upon the concepts developed in Part 2. Part 3 doesn’t just talk about how great statistical filtering is: it also talks about some elements of weakness in specific models and ways these can be combated. It also includes an appendix, which introduces five “shining examples of filters” and interviews with their creators. This gives the reader a chance to see statistical filtering in action and backs up the discussions in Parts 2 and 3.

Who’s this book for?

This book is for people with an interest in what’s going on in the world of spam filtering; it certainly isn’t limited to people within a specialised field. It is excellently written and very accessible. If you are currently setting up or designing your own spam filter, you are hopefully aware of almost everything covered in the book. However, if you want an introduction to, or a brush-up on, the current world of spam filtering, or if you want to gain the ability to speak confidently about it to people in the know, then this is the book to read.

Relevance to free software

Zdziarski’s goal is to eradicate the scourge of spam from the face of the earth. To do this, he wants to reach as many people as possible, so he doesn’t categorise his audience or create boundaries for them. The filters he discusses in the appendix are all free software and he cites them as some of the finest examples of spam filters available. But I do think his promotion of them as free software is more for pragmatic reasons than for philosophical or political ones, and is therefore more likely to have widespread appeal. This book is for everyone, free and proprietary users alike.

Pros

People should buy this book because it’s excellently written and fun to read. It is also an intelligent read, and you come out of the experience feeling good that there are people like Zdziarski fighting the good fight. You might even feel more positive the next time you get spam because you know that it’s the side that is slowly losing.

Cons

It’s a really detailed overview, but that’s pretty much all it is: an overview. It has some probability models, which are great, but if you want a programmer’s delight packed full of code, this isn’t it. If it were, I wouldn’t have made it past the first page.

Title: Ending Spam
Author: Jonathan A. Zdziarski
Publisher: No Starch Press
ISBN: 1593270526
Year: 2005
Pages: 287
CD included: No
FS Oriented: 8/10
Overall score: 9/10

Author information

Bridget Kulakauskas

Biography

Bridget has a degree in Sociology and English and a keen interest in the social implications of technology. She has two websites: Illiterarty and The Top 10 Everything. She also handles accounts and administration for Free Software Magazine.