CRM114
Spam Filters, Alien Technology and Ruby on Rails
By Arto on Wed, 2006-07-05 22:19. aliens | CRM114 | Ferret | Lisp | Rails | Ruby | Scheme | spam | Strangelove
When Paul Graham’s A Plan for Spam made its dramatic entrance into the anti-spam battle four years ago, it heralded the beginning of the end for spam — as we knew spam back then, anyway. Applying a simple statistical approach, based on word frequency analysis with a naive Bayesian classifier, Graham described how to create a spam filter accurate enough (99+%) that false positives effectively ceased to be an issue.
The central idea of the Graham Algorithm was quickly adopted en masse by spam filters, and as a result, the spam arms race has in the past few years tipped in favor of the good guys. “Successful” spam has devolved into exactly what Graham predicted it would: “some completely neutral text followed by a URL.” For me personally, the combination of good server- and client-side filters has made spam yesterday’s problem. (Well, that, and using Gmail as a front-end for lower-priority e-mail; spam all you want, it’s Google’s problem and they’re up to the task.)
Recently at $WORK, we’ve succeeded in applying similar text classification principles to another unrelated problem area, with the intent of forcing the computer to do the tedious job it was invented for, allowing us super-apes, in turn, to spend more time under a palm tree on the nearby beach, sipping tinto de verano and working out answers to deep existential questions, or whatever else it is that one does on the beach (note to self: need more practice).
