Is this deja vu all over again? Didn’t we see a spam article here yesterday? Well, the weather is still too crappy for me to have any good horseplay stories (and besides, Arthur probably would have been out of commission with a slight limp that seems to be responding well to bute). So I flipped my geek topic coin (heads: Google; tails: Spam) and here we are. Seriously, today’s New York Times had an interesting article by James Gleick that I wish I’d read before I wrote yesterday’s contribution. Some comments on his story would have fit neatly into mine.
Gleick does a pretty good job of explaining the spam issue to those who may be unfamiliar with it. (I started to say that probably doesn’t include any Times readers, but then I thought of my parents, although I’m not sure they still subscribe.) Most of Gleick’s article is old news to anyone with more than a few months of online experience, but he presents the issue in an interesting way and raises some good points. Unfortunately, the article is weak in a couple of areas. (I wonder what he got paid for it; maybe I’m in the wrong business.)
Gleick and I seem to agree on a lot, especially the conclusion that a legislative solution is necessary. He cites a Supreme Court decision which would seem to confirm our assertion that such legislation would not conflict with the First Amendment. While the solutions he proposes are not quite the same as the centralized “no-spam” list that I advocate, his suggestions are feasible and I would have no problem supporting them.
He discusses filtering at length, and mentions all the problems with it that others have cited. He shares my belief that “false positives” (legitimate mail that gets discarded as spam) are more troubling than “false negatives” (spam that slips through the filters). I was amused by the example he chose:
If your filter deletes what it thinks is spam, you may never see the message from your long-lost high-school sweetheart, who finally wants to make contact but uses too many exclamation points, or calls you a dear friend, or mentions sex.
His hypothetical example is actually very similar to a spam message I received recently. It would indeed be an interesting task for a filter to discriminate between real and spam versions of this scenario.
In Gleick’s discussion of filters, he expounds on SpamAssassin, which really is more notable for its popularity than for anything else. He briefly mentions Paul Graham’s new Bayesian approach, but fails to mention the best implementation of it, Bill Yerazunis’ CRM114, which Graham himself says “has the best spam filtering performance I’ve heard of to date and deserves to be better known.” (Gleick has probably never seen Dr. Strangelove.)
In another incomplete treatment of his subject, Gleick mentions the problem of forged headers. While he’s correct that adding forged headers to a message is trivially easy, he neglects to mention that the spammer cannot prevent each SMTP server that handles the message from adding its own legitimate Received header, and that those headers provide a very good tool for tracking the source of spam. And he doesn’t bother to mention my favorite spam-fighting tool, SpamCop, which does a very good job of reading the headers, discarding the forged ones, and generating abuse reports to the responsible authorities for the sites indicated by the authentic headers.
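For the curious, here is a rough Python sketch of the idea. It is not SpamCop’s actual code, just my own illustration under some simplifying assumptions: each server puts its Received header on top of the ones already there, so you read from the top down and believe a header only as long as the “by” host is a server you trust. The sample message, the host names, and the function names are all made up, and the pattern only handles the common “from host (name [ip]) by host” layout.

```python
import re
from email import message_from_string

# Hypothetical message: every host and address here is invented.
SAMPLE = """\
Received: from mx.example.net (mx.example.net [192.0.2.10])
\tby mail.myisp.example with ESMTP; Sun, 9 Feb 2003 10:00:02 -0500
Received: from relay.example.org (unknown [198.51.100.7])
\tby mx.example.net with SMTP; Sun, 9 Feb 2003 10:00:01 -0500
Received: from trusted.bank.example (forged [203.0.113.99])
\tby relay.example.org with SMTP; Sun, 9 Feb 2003 09:59:59 -0500
From: someone@example.com
Subject: Hello!!!

body
"""

# Assumed layout: "from <claimed name> (... [<ip>]) by <receiving host> ..."
RECEIVED_RE = re.compile(
    r"from\s+(\S+).*?\[(\d{1,3}(?:\.\d{1,3}){3})\].*?by\s+(\S+)", re.S
)


def received_hops(raw_message):
    """Return the Received hops in header order (newest first)."""
    msg = message_from_string(raw_message)
    hops = []
    for header in msg.get_all("Received", []):
        match = RECEIVED_RE.search(header)
        if match:
            claimed, ip, by_host = match.groups()
            hops.append({"claimed": claimed, "ip": ip, "by": by_host})
    return hops


def last_trusted_hop(hops, trusted_hosts):
    """Walk down from the top; stop at the first header not written by a trusted host."""
    entry = None
    for hop in hops:
        if hop["by"] not in trusted_hosts:
            break
        entry = hop
    return entry


hops = received_hops(SAMPLE)
entry = last_trusted_hop(hops, {"mail.myisp.example"})
print(entry["ip"])  # 192.0.2.10
```

The IP address recorded by the last server you trust is the one worth reporting; anything below that point could have been typed in by the spammer.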
Gleick also (somewhat) correctly says:
Although headers will be forged, inspecting the HTML source of junk mail usually reveals a genuine domain name, because, after all, you are meant to visit some Web site. Domain names are required to list publicly a technical contact and an administrative contact.
Again, he fails to mention that SpamCop does exactly what he suggests. In addition to gathering information from the headers, it also parses the message for web and email addresses, and offers to send reports to the appropriate contacts for those addresses. (This is an option that needs to be used cautiously, as spam will sometimes contain your own address. SpamCop will dutifully report to your ISP that your address was used in a spam message if you blindly check all the boxes in its report.)
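A minimal Python sketch of that kind of body parsing, again my own illustration and not how SpamCop actually works: pull the host names out of any links, collect any email addresses, and drop your own address so you don’t report yourself. The function name, the sample body, and both patterns are assumptions, and the patterns are deliberately loose.

```python
import re
from urllib.parse import urlparse

# Loose patterns for illustration only; real spam uses many obfuscation tricks.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.I)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def report_targets(body, own_addresses):
    """Return (link host names, email addresses) found in body, minus your own addresses."""
    own = {address.lower() for address in own_addresses}
    hosts = {urlparse(url).hostname for url in URL_RE.findall(body)}
    hosts.discard(None)  # urlparse gives None when there is no host part
    emails = {address.lower() for address in EMAIL_RE.findall(body)} - own
    return sorted(hosts), sorted(emails)


# Hypothetical spam body; the addresses are invented.
body = (
    '<a href="http://pills.example.com/buy?id=1">Click here</a> '
    "or write sales@pills.example.com. This was sent to me@myisp.example"
)
print(report_targets(body, ["me@myisp.example"]))
# (['pills.example.com'], ['sales@pills.example.com'])
```

Looking up who is responsible for each host is a separate step that I have left out here.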
Overall, it’s good to see the spam issue dealt with in such depth in a medium with the reach of the Times. It’s a little disappointing that Gleick didn’t spend a few more minutes on research to make it more accurate. But since he agrees with me on the major principles, I guess I shouldn’t bash him too much.