
January 19, 2005

Comment Spam

Google has announced that it and other blogging software companies will be implementing a new HTML attribute to reduce comment spam; this new attribute -- rel="nofollow" -- will keep Google's spiders from following URLs left by people who comment on someone's blog entry, thus reducing the motivation for spammers to leave comment spam for Google page rank purposes. I don't know that this makes a difference for this site, since I've disabled HTML in comment text areas anyway (I figure you all know how to cut and paste a URL), but if it gets spam comments down overall, I think that's groovy.
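To make the attribute concrete, here is a minimal sketch of how blogging software might tag links in a comment before publishing it. The function name `add_nofollow` and the regex-based approach are assumptions for illustration, not how Google or any particular blogging tool implements it; a production filter would use a real HTML parser and would probably overwrite any rel value the commenter supplied.

```python
import re

# Matches an opening <a ...> tag; group 1 holds the attribute text.
_ANCHOR = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)

def add_nofollow(comment_html: str) -> str:
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet."""
    def fix(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # leave tags that already carry rel alone
        return f'<a{attrs} rel="nofollow">'
    return _ANCHOR.sub(fix, comment_html)

print(add_nofollow('Visit <a href="http://example.com">my site</a>'))
```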

This is of some interest to me because recently I've noticed an upsurge in comment spam activity here -- I've been having to clear out close to 100 posts a day. The good news is that it's pretty darn easy to do in Movable Type 3.11; I cleared out about 70 this morning in three minutes. But of course, it's still annoying, and there is the unfortunate side effect that while clicking the little boxes to remove comment spam, I occasionally and accidentally remove a legitimate comment, too. I hate it when that happens. I could make my life even easier by implementing the MT Blacklist functionality, but that involves installing things, and I can already hear my database screaming at the thought of me tinkering with it.

If I were to make a wishlist of things I'd like for Movable Type to implement to make it easier for me to combat comment spam, here's what I would wish for (and if you know these things exist as add-ons or part of the native MT functionality, please let me know):

1. The ability to delete comments from the actual comment thread, as opposed to having to fire up the MT backend to get at it. Interestingly enough, AOL Journals users have this functionality -- they see buttons to delete comments right there as they read; the functionality is keyed to their screenname so no one else can delete anything, of course. I could see MT doing something similar using cookies on a specific browser or through some sort of sign-on implementation.

2. The ability to semi-moderate: I'd love to be able to let messages without HTML coding go through but sequester HTML-laden comments until I approved them. This would mean general conversation would continue, since very few "real" commenters here reference URLs, but comment spam would be blocked from showing up at all in the threads; I'd throw them out before they got there.

3. The ability to ban commenters not just by IP (which is pretty useless these days if you're not running MT Blacklist) but also by commenter name. I doubt any real person is named "Phenteramine" or "Online Poker." This would be a temporary stopgap, of course, as spammers would pick up on it fairly quickly. But what would be reasonably effective is the ability to ban by phrase: that is, have MT scan through the text and, if a specific sequence of words pops up, either block the comment or drop it into a moderation queue for approval. Since those "phrases" could include URLs, which would be constant over many, many comment spams, this could be very helpful.
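Wishes 2 and 3 amount to a small triage rule, which can be sketched as follows. Everything named here (`triage`, `BANNED_NAMES`, `BANNED_PHRASES`, and the sample entries) is hypothetical and is not part of Movable Type; the check for markup and URLs is deliberately rough.

```python
import re

# Hypothetical ban lists; the entries are illustrations only.
BANNED_NAMES = {"phenteramine", "online poker"}
BANNED_PHRASES = ["texas holdem", "http://spam.example.com"]

# Rough test for HTML tags or URLs in a comment body.
_MARKUP_OR_URL = re.compile(r"<[a-z/][^>]*>|https?://|www\.", re.IGNORECASE)

def triage(name: str, body: str) -> str:
    """Return 'block', 'moderate', or 'publish' for one comment."""
    if name.strip().lower() in BANNED_NAMES:
        return "block"      # wish 3: ban by commenter name
    text = body.lower()
    if any(phrase in text for phrase in BANNED_PHRASES):
        return "block"      # wish 3: ban by phrase (could also be 'moderate')
    if _MARKUP_OR_URL.search(body):
        return "moderate"   # wish 2: hold HTML-laden comments for approval
    return "publish"        # plain-text conversation goes straight through

print(triage("Sue", "I heart MT Blacklist."))
```

Checking names and phrases before markup means a known spam phrase is blocked outright even when the comment carries no links.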

If MT were to implement any of these, it would make my online life easier. Implementing all of them, of course, would make it a joyous skip through the park.

Update: As it happens, Six Apart (who make MT) have recently put out a guide to comment spam which notes a useful plug-in for quasi-moderating: MT-Moderate, which automatically puts comments as "pending" if they're attached to entries past a certain age (the default is seven days), on the (largely correct in my experience) theory that older entries aren't likely to get actual comments, they're likely to get spam (the plugin also notes when a comment has been approved for an older entry and backs off a bit on moderating that particular entry for a day or two to let real-time conversation happen -- a nice touch.)

I've gone ahead and added MT-Moderate, so if you decide to comment on an entry that's more than a week old, be aware that there may be a time lag before it shows up, since I'll need to approve it. But the flip side for me is that comment spam will largely be gone from the site. I love it when a plan comes together.
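The age-based rule described in the update can be sketched in a few lines. This is not MT-Moderate's actual code: the function name `needs_approval` and the one-day `GRACE` window are assumptions (the plugin is only described as backing off "for a day or two"), while the seven-day default comes from the description above.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_AGE = timedelta(days=7)  # default entry age noted in the update
GRACE = timedelta(days=1)    # assumed back-off window after an approval

def needs_approval(entry_posted: datetime, now: datetime,
                   last_approved: Optional[datetime] = None) -> bool:
    """Decide whether a new comment on an entry should be held as pending."""
    if now - entry_posted <= MAX_AGE:
        return False  # recent entry: publish immediately
    if last_approved is not None and now - last_approved <= GRACE:
        return False  # a comment was just approved: let the conversation run
    return True       # old, quiet entry: hold for approval

print(needs_approval(datetime(2005, 1, 1), datetime(2005, 1, 19)))
```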

Posted by john at January 19, 2005 10:39 AM

Comments

Paul | January 19, 2005 11:35 AM

MT-Blacklist can be used to ban commenters by e-mail address, if you use a @ sign in the blacklist entry. For example,

@[\w\-_.]*cialis[\w\-_.]*\.[a-z]{2,}

will ban any posts from people with "cialis" in the domain part of their e-mail address.

At least that's how MT-Blacklist works under MT 2.6.x; I haven't tried it with MT 3.11 yet.
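To see what a pattern like the one above matches, here is a small check using Python's `re` module (here with `*` after both character classes, so that any number of characters may come before the keyword). This only illustrates the regular expression itself; how MT-Blacklist applies its entries is not shown, and `is_blacklisted` is a hypothetical helper.

```python
import re

# "@", optional characters, the keyword, optional characters, then a dot and a TLD.
PATTERN = re.compile(r"@[\w\-_.]*cialis[\w\-_.]*\.[a-z]{2,}", re.IGNORECASE)

def is_blacklisted(email: str) -> bool:
    """True if the keyword appears in the domain part of the address."""
    return PATTERN.search(email) is not None

for address in ["bob@cheap-cialis-online.com", "cialis@example.com"]:
    print(address, is_blacklisted(address))
```

Because the pattern starts at the @ sign, the keyword only counts when it appears in the domain, not in the part before the @.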

iJames | January 19, 2005 11:47 AM

Comment spam concerns are one of the major reasons I decided to go with Wordpress a month or so back. Your second and third requests are part of the base software, and the first one is possible in two steps. (Drop a plugin into a directory, click "Activate" from the options menu.)

I've also got a plugin called Spam Karma that implements the same blacklist as MT-Blacklist, with a few other checks on top of it, and enables false positives to lift themselves out of the moderation queue. If a legitimate comment is flagged as spam, the commenter can either type in a code from one of those funky distorted-letter images, *or* supply their e-mail address and respond to a validation message. This leaves very few comments for me to moderate.

I'm not trying to evangelize; I know the futility of that. But if MT developers looked at some of the stuff Wordpress is currently doing, they might get some good ideas.

tobias buckell | January 19, 2005 12:08 PM

It's sweet, I'm tinkering now with the same things...

Sue | January 19, 2005 12:12 PM

I heart MT Blacklist. I got hit by something like 55 spam comments yesterday and it's so quick and easy to delete them. I've had close to 800 spam deleted automatically or moderated in the two months I've been using it. LOVE!

Brian Greenberg | January 19, 2005 01:13 PM

I'm just curious: by setting the rel="nofollow" attribute, does Google ignore the link associated with each commenter's name? I don't have a blog (heck - *someone* has to not have a blog these days), but I do have a website where I occasionally write blog-like entries, and I think the link to my site (from my name below) on the Whatever comment pages is a significant portion of my Page Rank.

Second point: I don't use Movable Type, but I can tell you what I'd like to see them do that will eliminate most if not all comment spam in a single stroke: add the Ticketmaster strategy of showing a graphic and having the commenter type in the word displayed in the graphic. It would be easy for them to do (pick a random file from a folder of images, compare what was typed to a lookup table entry for each file), it would be a few more keystrokes from each commenter (not too bad, given how much some of us are apparently willing to type anyway), and very difficult for spammers to defeat.

So, who do I talk to about this?!? ;-)
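The "random file plus lookup table" idea in this comment can be sketched roughly as below. The file names, words, and function names are all made up for illustration, and this is not a description of Ticketmaster's system; rendering the distorted images and tying a challenge to a session are left out.

```python
import random

# Hypothetical lookup table: image file name -> word shown in that image.
CHALLENGES = {
    "img01.png": "harbor",
    "img02.png": "violet",
    "img03.png": "candle",
}

def pick_challenge(rng=random):
    """Pick one image file at random to show next to the comment form."""
    return rng.choice(sorted(CHALLENGES))

def check_answer(image_file, typed):
    """Compare what the commenter typed to the table entry for that file."""
    expected = CHALLENGES.get(image_file)
    return expected is not None and typed.strip().lower() == expected

shown = pick_challenge()
print(shown, check_answer(shown, CHALLENGES[shown]))
```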

Kate Nepveu | January 19, 2005 01:22 PM

Image-based human tests eliminate those who use screen readers because they are visually impaired.

John Scalzi | January 19, 2005 01:28 PM

"Image-based human tests eliminate those who use screen readers because they are visually impaired."

Yup. And I know of at least one Whatever reader who uses a screen reader.

I actually don't mind deleting the comment spam I get -- it's usually not an onerous task. I just don't want it to show up on the site in the first place. So for me, having date-based moderation isn't a bad compromise.

Bill Peschel | January 19, 2005 02:45 PM

I turned off commenting on my site for the very reasons you cited, John. The combination of pmachine's method of deleting comments (onerous) and my dial-up service made canning it the only viable option if I didn't want to spend an hour a day deleting that krep.

I've considered switching to MT, but it looked far more difficult to install and tinker with than pmachine. But I'm glad to see they're doing something about comment spam, and the "older than 7 days" check is a very good idea.

iJames | January 19, 2005 03:39 PM

Brian Greenberg:
"I occasionally write blog-like entries, and I think the link to my site (from my name below) on the Whatever comment pages is a significant portion of my Page Rank."

No, because Whatever's using a redirector. If you look at those links, they're actually links back to a CGI program here on Scalzi's site. The CGI then sends the browser over to your site, and Google never knows about you. The explicit purpose of this is so that you won't get a page rank boost just by commenting.

(A purpose I heartily support, BTW. While I'd love to partake of Scalzi's mighty mojo, and that of other blogs I like throughout the net, I'm happiest earning it the old-fashioned way: by being so damn clever that people are compelled to check out my site and plug it in their own blog content. Someday, my sweet. Oh yes. Someday it will ALL BE MINE!) **cartoon evil laughter**
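The redirector arrangement described in this comment can be sketched as two small functions: one builds the on-site link shown next to a commenter's name, the other produces the CGI response that forwards the browser. The script path and function names are hypothetical, and this is not the actual program running on the site.

```python
from urllib.parse import quote, parse_qs

REDIRECT_SCRIPT = "/cgi-bin/redirect.cgi"  # hypothetical script path

def wrap_link(target_url: str) -> str:
    """Build the on-site link that stands in for the commenter's URL."""
    return f"{REDIRECT_SCRIPT}?url={quote(target_url, safe='')}"

def redirect_response(query_string: str) -> str:
    """Return the CGI headers that send the browser on to the target site."""
    target = parse_qs(query_string).get("url", [""])[0]
    safe_scheme = target.startswith(("http://", "https://"))
    if not safe_scheme or "\r" in target or "\n" in target:
        return "Status: 400 Bad Request\r\n\r\n"  # refuse odd or injected input
    return f"Status: 302 Found\r\nLocation: {target}\r\n\r\n"

link = wrap_link("http://example.com/")
print(link)
print(redirect_response(link.split("?", 1)[1]))
```

The page itself then only ever links to the blog's own script, which is the point made in the comment above.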

Megan | January 19, 2005 04:02 PM

I found some of MT's newer things to be a bit on the buggy side. I have no interest in evangelizing either--if I was that passionate about my blogging toys I probably would've tinkered with MT a while longer--but I do recommend being able to roll back to something you know works. But I suppose that's just good advice for life in general.

Brian Greenberg | January 19, 2005 04:22 PM

James:
"The CGI then sends the browser over to your site, and Google never knows about you. The explicit purpose of this is so that you won't get a page rank boost just by commenting."

Even better...that means I earned it all by myself! :-)