23 June, 2005

Further anti-spam tips

Speaking of comment spam (yes, I was last week – keep up!), it's been a while since any reached the published blog (now that's tempting fate). I'm still receiving quite a lot, but two refinements to my counter-measures seem to be working rather well.

Firstly, I'm more proactive with blacklisting.
Rather than waiting for spam and then adding the enclosed URLs to MT-Blacklist, I've identified common words and phrases and blocked them in advance, using regular expressions. I've been careful to avoid combinations of letters which might be used justifiably (feel free to mention Middlesex), but various topics, particularly those involving pornography, gambling and pharmaceuticals, are unlikely to arise in genuine contexts here, so they seem safe to bar.
A side-effect has been to reduce the number of individually-blocked URLs by about a third.
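
For anyone wanting to try the idea, here is a minimal sketch of word-based blocking. It is written in Python purely for illustration (MT-Blacklist takes its patterns directly), and the patterns and the `is_spam` helper are made-up examples rather than my actual blacklist entries. The word boundaries (`\b`) are what keep an innocent place name from being caught.

```python
import re

# Illustrative patterns only; these are not real blacklist entries.
# \b word boundaries stop "sex" from matching inside "Middlesex".
BLOCKED_PATTERNS = [
    r"\bsex\b",
    r"\bcasinos?\b",
    r"\bpoker\b",
    r"\bpharmac(?:y|ies)\b",
]

# Combine everything into one case-insensitive expression.
BLOCKED_RE = re.compile("|".join(BLOCKED_PATTERNS), re.IGNORECASE)


def is_spam(comment_text):
    """Return True if any blocked pattern appears in the comment text."""
    return BLOCKED_RE.search(comment_text) is not None


print(is_spam("I grew up in Middlesex."))          # False
print(is_spam("Best online casino bonuses here"))  # True
```

A handful of patterns like these can stand in for many individually listed URLs, which is presumably why the URL list shrinks.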

A second approach is probably less well-known.
I've noticed that spammers (or their software) tend to target low-numbered entries in MT-based blogs (no. 250 will receive more spam than no. 850). Presumably, more blogs last as far as the 250th entry than reach 850, so automated spam engines are more likely to succeed by guessing comments-form URLs in the lower range.
I've put this to practical use by reserving one low-numbered entry to be kept on permanent 'draft' status, never to be published but with commenting open. This means a robot can hit the comments script directly, but humans and search engines won't see the result. Hence, 80-90% of spam evading the filters appears on that one entry, which is only visible to me. I can then blacklist the advertised sites or incorporate them into regular expressions at my leisure.
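
The harvesting step could be automated along these lines. This is only a sketch under assumptions: the entry number, the dictionary layout of a comment and the `harvest_domains` helper are all hypothetical, and in practice I do this by hand. Anything posted to the hidden entry is spam by construction, so the domains it advertises can go straight onto the blacklist.

```python
import re
from urllib.parse import urlparse

HONEYPOT_ENTRY_ID = 250  # hypothetical ID of the never-published draft entry

# Rough URL matcher: stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def harvest_domains(comments, honeypot_id=HONEYPOT_ENTRY_ID):
    """Collect the domains advertised in comments left on the honeypot entry."""
    domains = set()
    for comment in comments:
        if comment["entry_id"] != honeypot_id:
            continue  # comments on real entries still need a human look
        for url in URL_RE.findall(comment["text"]):
            host = urlparse(url).hostname
            if host:
                domains.add(host.lower())
    return sorted(domains)


comments = [
    {"entry_id": 250, "text": "Great site! http://cheap-pills.example/buy"},
    {"entry_id": 851, "text": "Nice post, see http://friend.example/blog"},
    {"entry_id": 250, "text": '<a href="http://Casino.example">win</a>'},
]
print(harvest_domains(comments))
# ['casino.example', 'cheap-pills.example']
```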

Incidentally, I'm using MT 2.661 – users of later versions might still find these ideas useful, if not their direct application.


MT-Blacklist 2.x for MT 3.x allows for URL patterns, so that key phrases can be blocked in URLs. That way, you can mention various pharmaceuticals in a comment, but not as part of a URL. You could do the same using regular expressions on older versions, though.

Posted by Neil T. at June 23, 2005 08:17 PM

Unfortunately, that's not enough. Spammers use unrelated URLs, so one has to block the content of comments too.

Posted by NRT at June 24, 2005 01:04 AM
