
SpamClean


spamclean.py

I've written a preliminary version of a script to detect and revert spam on a remote wiki.

Here's how it works. The user types in something like this:

    spamclean.py 'http://interwiki.sourceforge.net/cgi-bin/wiki.pl'

This means, "I want you to scan the wiki at the given URL and alert me to any recently posted spam".

The program first downloads a BannedContent list from CommunityWiki (the banned content list source can be changed with command-line options). Next, it fetches RecentChanges from the target wiki. Then it looks at each of the recently changed pages, one by one ("recent" can be defined via a command-line option; the default is one week).
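To make the detection step concrete, here is a minimal sketch of how a banned content list could be matched against page text. It is not taken from spamclean.py. It assumes the list is plain text with one regular expression per line and that blank lines and lines starting with "#" are ignored; the function names compile_banned_patterns and find_banned are hypothetical.

```python
import re


def compile_banned_patterns(banned_text):
    """Turn a banned-content listing into compiled regexes.

    Assumed format (not confirmed by the source): one regular expression
    per line; blank lines and lines starting with '#' are ignored.
    """
    patterns = []
    for line in banned_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            patterns.append(re.compile(line, re.IGNORECASE))
        except re.error:
            # Skip malformed entries rather than abort the whole scan.
            continue
    return patterns


def find_banned(text, patterns):
    """Return the first pattern string that matches text, or None."""
    for pattern in patterns:
        if pattern.search(text):
            return pattern.pattern
    return None
```

A page would then count as suspect whenever find_banned returns something other than None for its current text.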

For each page, if the page sets off the spam detector, we step back along previous revisions until we hit the first spamless revision (at least, spamless in the eyes of the spam detector). Note that if there are legitimate pages in the wiki which set off the spam detector, then this strategy won't work; the program won't be able to find a spamless revision and so won't do anything about that page except alert the user.
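The step-back strategy can be sketched roughly as follows. This is an illustration only, not the script's actual code. It assumes the revisions of a page are already available as a list of texts ordered oldest first, and that the spam detector is any function returning True for spammy text; last_clean_revision and is_spam are hypothetical names.

```python
def last_clean_revision(revisions, is_spam):
    """Find the most recent revision that the detector accepts.

    revisions: list of page texts, oldest first; the last item is the
               current revision (an assumed representation).
    is_spam:   callable taking a page text and returning True for spam.

    Returns the index of the newest clean revision, or None if every
    revision sets off the detector.
    """
    # Walk backwards from the current revision toward the oldest one.
    for index in range(len(revisions) - 1, -1, -1):
        if not is_spam(revisions[index]):
            return index
    return None
```

If the result is the index of the current revision, the page is clean and nothing needs to be done. If the result is None, no spamless revision exists, which corresponds to the case where the program can only alert the user.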

If the program finds BannedContent, it alerts the user and asks whether to revert the change. If the user says yes, it reverts the page to the last revision without BannedContent.

A command-line option can be used to tell the program to auto-revert content without asking the user.

WHICH ENGINES ARE SUPPORTED?

This script doesn't do any wiki-engine specific processing itself; it relies upon WikiGateway. Therefore, it'll work with whatever WikiGateway works with.

At the moment (Oct 3 2004), only UseMod is supported to the extent needed by this script, but as WikiGateway is expanded to support more wiki engines, this script will work with them too (without being modified). (I did UseMod first because I personally have a bunch of UseMod wikis to police for spam.)

HOW TO GET IT

The script is part of the WikiGateway distribution. Unfortunately, WikiGateway isn't incredibly easy to install yet; there's some work to be done in terms of documentation and packaging (I plan to make some .debs and .rpms).

For this script to work, you must install both the latest version of the Wiki::Gateway Perl module (available on CPAN; must be version 0.00143 or later) and the WikiGateway.py Python module. It's not that bad, though, so go ahead and download it if you're interested. I'm willing to help you get it installed.

spamclean.py can be found in the "apps" directory of the WikiGateway .tar.gz, versions 0.00143 and up (link).

Brief installation instructions (I haven't "tested" these instructions, btw):

-- BayleShanks

Note for Debian users (may help others too): you'll also have to install the package python-egenix-mxdatetime; by the way, SOAPpy is available in the Debian package python-soappy. For non-Debian users, the mx.DateTime module can be found at http://www.lemburg.com/files/python/eGenix-mx-Extensions.html. -- AndrewGray


Other features


Proposed features


Problems

SpamClean is only a stopgap solution, not a substitute for spam protection at the wiki engine level. It is the belief of the author of SpamClean that wiki engines should consider protection from spam as a core feature.

For example, right now, even if you revert spam, RecentChanges can become filled with spam reverts. This can clobber older, interesting changes to the underlying pages (but see the proposed "anti-clobber" feature). Having the spam never posted in the first place, as with OddMuse's BannedContent feature, would be much better.


See also the AntiSpamBot page on the Chongqed Wiki, and CommunityWiki:BannedContentBot.

CategoryWikiGateway