Thursday, March 8, 2007

Combating Spam on MediaWiki




Spam is one of the pests proliferating across the internet. It invades not only our regular email but also websites. Many webmasters of social networking sites and other community-based websites have been attacked by spam, whether robot-generated or posted by humans. Now that wiki technology is becoming popular as an openly editable platform, wikis are an obvious target. So, to combat spam and protect my wiki project, I had to come up with a solution.

After researching and testing various anti-spam mechanisms, I settled on the "Completely Automated Public Turing test to tell Computers and Humans Apart" (CAPTCHA), a type of challenge-response test used in computing to determine whether the user is human, together with a few text filters and permission settings. Here is how to apply them inside MediaWiki.

1. Checking article text for spam

Since Wikipiniana is a text-based collaborative writing project, the insertion of unwanted words (pornographic or offensive statements) is unavoidable: anybody with nothing better to do can ruin your good project. Filtering likely spam words out of submitted content eases the trouble of sanitizing it.


a. $wgSpamRegex

- Open your LocalSettings.php, then add the following code:

$wgSpamRegex = "/".
"s-e-x|livesex|animalsex|". //These match spam words.
"dirare\.com|". //This matches a spammer's domain name
"overflow:\s*auto|". //This matches against overflow:auto
"height:\s*[0-4]px|". //This matches against height:0px (most CSS hidden spam)
"\<\s*a\s*href|". //This blocks raw HTML links (<a href ...)
"display\s*:\s*none". //This matches against display:none
"/i"; //This ignores upper-lower case for letters.

(*note: populate $wgSpamRegex as desired)

- For syntax reference, please see Regular_Expression
- Expected output: when you try to add an article entitled "livesex" or an article that contains the word "livesex", the wiki will refuse to save it
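If you want to see what the pattern catches before putting it on a live wiki, a rough stand-alone check like the one below can help. It is only a sketch: isSpam() is a helper I made up for illustration (it is not a MediaWiki function), and the sample strings are invented. The real filtering happens inside MediaWiki when a page is saved.

```php
<?php
// Same pattern as in LocalSettings.php above.
$wgSpamRegex = "/" .
    "s-e-x|livesex|animalsex|" .
    "dirare\.com|" .
    "overflow:\s*auto|" .
    "height:\s*[0-4]px|" .
    "\<\s*a\s*href|" .
    "display\s*:\s*none" .
    "/i";

// Hypothetical helper (not part of MediaWiki): true when the text
// matches the spam pattern.
function isSpam(string $text, string $regex): bool {
    return preg_match($regex, $text) === 1;
}

var_dump(isSpam("Visit LiveSex now", $wgSpamRegex));              // true, case is ignored
var_dump(isSpam('<div style="display: none">', $wgSpamRegex));    // true, hidden CSS
var_dump(isSpam("A page about wiki gardening", $wgSpamRegex));    // false
var_dump(isSpam("height: 12px", $wgSpamRegex));                   // false, only 0px to 4px match
```

Trying a few harmless sentences as well is worthwhile, since an overly broad pattern will also block legitimate edits.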

b. SpamBlacklist (automatically loads a shared spam blacklist from a wiki; it filters only the URLs in external links. Follow the steps below.)

* Download SpamBlacklist.php and SpamBlacklist_body.php from http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/SpamBlacklist/

* Save to /mywiki/extensions/SpamBlacklist/

* Open LocalSettings.php from /mywiki/ folder and add the following code:

require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );

c. $wgSpamBlacklistFiles (set after loading SpamBlacklist; purpose: loads the blacklists you specify instead of the spam list loaded automatically from the wiki)

* open LocalSettings.php from /mywiki/ folder.
* add the following code:

#right after require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );

$wgSpamBlacklistFiles = array( "specify URL of blacklist" );


#example of URL: "$IP/extensions/SpamBlacklist/mywiki_blacklist.php"
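To get a feel for why a blacklist entry only fires on external links and not on plain text, here is a simplified illustration. It is not the extension's actual code: buildBlacklistRegex() and hasBlacklistedLink() are hypothetical helpers, the second sample entry is invented, and I am assuming the usual blacklist layout of one regex fragment per line with "#" starting a comment.

```php
<?php
// Hypothetical helper: turn blacklist lines into one URL-matching regex.
function buildBlacklistRegex(array $lines): string {
    $fragments = [];
    foreach ($lines as $line) {
        // Drop "#" comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line !== '') {
            // Escape the delimiter so the fragment is safe inside /.../
            $fragments[] = str_replace('/', '\/', $line);
        }
    }
    if ($fragments === []) {
        return '';
    }
    // Only text that follows "http://" or "https://" is examined.
    return '/https?:\/\/[a-z0-9_\-.]*(' . implode('|', $fragments) . ')/i';
}

// Hypothetical helper: true when the text contains a blacklisted link.
function hasBlacklistedLink(string $text, string $regex): bool {
    return $regex !== '' && preg_match($regex, $text) === 1;
}

$lines = [
    '# spammer domains',
    'dirare\.com',
    'cheap-pills\.example   # invented entry',
];
$regex = buildBlacklistRegex($lines);

var_dump(hasBlacklistedLink('See http://www.dirare.com/offer', $regex)); // true
var_dump(hasBlacklistedLink('dirare.com is a spammer', $regex));         // false, no link
```

This is why SpamBlacklist and $wgSpamRegex complement each other: one looks at link targets, the other at the whole text.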

2. Identify whether text input comes from a human or a spam bot (CAPTCHA images)

d. Download ConfirmEdit.php and ConfirmEdit.i18n.php (latest update) from http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/ConfirmEdit/ and save them to the directory /mywiki/extensions/ConfirmEdit. If the ConfirmEdit folder does not exist, create it.

e. (optional) Open ConfirmEdit.php and customize $wgCaptchaTriggers, $ceAllowConfirmedEmail, and $wgCaptchaWhitelist as desired.

f. Open LocalSettings.php in /mywiki/ folder.

g. Add the following line in LocalSettings.php:

require_once( "$IP/extensions/ConfirmEdit/ConfirmEdit.php" );
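As an alternative to editing ConfirmEdit.php directly in step e, the same variables can be overridden in LocalSettings.php after the require_once line, which survives an upgrade of the extension. The fragment below is only a sketch: the values are examples, example.org is a placeholder domain, and the available trigger names may differ between ConfirmEdit versions.

```php
<?php
// LocalSettings.php fragment; $IP is defined by MediaWiki itself.
require_once( "$IP/extensions/ConfirmEdit/ConfirmEdit.php" );

// When to show a CAPTCHA (example values, adjust as desired).
$wgCaptchaTriggers['edit']          = false; // not on every edit
$wgCaptchaTriggers['addurl']        = true;  // edits that add new external links
$wgCaptchaTriggers['createaccount'] = true;  // new account registration

// Let users with a confirmed e-mail address skip the CAPTCHA.
$ceAllowConfirmedEmail = true;

// Regex of link targets that never trigger a CAPTCHA
// (example.org is a placeholder, not a recommendation).
$wgCaptchaWhitelist = '#^https?://([a-z0-9-]+\.)*example\.org(/|$)#i';
```

Challenging only link-adding edits and account creation keeps ordinary editing painless while still stopping most link spam.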


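3. To keep Google from giving additional rank to spammer sites

h. Check $wgNoFollowLinks, which is defined in DefaultSettings.php under the folder /mywiki/includes
i. Make sure it is true, so that external links carry rel="nofollow" and search engines do not count them toward a spammer's rank. True is the default; if it was changed, add this line to LocalSettings.php rather than editing DefaultSettings.php:

$wgNoFollowLinks = true;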

4. To ban open proxies (***not advisable)

j. SORBS DNSBL (a support system)

(*note: CAPTCHA is a better alternative as discussed on http://meta.wikimedia.org/wiki/Proxy_blocking)

5. Also require that users be logged in to edit

k. Open LocalSettings.php in /mywiki/ folder

l. Add the following line in LocalSettings.php:

$wgGroupPermissions['*']['edit'] = false;
$wgShowIPinHeader = false;


Now, I have my peace of mind...

