The SpamFilter is a JSPWiki filter that can be used to block questionable edits. It has been available since JSPWiki 2.1.117 as a CoreFilter. http://doc.jspwiki.org/2.4/wiki/SpamFilter contains the most up-to-date instructions.

Parameters#

wordlist
The name of the WikiPage on which the word list resides. Default is "SpamFilterWordList".
errorpage
The name of the page to which the user is redirected if the edit contains a matched word. On that page, the variable [{$msg}] is available and gives the reason for the rejection. Default is "RejectedMessage".
blacklist
Name of the attachment that contains a blacklist, where each line is interpreted as a pattern to check against. Any lines starting with "#" are ignored as comments. (Since 2.3.98)
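
As a sketch of how these parameters might be set together in filters.xml, the fragment below uses the param form of the filter definition. The wordlist and errorpage values are simply the defaults listed above. The name blacklist.txt is a placeholder attachment name chosen for illustration, not a name JSPWiki requires; whether the value should be a bare file name or a page/attachment path should be checked against your JSPWiki version.

    <filter>
      <class>com.ecyrd.jspwiki.filters.SpamFilter</class>
      <!-- Page holding the 'spamwords' variable (this is the default) -->
      <param>
        <name>wordlist</name>
        <value>SpamFilterWordList</value>
      </param>
      <!-- Page shown to the user when an edit is rejected (this is the default) -->
      <param>
        <name>errorpage</name>
        <value>RejectedMessage</value>
      </param>
      <!-- Attachment with one pattern per line; placeholder name, adjust to your own -->
      <param>
        <name>blacklist</name>
        <value>blacklist.txt</value>
      </param>
    </filter>

Parameters left out of the definition fall back to their defaults, so in practice only the values that differ from the defaults need to be listed.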

The word list#

The SpamFilter looks at the WikiVariable called 'spamwords' on the wordlist page. This must contain a space-separated list of words not allowed in a page. In fact, each word is a full Perl5 regular expression, so you can do pretty complex matches as well.

Of course, it is a good idea to allow only trusted users to edit the wordlist page. Otherwise a spammer can remove the list...

Example#

Put the following in your filters.xml file (See PageFilter Configuration for more information):

    <filter>
      <class>com.ecyrd.jspwiki.filters.SpamFilter</class>
    </filter>

to start the filter. Create a page called "SpamFilterWordList" and put the following on it:

    [{SET spamwords='vaigra money'}]

to prevent anyone from saving a page that contains either the word "vaigra" or "money". In a slightly more complicated example,

    [{SET spamwords='[vV][aA][iI][gG][rR][aA]'}]

would block the words "vaigra", "Vaigra", "vAIGra" and so on.

(The word "vaigra" is misspelled on purpose, because otherwise it would be caught in the spam trap...)


Q. Would it be possible to remove the changes in SpamFilterWordList from the RecentChanges page and RSS feeds?
Not to be picky, but by including those pages you are doing "spamming by side-effect"... :-) -- NascifAbousalhNeto
Nascif, we might look into including an exclude parameter on the RecentChangesPlugin. You might submit that as an idea to keep it visible.

-- MurrayAltheim

Q. How do you set up the Captcha that is available in 2.6? I have searched around and cannot find much to help with this.

--Elrond

A: The Captcha is fully automated.


Q. Why does the SpamFilter for this site reject gmaildotcom (with a "." instead of "dot")? I can't even register an account with the site using my gmail email address.

-- JonHanson

A. Because gmail became a spammer haven after someone cracked their captcha. We got a few thousand bot registrations here one day... But it's enabled again now, since it seems that they've fixed the captcha.


There is a conceptual problem with SpamFilterWordList and the attached SpamFilterWordList/blacklist.txt(info):

Spammers can easily locate, download, and analyze the blacklist definitions and then adjust their spamming strategies.

Based on what I can observe on my own JSPWiki site http://km-works.eu/mathel-wiki, they do indeed take this opportunity regularly.

I have now found a simple and effective solution for this problem: rename the blacklist definition file to some nonsense name with an image MIME type, e.g. hjg451234gkl.jpg, which is of course wrong and misleading to JSPWiki.

The wrong MIME type effectively hides the attachment from being indexed and viewed through the wiki system, because JSPWiki can no longer recognize it as a valid image file (and Lucene just won't index image files).

Now adjust your spam filter definition to reflect the new name for the blacklist, like:

    <filter>
      <class>com.ecyrd.jspwiki.filters.SpamFilter</class>
      <param>
         <name>blacklist</name>
         <value>hjg451234gkl.jpg</value>
      </param>
    </filter>

Note that although the file extension is misleading, the SpamFilter can still read the content of the blacklist file.

--ChristianLerch, 09-Apr-2011 08:18

This page (revision-22) was last changed on 09-Apr-2011 18:19 by 188.118.240.35