Re: [Semediawiki-user] status and problems on semanticweb.org

Dear Yury, dear Lonny,

thanks for the hints. I am confident that we can soon upgrade/configure 
our software to block most new spam from coming through. We will follow 
your tips and see how it goes.
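
As a rough illustration of the waiting-period rule from the tips quoted
below (1 edit and 10 hours before a new user may post external links, plus
a captcha for anonymous editors), the logic could be sketched in Python as
follows. This is not MediaWiki configuration: the Editor class and both
function names are hypothetical, and refusing links from anonymous editors
altogether is an assumption made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the tips quoted in this thread (1 edit, 10 hours).
MIN_ACCOUNT_AGE = timedelta(hours=10)
MIN_EDIT_COUNT = 1


@dataclass
class Editor:
    """Hypothetical stand-in for a wiki account; not a MediaWiki class."""
    registered_at: Optional[datetime]  # None means an anonymous editor
    edit_count: int = 0


def needs_captcha(editor: Editor) -> bool:
    """Anonymous editors are always asked to solve a captcha."""
    return editor.registered_at is None


def may_post_external_links(editor: Editor, now: datetime) -> bool:
    """Allow external links only for accounts that are old and active enough."""
    if editor.registered_at is None:
        # Assumption for this sketch: anonymous editors may not post links.
        return False
    old_enough = now - editor.registered_at >= MIN_ACCOUNT_AGE
    active_enough = editor.edit_count >= MIN_EDIT_COUNT
    return old_enough and active_enough
```

A typical spambot registers and posts links immediately, so it fails the
age check no matter how many edits it makes in its first minutes.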

What remains to be done is to clean up the existing spam.

Yury: could you send me (off-list) some more details on how to clean up
the wiki using [6]? This could be very useful for minimizing the remaining
manual work.

Markus

On 13/01/12 08:06, Yury Katkov wrote:
> Hi everyone!
>
> Thanks for your attention, it's great to see that people really care
> about the subject.
>
> Based on my own experience with my wikis and the extensive experience of
> the Wikipedia and Wikia folks, I can say that MediaWiki has great tools
> for combating spam. These tools range from an external-link blacklist to
> machine-learning spam and vandalism detectors on wikibots [1,2,3,7].
>
> 1) A 10-hour waiting period is very effective against spambots, whose
> typical behavior is to register and immediately write something.
> 2) A non-standard sign-up form with an additional required field
> sometimes works: we have the Semantic Signup extension and the beta of
> the Social Profile extension for that.
> 3) Adding the 'nofollow' attribute to all external links once worked
> extremely well on Wikipedia, but it may not be a good idea in our case,
> since Semantic Web services, papers and projects can and should be
> promoted with the semanticweb.org <http://semanticweb.org> wiki.
> 4) This script [4], coupled with SpamBlacklist [5], and this bot [6] can
> help to clean up the wiki.
> 5) I haven't tried AbuseFilter yet, but I have heard that it's effective
> too.
>
> User rights tuning can also help:
>
> 1) Grant blocking and maybe deletion privileges to real users. This may
> encourage more people to get involved.
> 2) There is no existing way to restrict URL creation for particular
> groups of users (for example, to deny inserting external links to
> anonymous users and users who haven't confirmed their e-mail), but it
> doesn't seem hard to write an extension for that.
>
>
> References:
>
> [1] http://www.mediawiki.org/wiki/Manual:Combating_spam
> [2] http://www.mediawiki.org/wiki/Anti-spam_features
> [3] http://www.mediawiki.org/wiki/Spam_Filter
> [4] http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SpamBlacklist/cleanup.php
> [5] http://www.mediawiki.org/wiki/Extension:SpamBlacklist
> [6] https://github.com/dannyob/secretaribot
> [7] http://help.wikia.com/wiki/Help:Spam
>
> Sincerely yours,
> Yury Katkov
>
>
>
>
> On Fri, Jan 13, 2012 at 6:26 AM, Lonny <lonny@appropedia.org
> <mailto:lonny@appropedia.org>> wrote:
>
>     Hi All,
>
>     Appropedia has found that the following steps have stopped the vast
>     majority of our previously incessant spam:
>     * requiring a captcha for anons
>     * a 1-edit, 10-hour waiting period for new users to post external links
>     * a few settings in AbuseFilter
>
>     Please let me know if you would like more details of our current
>     spam fighting setup.
>
>     Good luck,
>     Lonny
>
>     PS as you know, you can see the extensions at
>     http://www.appropedia.org/Special:Version
>
>     On Jan 12, 2012 11:36 AM, "Markus Krötzsch"
>     <markus.kroetzsch@cs.ox.ac.uk <mailto:markus.kroetzsch@cs.ox.ac.uk>>
>     wrote:
>
>         Hi Yury,
>
>         let us take this to one mailing list, semantic-web@w3.org
>         <mailto:semantic-web@w3.org>, as this is the list that is most
>         involved (please drop the others when you reply).
>
>         As the technical maintainer of the site, I largely agree with
>         your assessment. In spite of the very high visibility of the
>         site (and perceived authority), the active editing community is
>         not big. This is a problem especially given the significant and
>         continued spam attacks that the site is under due to its high
>         visibility (I just recently changed the captcha system and
>         rolled back thousands of edits, yet it seems they are already
>         breaking through again, though in smaller numbers).
>
>         I do not want to blame anybody for the state of affairs: most
>         of us do not have the time to contribute significant content to
>         such sites. However, given the extraordinary visibility of the
>         site, we should all perceive this as a major problem (to the
>         extent that we attach our work to the label "semantic web" in
>         any way).
>
>         So what can be done?
>
>         (1) Freeze the wiki. A weaker version of this is: allow users
>         only to edit after they were manually added to a group of
>         trusted users (all humans welcome). This would require somebody
>         to manage these permissions but would allow existing
>         projects/communities to continue to use the site.
>
>         (2) Reinforce spam protection on the wiki. Maybe this could be
>         done, but the site is targeted pretty heavily. Standard
>         captchas like ReCaptcha are thus getting broken (spammers do
>         have an effective infrastructure for this), but maybe
>         non-standard captchas could work better. This is a task for the
>         technical maintainers (i.e., me and the folks at AIFB Karlsruhe
>         where the site is hosted).
>
>         (3) Clean the wiki. Whether frozen or not, there is a lot of
>         spam already. Something needs to be done to get rid of it. This
>         requires (easy but tedious) manual effort. Some stakeholders
>         need to be found to provide basic workforce (e.g., by hiring a
>         student to help with spam deletion).
>
>         (4) Restore the wiki. Update the main pages (about technologies
>         and active projects) to reflect a current and/or timeless state
>         that we would like new readers to see. This again needs
>         somebody to push it, and for writing pages about topics like
>         SPARQL one would need some expertise. This is a challenge for
>         the community.
>
>         I am willing to invest /some/ time here to help with the above,
>         but (3) and (4) require support from more people. On the other
>         hand, there are probably hardly more than 20 or 30 *essential*
>         content pages that we are talking about here, plus many pages
>         about projects and people that one should ask the stakeholders
>         to review. So one might be able to make this into a shining
>         entry point to the semantic web in a week of work ... together
>         with (1) and (2) above, the invested work would remain valuable
>         for a long time.
>
>         Cheers
>
>         Markus
>
>
>
>         On 12/01/12 10:43, Yury Katkov wrote:
>          > Hi everyone!
>          >
>          > What is the current status of the semanticweb.org
>          > <http://semanticweb.org> website? It used to be the main
>          > wiki about the semantic web, it has a lot of cool and useful
>          > information about everything. But now it seems abandoned. I
>          > mean, there are about 30 real writers who update the
>          > information about their projects and write articles, but
>          > they do something like 30% of changes. The other 70% is spam!
>          >
>          > Are there guys who support the website?
>          > Who manages the community, are there any plans of creating
>          > projects and articles about SW? Is there a community at all?
>          >
>          > In my opinion, if this great website is supposed to be alive,
>          > the first goal is to find volunteers who'll help the
>          > administrator to combat spam (with bots, extensions and
>          > editing policies) and support the new activities and
>          > projects on the wiki. (I'm ready to be one of them).
>          > If this wiki lived only in the past, when there was a big
>          > hype around Semantic Web topics, and now without big funding
>          > nobody wants to use it - wouldn't it be better to be frozen?
>          >
>          > I appreciate and admire people who started up the wiki.
>          > Please, don't let it be the rotting memorial to the past of
>          > the Semantic Web.
>          > -----
>          > Sincerely yours,
>          > Yury Katkov, WikiVote llc
>          >
>          >
>
>

-- 
Dr. Markus Kroetzsch
Department of Computer Science, University of Oxford
Room 306, Parks Road, OX1 3QD Oxford, United Kingdom
+44 (0)1865 283529               http://korrekt.org/

Received on Friday, 13 January 2012 09:21:38 UTC