RE: High level of queries about validation, whatsapp and bb code.

From: Mark Rogers <mark.rogers@powermapper.com>
Date: Thu, 8 Jan 2015 12:08:33 -0600
To: "Michael[tm] Smith" <mike@w3.org>, David Dorward <david@dorward.me.uk>
CC: "www-validator@w3.org" <www-validator@w3.org>
Message-ID: <1F68EA0E0CBFBE44A9A64274E1AC01A122F798C256@DFW1MBX23.mex07a.mlsrvr.com>
I think the main reason people are doing this is to get links for SEO purposes from http://lists.w3.org/Archives/. A link from a site with high PageRank like w3.org boosts the PageRank of the target page. 

There should be less spam if the archive software added rel="nofollow" to all links in the list archives (hopefully it's a config option somewhere in the mailing list software). That would make the list a much less tempting target for link spammers, since Google and other search engines ignore nofollow links when calculating PageRank. Some more info here:
http://en.wikipedia.org/wiki/Nofollow
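
As a rough illustration of the idea (this is not how hypermail or the W3C archive software does it, and I haven't checked whether either has such an option), a post-processing step over the generated archive pages could look something like the Python sketch below. The function name add_nofollow and the regular expressions are my own assumptions; a real implementation would want a proper HTML parser rather than regexes.

```python
import re

# Rough patterns for illustration only; they are not a full HTML parser.
_A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
_REL_ATTR = re.compile(r"""\brel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def add_nofollow(html: str) -> str:
    """Return html with "nofollow" present in the rel of every <a> tag.

    Hypothetical helper: tags that already carry nofollow are left alone,
    and existing rel values (e.g. "external") are kept.
    """
    def fix_tag(match: "re.Match[str]") -> str:
        attrs = match.group(1)
        rel = _REL_ATTR.search(attrs)
        if rel is None:
            # No rel attribute at all: append one.
            return "<a" + attrs + ' rel="nofollow">'
        values = rel.group(2).split()
        if "nofollow" in (v.lower() for v in values):
            return match.group(0)  # already marked, leave untouched
        values.append("nofollow")
        new_rel = 'rel="' + " ".join(values) + '"'
        return "<a" + attrs[:rel.start()] + new_rel + attrs[rel.end():] + ">"

    return _A_TAG.sub(fix_tag, html)


if __name__ == "__main__":
    print(add_nofollow('<a href="http://example.com/">spammy link</a>'))
```

Running it over each archived message page before publishing would leave the links clickable for readers while telling search engines not to count them.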

Best Regards
Mark

Mark Rogers - mark.rogers@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 


-----Original Message-----
From: Michael[tm] Smith [mailto:mike@w3.org] 
Sent: 08 January 2015 17:20
To: David Dorward
Cc: www-validator@w3.org
Subject: Re: High level of queries about validation, whatsapp and bb code.

> Date: Wed, 07 Jan 2015 13:29:23 +0000
> From: David Dorward <david@dorward.me.uk>
> Archived-At: <http://www.w3.org/mid/72A45749-4EF2-46BA-9518-3ECBD52F4C05@dorward.me.uk>
> 
> Over the last few months there has been a pattern of questions on the 
> mailing list which boil down to "Here are some links mentioning 
> whatsapp, I'm using BB Code, the HTML is invalid, what should I do?".
> 
> The pattern is looking rather spammy so I searched for the URLs on 
> Google and found them turning up in queries to bug trackers and in 
> comment spam.
> 
> Should they be filtered out of incoming email and purged from the list 
> archives?

The W3C mailing-list system already uses a spam-filtering mechanism that catches a huge amount of spam before it gets to W3C mailing lists. Right now looking at the list-management UI, I see 1103 spam messages it's blocked recently from reaching the list. So it's working well already.

Unfortunately that spam filter doesn't catch everything -- especially not messages like the ones we've been seeing that appear to be crafted to look similar to normal messages that we get on the list.

Anyway, even if we augmented that spam filter with a way to manually specify individual blacklist keywords (which we don't have), I think it'd be a losing battle because the spammers just adjust by changing their words.

Certainly I can say in this case that simple blocking based on "whatsapp" or "BBCode" wouldn't solve the problem on this list, because a number of the spam/clickbait messages that got through didn't contain those words.
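
To illustrate why, here is a minimal Python sketch of that kind of keyword filter. The keyword set, the function name is_blocked, and the sample subject lines are all made up for illustration; none of it reflects the actual W3C list software.

```python
# Hypothetical blacklist, matching the two words discussed above.
BLOCKED_KEYWORDS = {"whatsapp", "bbcode"}


def is_blocked(message: str) -> bool:
    """Return True if the message contains any blacklisted keyword."""
    text = message.lower()
    return any(keyword in text for keyword in BLOCKED_KEYWORDS)


if __name__ == "__main__":
    # Invented sample subjects, not real list traffic.
    samples = [
        "My WhatsApp status page fails validation",
        "Invalid HTML from BB Code on my status page",
        "Validator errors on my messaging status page",
    ]
    for subject in samples:
        print(is_blocked(subject), subject)
```

Only the first sample is caught: the second slips through on a single added space ("BB Code" rather than "BBCode"), and the third avoids both words entirely.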

Anyway, what we do at least have is a mechanism the W3C team can access to mark copies of the messages as spam in the online archive for the list and remove any links to them from elsewhere in the archive (e.g., index pages); for example:

  http://lists.w3.org/Archives/Public/www-validator/2015Jan/0012.html

I've done that for the clickbait messages that were sent to the list over the last few weeks -

  http://lists.w3.org/Archives/Public/www-validator/2015Jan/thread.html
  http://lists.w3.org/Archives/Public/www-validator/2014Dec/thread.html

If you see any remaining there that I missed, let me know.

  --Mike

--
Michael[tm] Smith https://people.w3.org/mike/status
Received on Thursday, 8 January 2015 18:09:42 UTC