Re: High level of queries about validation, whatsapp and bb code.

From: Michael[tm] Smith <mike@w3.org>
Date: Fri, 9 Jan 2015 02:20:25 +0900
To: David Dorward <david@dorward.me.uk>
Cc: www-validator@w3.org
Message-ID: <20150108172025.GF15289@jay.w3.org>
> Date: Wed, 07 Jan 2015 13:29:23 +0000
> From: David Dorward <david@dorward.me.uk>
> Archived-At: <http://www.w3.org/mid/72A45749-4EF2-46BA-9518-3ECBD52F4C05@dorward.me.uk>
> 
> Over the last few months there has been a pattern of questions on the
> mailing list which boil down to "Here are some links mentioning
> whatsapp, I'm using BB Code, the HTML is invalid, what should I do?".
> 
> The pattern is looking rather spammy so I searched for the URLs on
> Google and found them turning up in queries to bug trackers and in
> comment spam.
> 
> Should they be filtered out of incoming email and purged from the
> list archives?

The W3C mailing-list system already uses a spam-filtering mechanism that
catches a huge amount of spam before it gets to W3C mailing lists. Right
now looking at the list-management UI, I see 1103 spam messages it's
blocked recently from reaching the list. So it's working well already.

Unfortunately that spam filter doesn't catch everything -- especially not
messages like the ones we've been seeing that appear to be crafted to look
similar to normal messages that we get on the list.

Anyway, even if we used a mechanism for augmenting that spam filter with
something additional for manually specifying single blacklist keywords
(which we don't), I think it'd be a losing battle because the spammers just
adjust by changing their words.

Certainly I can say in this case that simple blocking based on "whatsapp"
or "BBCode" wouldn't solve the problem on this list, because a number of
the spam/clickbait messages that got through didn't contain those words.
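
To illustrate why single-keyword blocking tends to lose, here is a minimal
Python sketch. It is not anything the W3C list system actually runs; the
function name, the blocklist, and the sample messages are all hypothetical
and made up for illustration.

```python
from typing import Iterable


def is_blocked(message: str, blocklist: Iterable[str]) -> bool:
    """Return True if any blocklisted keyword occurs in the message.

    Matching is a plain case-insensitive substring check, which is
    roughly what a manual keyword blacklist amounts to.
    """
    text = message.lower()
    return any(keyword.lower() in text for keyword in blocklist)


# Hypothetical blocklist and sample messages (assumptions, not real list data).
BLOCKLIST = ["whatsapp", "bbcode"]

caught = "My whatsapp links page uses BBCode and fails validation, what should I do?"
reworded = "My messenger links page uses forum markup and fails validation, what should I do?"
legit = "The validator rejects my BBCode-to-HTML converter output; is this a bug?"

if __name__ == "__main__":
    for label, msg in [("caught", caught), ("reworded", reworded), ("legit", legit)]:
        print(label, is_blocked(msg, BLOCKLIST))
```

In this sketch the reworded message passes the filter untouched, while the
legitimate question about a BBCode converter is blocked, so the keyword list
both misses spam and rejects real mail.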

Anyway, what we do at least have is a mechanism the W3C team can access to mark
copies of the messages as spam in the online archive for the list & remove
any links to them from elsewhere in the archive (e.g., index pages); ex:

  http://lists.w3.org/Archives/Public/www-validator/2015Jan/0012.html

I've done that for the clickbait messages that were sent to the list over
the last few weeks:

  http://lists.w3.org/Archives/Public/www-validator/2015Jan/thread.html
  http://lists.w3.org/Archives/Public/www-validator/2014Dec/thread.html

If you see any remaining there that I missed, let me know.

  --Mike

-- 
Michael[tm] Smith https://people.w3.org/mike/status

Received on Thursday, 8 January 2015 17:20:27 UTC
