
Poisonous models (was the bad word)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Sun, 18 Jul 2010 15:58:57 +0000
To: Daniël Bos <corani@gmail.com>
CC: Linked Data community <public-lod@w3.org>
Message-ID: <EMEW3|3232aee62df15f36079972223cc465ddm6HGw402hg|ecs.soton.ac.uk|5543B928-0622-456A-A2ED-5F5EEDDA5491@ecs.soton.ac.uk>
Sure, Nathan may be.
But Richard and Toby moved into the poisoning world.
You can only use the techniques you describe if you have concepts of where things can/can't come from.
And as Toby says, if Google (or Sindice) took this...
What does happen if Sindice accepts this document?


On 18 Jul 2010, at 05:54, "Daniël Bos" <corani@gmail.com> wrote:

I think Nathan isn't talking about poisoning models (which could be prevented using reification, or using quads, which record the source of each statement, so that only statements from selected sources are trusted), but about the problem of giving spammers a tool to collect email and postal addresses from the web much more easily: simply parsing pages, instead of scraping them and somehow detecting the information.
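The quad-based defence mentioned above can be sketched in a few lines. This is a minimal illustration in plain Python, not any particular RDF library: each statement carries the URL it was retrieved from, and queries only consider statements from sources the consumer has chosen to trust. All names here (`Quad`, `trusted_sources`, the example URLs) are illustrative assumptions, not anything from the thread.

```python
from typing import NamedTuple

class Quad(NamedTuple):
    """A triple extended with provenance: where the statement came from."""
    subject: str
    predicate: str
    obj: str
    source: str

# Statements harvested from the open web, including one injected
# by a hostile page (the "poisoning" scenario discussed above).
store = [
    Quad("ex:HughGlaser", "foaf:mbox", "mailto:hg@ecs.soton.ac.uk",
         "http://ecs.soton.ac.uk/"),
    Quad("ex:HughGlaser", "foaf:mbox", "mailto:spammer@example.org",
         "http://evil.example/poison"),  # hostile assertion about someone else
]

# The consumer decides which sources are authoritative.
trusted_sources = {"http://ecs.soton.ac.uk/"}

def trusted_statements(quads, trusted):
    """Keep only statements whose source is on the trust list."""
    return [q for q in quads if q.source in trusted]

mboxes = [q.obj
          for q in trusted_statements(store, trusted_sources)
          if q.predicate == "foaf:mbox"]
print(mboxes)  # the poisoned address is filtered out
```

The point is simply that provenance must be kept alongside the data: with plain triples the two `foaf:mbox` statements are indistinguishable, whereas with quads the consumer can discard the one asserted by an untrusted source.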

Though I can see the danger in that, I personally don't think it is that much of an issue, since email addresses have always been easy to scrape, and postal addresses are in most cases easy to collect from e.g. business directories. Semantic markup makes it easier, but those wanting to collect this kind of data could and would do that anyway.

With kind regards,
Daniël Bos

On Jul 18, 2010 12:55 AM, "Hugh Glaser" <hg@ecs.soton.ac.uk> wrote:

You'd better hope your system can cope with this.


On 17 Jul 2010, at 11:35, "Nathan" <nathan@webr3.org> wrote:

> So, after seeing this question on s...
Received on Sunday, 18 July 2010 15:58:40 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:16:07 UTC