Re: Poisonous models (was the bad word) from Hugh Glaser on 2010-07-19 (public-lod@w3.org from July 2010)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Mon, 19 Jul 2010 07:27:44 +0000
To: Daniël Bos <corani@gmail.com>
CC: Linked Data community <public-lod@w3.org>
Message-ID: <EMEW3|83f2707eb8e8663f9414453b59bae311m6I8SE02hg|ecs.soton.ac.uk|C869BD00.15DAF>
On 18/07/2010 18:02, "Daniël Bos" <corani@gmail.com> wrote:

> On the topic of "Everybody called 'Dave' is the same person", I could imagine
> DBpedia saying that all these Daves are distinct (using e.g.
> owl:differentFrom) (they don't, but maybe they should!), which means if you
> accept DBpedia, you can't at the same time accept dave.rdf.
Sorry Daniel, I personally just can't imagine this.
DBpedia (and every other site) would have to (confidently) assert
owl:differentFrom (or whatever) between every URI they issue and every other
URI to guard against this.
I could generate such triples for my site, but the explosion of triples would
look strange to any consumer, and somehow I don't think the LOD world should
be predicated on my doing so.
And what is the negation of skos:exactMatch, and foaf:knows and any other
predicate a poisoner chooses to use?
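
To put a rough number on that explosion, here is a minimal Python sketch. It
only counts pairs within a single site's own URIs, so it is a lower bound on
what "every other URI" would really mean. The example.org URIs and the
function names are made up for illustration.

```python
from itertools import combinations

OWL_DIFFERENT_FROM = "http://www.w3.org/2002/07/owl#differentFrom"


def different_from_triples(uris):
    """Yield one owl:differentFrom triple per unordered pair of URIs."""
    for a, b in combinations(sorted(set(uris)), 2):
        yield (a, OWL_DIFFERENT_FROM, b)


def pair_count(n):
    """Number of unordered pairs among n URIs: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical URIs, for illustration only.
daves = [f"http://example.org/id/dave{i}" for i in range(4)]
triples = list(different_from_triples(daves))

print(len(triples))           # pairs among 4 URIs
print(pair_count(1_000_000))  # pairs among a million URIs
```

Four URIs need 6 triples; a million URIs would need roughly 5 * 10^11, and
that is before considering anyone else's URIs or any predicate other than
owl:sameAs.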

One day we will have to deal with all this, and I hope the day is coming
fast, as it will be a measure of success.


> I'm pretty sure that big players will store their data in (at least) quads, to
> include the source of the triples. This means they can collect data from the
> internet, and at a later time decide about the trustworthiness of the source.
> I can imagine for example, that they will only accept sources that don't
> contradict. The more statements your model has that contradict the already
> collected body of statements, the less likely your model will be accepted.
So what about browsers that load RDF as you browse?
If I have browsed to this file, will my cache be OK?

And what about sites that seek to cache RDF?
Are they protecting against this, as Sindice seems to, since it only tells
you about documents and doesn't cache them?
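
For concreteness, the quad idea quoted above might look something like the
following sketch: each statement is kept with its source, so a cache can
decide later which sources to believe, count how often a source contradicts
trusted data, and still answer "which sources claim X?". This is a toy
in-memory store, not how Sindice or anyone else does it. The class, its
method names and the example.org URIs are all hypothetical. Note that the
contradiction check only fires if some trusted source has asserted
owl:differentFrom, which is exactly the assumption questioned earlier.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
OWL_DIFFERENT_FROM = "http://www.w3.org/2002/07/owl#differentFrom"


class QuadStore:
    """Toy in-memory store of (subject, predicate, object, source) quads."""

    def __init__(self):
        self.quads = set()

    def add(self, s, p, o, source):
        self.quads.add((s, p, o, source))

    def sources_claiming_same(self, a, b):
        """Which sources claim a owl:sameAs b (in either direction)?"""
        return sorted({
            src for (s, p, o, src) in self.quads
            if p == OWL_SAME_AS and {s, o} == {a, b}
        })

    def contradiction_counts(self, trusted):
        """Count, per untrusted source, sameAs claims that clash with an
        owl:differentFrom statement from a trusted source."""
        distinct = {
            frozenset((s, o)) for (s, p, o, src) in self.quads
            if p == OWL_DIFFERENT_FROM and src in trusted
        }
        counts = {}
        for (s, p, o, src) in self.quads:
            if src in trusted or p != OWL_SAME_AS:
                continue
            if frozenset((s, o)) in distinct:
                counts[src] = counts.get(src, 0) + 1
        return counts

    def accepted_triples(self, trusted):
        """Drop the source column, keeping only statements from trusted sources."""
        return {(s, p, o) for (s, p, o, src) in self.quads if src in trusted}


# Hypothetical identifiers and documents, for illustration only.
DAVE_A = "http://example.org/id/dave-a"
DAVE_B = "http://example.org/id/dave-b"
TRUSTED = "http://example.org/trusted.rdf"
POISON = "http://example.org/poison.rdf"

store = QuadStore()
store.add(DAVE_A, OWL_DIFFERENT_FROM, DAVE_B, TRUSTED)
store.add(DAVE_B, OWL_SAME_AS, DAVE_A, POISON)

claimants = store.sources_claiming_same(DAVE_A, DAVE_B)
clashes = store.contradiction_counts({TRUSTED})
accepted = store.accepted_triples({TRUSTED})
```

Here `claimants` names the poisoning document, `clashes` counts one
contradiction against it, and `accepted` holds only the trusted statement.
A browser cache that merges triples without the source column has none of
these options.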

I'm sure there are a bunch of papers I have missed on all this at ISWC etc.
(in fact I think I remember some), so sorry if this is revisiting old ground.

Best
Hugh
> 
> It would still be indexed however, so arguably you could ask questions like
> "Which sources claim that 'Dave A.' is the same as 'Dave B.'?"
> 
> Daniel
> 
> On Sun, Jul 18, 2010 at 23:58, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
>> Sure, Nathan may be.
>> But Richard and Toby moved into the poisoning world.
>> You can only use the techniques you describe if you have concepts of where
>> things can/can't come from.
>> And as Toby says, if Google (or Sindice) took this...
>> What does happen if Sindice accepts this document?
>> 
>> Hugh
>> 
>> On 18 Jul 2010, at 05:54, "Daniël Bos"
>> <corani@gmail.com> wrote:
>> 
>> 
>> I think Nathan isn't talking about poisoning models (which could be prevented
>> using reification, or using quads that include the source of each statement,
>> and then trusting only selected statements), but about the problem of giving
>> spammers a tool to collect email and postal addresses from the web much more
>> easily, by simply parsing pages instead of scraping and somehow detecting the
>> information.
>> 
>> Though I can see the danger in that, I personally don't think it is that much
>> of an issue, since email addresses have always been easy to scrape, and
>> postal addresses are in most cases easy to collect from e.g. business
>> directories. Semantic markup makes it easier, but those wanting to collect
>> this kind of data could and would do that anyway.
>> 
>> --
>> With kind regards,
>> Daniël Bos
>> 
>> On Jul 18, 2010 12:55 AM, "Hugh Glaser"
>> <hg@ecs.soton.ac.uk>
>> wrote:
>> 
>> You better hope your system can cope with this.
>> http://data.totl.net/dave.rdf
>> 
>> Hugh
>> 
>> On 17 Jul 2010, at 11:35, "Nathan"
>> <nathan@webr3.org> wrote:
>> 
>>> So, after seeing this question on s...
> 
>
Received on Monday, 19 July 2010 07:28:53 UTC