Re: Poisonous models (was the bad word) from Hugh Glaser on 2010-07-19 (public-lod@w3.org from July 2010)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Mon, 19 Jul 2010 07:22:18 +0000
To: Renaud Delbru <renaud.delbru@deri.org>
CC: Daniël Bos <corani@gmail.com>, Linked Data community <public-lod@w3.org>
Message-ID: <EMEW3|8300cb646ef34c0a19a0ec45ec299210m6I8Ml02hg|ecs.soton.ac.uk|C869BBBA.15DAD>

Thanks Renaud,
Very helpful and thoughtful.
A couple of comments.


On 18/07/2010 17:49, "Renaud Delbru" <renaud.delbru@deri.org> wrote:

> Hi Hugh,
> 
> to answer your question, Sindice will accept the document, perform
> reasoning and index it as it is. However, Sindice is somewhat robust to
> this kind of "poisonous" data. Sindice performs a particular kind
> of reasoning that we call "context-dependent" reasoning [1], in which
> inference is performed in the "context of the document". The inferences
> will only be true in the context of this document and will not have a
> global impact, i.e., they will not alter the inferences on other documents.
> Therefore, Sindice avoids undesirable assertions. In fact, unlike other
> approaches such as SAOR [2], where certain statements are considered
> invalid and ignored, we do not restrict the freedom of expression of data
> publishers. Data publishers are allowed to reuse and extend ontologies
> or existing entities in any manner, but the consequences of their
> modifications will be confined to their own context, and will not alter
> the intended semantics of the other RDF models published on the Web.
Cool.
It sounds really good that the inference part of Sindice is robust to this.
However, I guess that if I use Sindice to find relevant documents for
dbpedia:Darby_Riordan and load them into my store, I am likely to end up
with a pretty poisoned store.
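A minimal sketch of the difference, assuming a toy corpus of (subject, predicate, object) triples keyed by source document and looking only at the owl:sameAs closure. This is not Sindice's implementation (see [1] for that), and all document URLs and entity names below are hypothetical. Reasoning per document keeps a bad owl:sameAs assertion inside its own document, while merging everything into one store before reasoning lets it leak:

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


def same_as_closure(triples):
    """Group resources into owl:sameAs equivalence classes using union-find
    (sameAs is reflexive, symmetric and transitive)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)

    classes = defaultdict(set)
    for x in list(parent):
        classes[find(x)].add(x)
    return list(classes.values())


def infer_per_document(corpus):
    """Context-dependent style: compute the closure separately for each
    document, so one document's assertions never reach another's results."""
    result = {}
    for doc, triples in corpus.items():
        inferred = set()
        for members in same_as_closure(triples):
            for a in members:
                for b in members:
                    if a != b:
                        inferred.add((a, SAME_AS, b))
        result[doc] = inferred
    return result


def infer_globally(corpus):
    """Contrast: merge every document into one store, then reason."""
    merged = [t for triples in corpus.values() for t in triples]
    return infer_per_document({"merged": merged})["merged"]


# Hypothetical corpus: one sound document and one poisonous one.
corpus = {
    "http://example.org/good.rdf": [("ex:Alice", SAME_AS, "ex2:Alice")],
    "http://example.org/poison.rdf": [("ex2:Alice", SAME_AS, "ex2:Bob")],
}

per_doc = infer_per_document(corpus)
print(("ex:Alice", SAME_AS, "ex2:Bob") in per_doc["http://example.org/good.rdf"])  # False
print(("ex:Alice", SAME_AS, "ex2:Bob") in infer_globally(corpus))  # True
```

`infer_globally` corresponds to loading all the retrieved documents into a single store and reasoning over the union, which is where the poisoned conclusion appears.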
> 
> However, if somebody requests all documents stating <?s, owl:sameAs,
> dbpedia:Darby_Riordan>, Sindice will return the document
> http://data.totl.net/dave.rdf. But such problems can be tackled with
> appropriate ranking methodologies (based on link analysis methods such
> as [3]).
> Poisonous documents published on the web are likely to have no
> incoming links (or only links from other poisonous documents, which can
> be detected), and will therefore be ranked very low and will never
> appear in the top-k search results.
I'm not sure about this.
Poisonous documents may well have many links to them (saying they are
poisonous?).
This seems to me to be comparable to the citation problem, where a paper
gets a very high citation count because everyone cites it as being wrong.
Of course, sentiment analysis etc. may help (and may be easier on the
Semantic Web), but relying on a pure reference count is dangerous.
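To make the two positions concrete, here is a sketch using plain weighted PageRank (not the specific method of [3]) over a hypothetical five-document graph. It assumes every link already carries a weight, with 0 standing in for a link that some hypothetical sentiment classifier judged to say "this is wrong"; obtaining that label is the hard part. With no incoming links the poisonous document ranks low; with many critical links counted as plain references it outranks the sound document; discounting those links pushes it back down:

```python
def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    edges is a list of (source, target, weight); a weight of 0 means the
    link is ignored. Rank held by nodes with no positive-weight out-links
    ("dangling" nodes) is spread evenly over all nodes, so ranks sum to 1.
    """
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: 0.0 for v in nodes}
    for source, _, weight in edges:
        out_weight[source] += weight

    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for source, target, weight in edges:
            if weight > 0:
                new_rank[target] += damping * rank[source] * weight / out_weight[source]
        rank = new_rank
    return rank


# Hypothetical documents: "good" is sound, "poison" is poisonous.
nodes = ["good", "a", "b", "c", "poison"]

# 1. Renaud's expectation: nobody links to the poisonous document.
no_links = [("a", "good", 1), ("b", "good", 1), ("c", "good", 1), ("poison", "good", 1)]

# 2. Hugh's objection: a, b and c link to "poison" to say it is wrong,
#    and a pure reference count treats those links as endorsements.
critical_links = [("a", "poison", 1), ("b", "poison", 1), ("c", "poison", 1), ("a", "good", 1)]

# 3. Same links, but the critical ones get weight 0 (assumed sentiment label).
discounted = [(s, t, 0 if t == "poison" else w) for s, t, w in critical_links]

for label, edges in [("no links", no_links),
                     ("critical links counted", critical_links),
                     ("critical links discounted", discounted)]:
    r = pagerank(nodes, edges)
    print(f"{label:26} good={r['good']:.3f} poison={r['poison']:.3f}")
```

The only difference between scenarios 2 and 3 is the weight on the critical links, which is exactly the information a pure reference count lacks.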
> 
> [1] http://renaud.delbru.fr/doc/pub/SSWS2008-context.pdf
> [2] http://www.deri.ie/fileadmin/documents/DERI-TR-2009-04-21.pdf
> [3] http://renaud.delbru.fr/doc/pub/eswc2010-ding.pdf
> 
> Regards,

Received on Monday, 19 July 2010 07:23:20 UTC