Re: why I don't like named graph IRIs in the DATASET proposal

Eric,

On 2 Oct 2011, at 14:15, Eric Prud'hommeaux wrote:
>> It matters if Bob claims that all vegetarians are geniuses.
>> 
>> You'll have to carefully filter Bob's graph in order to remove anything of that sort before you can pass it to your inference engine. That's my point: you have to make trust decisions *before* you can do OWL reasoning.
> 
> Given:
>  I believe Bob to be an authority about whether he's a vegetarian.
>  I don't believe Bob to be an authority about whether he's a genius.
>  Bob (being a genius) can come up with many tricky ways to imply that he's a vegetarian or a genius.
> 
> I can sanitize Bob's homepage before inferencing to remove things like:
>  <Bob> a <Genius> .
>  <Bob> a <Foo> . <Foo> rdfs:subClassOf <Genius> .
>  <Bob> <bar> <Foo> . <bar> rdfs:subPropertyOf rdf:type . <Foo> rdfs:subClassOf <Genius> .
>  <Genius> owl:equivalentClass [ a owl:Restriction ; owl:onProperty rdf:type ; owl:hasValue <Vegetarian> ] .
>  ...
> The simplest tool to use is a reasoner which generates complete evidence for every inference.

That is neither the simplest tool, nor is it something that's in the OWL standard.

> I can then carefully attempt to remove the parts of Bob's homepage graph which led me to conclusions about <Genius>s without disturbing the ones that led me to conclusions about <Vegetarian>s. Given that this is a *hard* problem, I think my confidence goes sharply up if I let full inference happen on Bob's homepage but treat the inferential closure with the same predicated trust that I would his raw homepage.

You'd have to treat the inferential closure with the *lowest* trust that you'd give to *any* of the sources involved in creating the inference closure.
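To make that concrete, here is a minimal sketch (hypothetical helper and scoring scheme, not from any standard) of what "lowest trust of any source involved" means if you assign each source a numeric trust score:

```python
def closure_trust(source_trust, contributing_sources):
    """Trust of an inferred statement: the minimum trust of any source
    whose triples were used in its derivation. The scores and the idea
    of numeric trust are illustrative assumptions, not part of OWL."""
    return min(source_trust[s] for s in contributing_sources)

# Hypothetical scores: I trust my own data highly, Bob's homepage much less.
trust = {"my_data": 0.9, "bobs_homepage": 0.3}

# An inference derived from both can only be trusted at Bob's level.
print(closure_trust(trust, ["my_data", "bobs_homepage"]))  # 0.3
```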

OWL makes it trivial to assert things that affect *everything*, like {rdf:type a owl:InverseFunctionalProperty.}.

So all it takes is *one* stupid or malicious source, and *all* inference closures are polluted and become useless.
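The pollution effect is easy to demonstrate without any OWL machinery at all. The following toy forward-chainer (plain Python, handling only rdf:type plus rdfs:subClassOf, so a drastic simplification of real OWL reasoning) shows how a single malicious T-Box triple from anywhere in the merge contaminates conclusions drawn from an otherwise trustworthy graph:

```python
def closure(triples):
    """Naive forward chaining over one RDFS rule:
    (x rdf:type C) + (C rdfs:subClassOf D) => (x rdf:type D)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(facts):
            if p != "rdf:type":
                continue
            for (s2, p2, o2) in list(facts):
                if p2 == "rdfs:subClassOf" and s2 == o:
                    new = (s, "rdf:type", o2)
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts

# Bob's homepage, which I trust about vegetarianism:
bob = {("<Bob>", "rdf:type", "<Vegetarian>")}
# One malicious axiom contributed by *any* other source in the merge:
malicious = {("<Vegetarian>", "rdfs:subClassOf", "<Genius>")}

inferred = closure(bob | malicious)
print(("<Bob>", "rdf:type", "<Genius>") in inferred)  # True
```

The inferred triple looks exactly like every other triple in the closure; nothing marks it as tainted, which is why the whole closure has to be discarded or re-derived after filtering.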

At that point you have to go back and do the trust filtering – exclude the source, or remove the malicious/broken statements.

Those who develop large-scale OWL reasoners for untrusted data often use some variation of “I don't trust anyone to make T-Box statements about IRIs they don't own”:
http://sw.deri.org/~aidanh/docs/saor_billiontc08.pdf
http://renaud.delbru.fr/doc/pub/rr2011-sindice.pdf
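That heuristic can be sketched roughly as follows (my own simplification, not the actual SAOR or Sindice code; "ownership" is crudely approximated here as "the subject IRI and the source document share a host", and the T-Box predicate list is far from complete):

```python
from urllib.parse import urlparse

# A few schema-level (T-Box) predicates; real systems cover many more.
TBOX_PREDICATES = {
    "rdfs:subClassOf", "rdfs:subPropertyOf",
    "owl:equivalentClass", "owl:inverseOf",
}

def same_authority(iri, source_url):
    """Crude ownership test: the IRI's host matches the source's host."""
    return urlparse(iri.strip("<>")).netloc == urlparse(source_url).netloc

def authoritative(triples, source_url):
    """Keep instance-level triples; keep T-Box triples only when the
    source 'owns' the IRI being defined (the subject, in this sketch)."""
    return {
        (s, p, o) for (s, p, o) in triples
        if p not in TBOX_PREDICATES or same_authority(s, source_url)
    }

triples = {
    # Bob may define his own classes...
    ("<http://bob.example/Foo>", "rdfs:subClassOf", "<http://bob.example/Bar>"),
    # ...but not redefine terms in someone else's namespace:
    ("<http://xmlns.com/foaf/0.1/Person>", "rdfs:subClassOf", "<http://bob.example/Genius>"),
}
kept = authoritative(triples, "http://bob.example/home")
# Only the first triple survives the filter.
```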

The point is that an inference closure that might have been generated using malicious or broken statements cannot be trusted at all, even if I know that some of the sources are perfectly trustworthy. This is different from, say, SPARQL, where data of different quality can coexist in the same RDF dataset without affecting each other.

To return to our main topic: This is why owl:sameAs on graph names should not change a dataset.

Best,
Richard

Received on Sunday, 2 October 2011 14:26:23 UTC