Re: why I don't like named graph IRIs in the DATASET proposal from Pat Hayes on 2011-10-05 (public-rdf-wg@w3.org from October 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 5 Oct 2011 12:46:39 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Eric Prud'hommeaux <eric@w3.org>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <C481985F-B66D-45AE-9752-8B286959D72B@ihmc.us>
On Oct 2, 2011, at 9:25 AM, Richard Cyganiak wrote:

> Eric,
> 
> On 2 Oct 2011, at 14:15, Eric Prud'hommeaux wrote:
>>> It matters if Bob claims that all vegetarians are geniuses.
>>> 
>>> You'll have to carefully filter Bob's graph in order to remove anything of that sort before you can pass it to your inference engine. That's my point: you have to make trust decisions *before* you can do OWL reasoning.
>> 
>> Given:
>> I believe Bob to be an authority about whether he's a vegetarian.
>> I don't believe Bob to be an authority about whether he's a genius.
>> Bob (being a genius) can come up with many tricky ways to imply that he's a vegetarian or a genius.
>> 
>> I can sanitize Bob's homepage before inferencing to remove things like:
>> <Bob> a <Genius> .
>> <Bob> a <Foo> . <Foo> rdfs:subClassOf <Genius> .
>> <Bob> <bar> <Foo> . <bar> rdfs:subPropertyOf rdf:type . <Foo> rdfs:subClassOf <Genius> .
>> <Genius> owl:equivalentClass  [ a owl:Restriction ; owl:onProperty rdf:type ; owl:hasValue <Vegetarian> ] .
>> ...
>> The simplest tool to use is a reasoner which generates complete evidence for every inference.
> 
> That is neither the simplest tool, nor is it something that's in the OWL standard.
> 
>> I can then carefully attempt to remove the parts of Bob's homepage graph which led me to conclusions about <Genius>s without disturbing the ones that led me to conclusions about <Vegetarian>s. Given that this is a *hard* problem, I think my confidence goes sharply up if I let full inference happen on Bob's homepage but treat the inferential closure with the same predicated trust that I would his raw homepage.
> 
> You'd have to treat the inferential closure with the *lowest* trust that you'd give to *any* of the sources involved in creating the inference closure.
> 
> OWL makes it trivial to assert things that affect *everything*, like {rdf:type a owl:InverseFunctionalProperty.}.

But it also makes it easy to detect obviously false consequences of such very general errors. 
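
A minimal sketch of that kind of detection, assuming a toy triple set and hypothetical names (ifp_clashes, ex:Bob, ex:Alice) that are not part of the original thread. It applies only the inverse-functional-property rule: two subjects sharing a value for such a property must be the same individual. A clash with a known "these are different" fact then exposes the bad assertion.

```python
# Toy data, purely illustrative. One source has asserted that rdf:type is
# an inverse functional property.
triples = [
    ("rdf:type", "rdf:type", "owl:InverseFunctionalProperty"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Alice", "rdf:type", "ex:Person"),
]

# Known fact (assumed): Alice and Bob are different individuals.
different = {frozenset(("ex:Alice", "ex:Bob"))}


def ifp_clashes(triples, different):
    """Return pairs that an inverse functional property forces to be the
    same individual although they are known to be different."""
    ifps = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "owl:InverseFunctionalProperty"
    }
    # Group subjects by (property, value) for every inverse functional property.
    by_value = {}
    for s, p, o in triples:
        if p in ifps:
            by_value.setdefault((p, o), set()).add(s)
    clashes = set()
    for subjects in by_value.values():
        for a in subjects:
            for b in subjects:
                if a < b and frozenset((a, b)) in different:
                    clashes.add((a, b))
    return clashes


clashes = ifp_clashes(triples, different)
```

Here the single overly general axiom forces Alice and Bob to be identified, which contradicts the known fact, so the error shows up immediately rather than silently polluting the closure.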

> 
> So all it takes is *one* stupid or malicious source, and *all* inference closures are polluted and become useless.

As long as you can trace the derivations, this need not be true. If you completely throw away all provenance information and just keep the derived inferences, then indeed you have a trust problem. So, what is new? This has always been true, but it doesn't seem to stop people using data. 
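
To make "trace the derivations" concrete, here is a rough sketch under stated assumptions: a toy forward chainer that applies only the RDFS subclass rule and records, for each triple, the sets of sources that suffice to derive it. The source names (bob, registry, mallory) and the functions closure and trusted are hypothetical, invented for this illustration.

```python
from itertools import product

# Toy input: each asserted triple maps to the sources that assert it.
asserted = {
    ("ex:Bob", "rdf:type", "ex:Vegetarian"): {"bob"},
    ("ex:Vegetarian", "rdfs:subClassOf", "ex:Person"): {"registry"},
    ("ex:Vegetarian", "rdfs:subClassOf", "ex:Genius"): {"mallory"},
}


def closure(asserted):
    """Forward-chain the subclass rule, keeping provenance.

    Returns triple -> set of frozensets; each frozenset is one set of
    sources that is sufficient to derive the triple."""
    support = {t: {frozenset([s]) for s in srcs} for t, srcs in asserted.items()}
    changed = True
    while changed:
        changed = False
        for (x, p1, c), (c2, p2, d) in product(list(support), repeat=2):
            if p1 == "rdf:type" and p2 == "rdfs:subClassOf" and c == c2:
                new = (x, "rdf:type", d)
                # A derivation needs the sources of both premises.
                combos = {
                    a | b
                    for a in support[(x, p1, c)]
                    for b in support[(c2, p2, d)]
                }
                have = support.setdefault(new, set())
                if not combos <= have:
                    have |= combos
                    changed = True
    return support


def trusted(support, trusted_sources):
    """Keep triples that have at least one derivation using only trusted sources."""
    return {
        t for t, alternatives in support.items()
        if any(a <= trusted_sources for a in alternatives)
    }


support = closure(asserted)
kept = trusted(support, {"bob", "registry"})
```

With the provenance retained, distrusting mallory removes only the conclusion that Bob is a genius; the conclusions that rest on bob and registry alone survive.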

There are various things you can do to establish more confidence, if it worries you. You might for example want to filter things for obvious inconsistency with some known facts, for example by using an OWL reasoner to check for the inconsistencies. But you need to be doing inferences, whatever your trusting position is; and you want the actual inference machinery to be trustworthy (i.e., to be valid).

> 
> At that point you have to go back and do the trust filtering – exclude the source, or remove the malicious/broken statements.
> 
> Those who develop large-scale OWL reasoners for untrusted data often use some variation of “I don't trust anyone to make T-Box statements about IRIs they don't own”:
> http://sw.deri.org/~aidanh/docs/saor_billiontc08.pdf
> http://renaud.delbru.fr/doc/pub/rr2011-sindice.pdf
> 
> The point is that an inference closure that might have been generated using malicious or broken statements cannot be trusted at all, even if I know that some of the sources are perfectly trustworthy. This is different from, say, SPARQL, where data of different quality can coexist in the same RDF dataset without affecting each other.

But not without potentially affecting query results. If there is a bad assertion in some RDF, then this is liable to screw up a query result just like any other inference (querying is just inference done backwards). So while your general point is correct, it doesn't really distinguish between RDF and OWL or between inference and SPARQL. 
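
A small illustration of this, assuming a toy dataset of named graphs and a hypothetical match function (a plain triple-pattern matcher, not SPARQL itself). The graphs coexist without touching each other, but the answer to a query still depends on which graphs it is evaluated over.

```python
# Toy dataset: graph name -> set of triples. Names are illustrative only.
dataset = {
    "g:bob": {("ex:Bob", "rdf:type", "ex:Vegetarian")},
    "g:mallory": {("ex:Bob", "rdf:type", "ex:Genius")},
}


def match(dataset, graph_names, pattern):
    """Return the triples in the chosen graphs that match the pattern.
    None in the pattern acts as a wildcard."""
    results = set()
    for name in graph_names:
        for triple in dataset[name]:
            if all(q is None or q == v for q, v in zip(pattern, triple)):
                results.add(triple)
    return results


pattern = ("ex:Bob", "rdf:type", None)
only_bob = match(dataset, ["g:bob"], pattern)
everything = match(dataset, ["g:bob", "g:mallory"], pattern)
```

Restricting the query to g:bob is itself a trust decision; querying over both graphs lets the doubtful assertion into the result, exactly as it would enter an inference.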

OWL is more 'dangerous' than RDF simply because it is more expressive and supports more inferences. That is life: the more expressive the language, the more complicated the inferential pathways can be. FOL or Common Logic would be even more extreme. But again, it's a matter of degree rather than of kind. 

Pat

> 
> To return to our main topic: This is why owl:sameAs on graph names should not change a dataset.
> 
> Best,
> Richard
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 5 October 2011 17:47:25 UTC