Re: why I don't like named graph IRIs in the DATASET proposal from Eric Prud'hommeaux on 2011-10-01 (public-rdf-wg@w3.org from October 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sat, 1 Oct 2011 19:29:25 -0400
To: Sandro Hawke <sandro@w3.org>
Cc: Richard Cyganiak <richard@cyganiak.de>, Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>, "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-ID: <20111001232923.GD20425@w3.org>
* Sandro Hawke <sandro@w3.org> [2011-10-01 17:59-0400]
> On Sat, 2011-10-01 at 12:43 +0100, Richard Cyganiak wrote:
> > On 30 Sep 2011, at 20:11, Sandro Hawke wrote:
> > >> Named graphs are key to trust and
> > >> provenance. Trust and provenance must happen at a lower level in the
> > >> stack, before reasoning and inference kick in. W3C's version of the
> > >> layer cake, where trust sits above reasoning, cannot work. The moment
> > >> you reason with OWL over untrusted data, you [have problems].
> > > 
> > > I don't think we need to throw out reasoning on the fourth column.  As
> > > long as we're careful about what it means -- eg: it denotes an IR which
> > > may give you a Graph -- I think people are free to layer inference and
> > > trust/provenance reasoning in various ways.  
> > > 
> > > Let's say you are using three Web data sources, S1, S2, and S3.  S1 and
> > > S2 give just triples.  S3 is an ontology (perhaps a RIF document); we
> > > don't really care if it's triples.   What's the problem with merging the
> > > triples, doing the inference, and using the result, knowing it is no
> > > more trustworthy than the least of S1, S2, and S3?  
> > 
> > Well, the way I see it, what happened here is that the system (on behalf of some user, I presume) decided that S1, S2 and S3 are good enough – sufficiently trustworthy – for the task at hand.
> > 
> > Provenance information is the basis for trust decisions. The system made the trust decision before it merged the graphs.
> > 
> > > Specifically, the
> > > provenance of your output involves the provenance of S1, S2, S3, and the
> > > reasoning steps you took. In detailing those reasoning steps, I think the identifiers for S1, S2, and S3 will be useful.
> > 
> > Sure. What I said was that you can't do OWL reasoning over untrusted data sources. I didn't say that you can't use graph names when recording processing steps that were taken.
> > 
> > > But for a later-stage provenance system to reason about S1, S2, and S3
> > > is fine, I think.
> > 
> > I don't know what it means when you say “a provenance system reasons about XYZ”. I suppose you're not talking about OWL reasoning.
> 
> There's an example of what I mean by reasoning about provenance and the
> fourth column identifier:
> 
>         So let's say we have a concept of a semantic web home page
>         for a person. We decide on the policy that if someone's home
>         page says that they are a vegetarian, then we believe that they
>         are a vegetarian.
> 
> That's from [1], where it's shown in N3:
>         
>         @forAll :x.
>         {:x :homePage log:includes { :x a :Vegetarian }}=> { :x a :Vegetarian}.
> 
> (I think there might be a typo there; I can't quite parse the way
> :homePage and log:includes sit together.  It looks like DanC got it
> running later, slightly modified [2].)

Just to put this modified form in people's faces:
[[
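  @prefix : <#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix log: <http://www.w3.org/2000/10/swap/log#> .
  @forAll :WHO .
  { :WHO foaf:homepage ?PG .
    ?PG log:semantics [ log:includes { :WHO a :Vegetarian } ]
  } => { :WHO a :Vegetarian } .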
]]
where log:semantics maps a g-text (the document at ?PG) to the g-snap it parses to, and log:includes holds between that graph and each of its subgraphs.
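For anyone without cwm handy, here's a rough Python sketch of what that rule does. It's only an illustration under some assumptions: each page (g-text) is taken to be already parsed into a set of triples (its g-snap), log:includes is reduced to a subset test on ground triples, and the names WEB, FACTS, semantics, includes and apply_vegetarian_rule are made up for the sketch; they are not cwm's API.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

# Hypothetical stand-in for "the Web": each page IRI (a g-text) is assumed
# to be already parsed into the set of triples it says (its g-snap).
WEB: Dict[str, Graph] = {
    "bobzpage": frozenset({("Bob", "a", "Vegetarian"), ("Bob", "a", "Genius")}),
}

# Facts held outside any page, e.g. who owns which homepage.
FACTS: Set[Triple] = {("Bob", "foaf:homepage", "bobzpage")}


def semantics(page: str) -> Graph:
    """Rough analogue of log:semantics: the graph a document parses to."""
    return WEB.get(page, frozenset())


def includes(graph: Graph, pattern: Set[Triple]) -> bool:
    """Rough analogue of log:includes for ground patterns: a subgraph test."""
    return pattern <= graph


def apply_vegetarian_rule(facts: Set[Triple]) -> Set[Triple]:
    """{ ?WHO foaf:homepage ?PG . ?PG log:semantics [ log:includes {...} ] } => {...}"""
    conclusions = set()
    for who, pred, page in facts:
        if pred != "foaf:homepage":
            continue
        # Only conclude what the person's own page says about them.
        if includes(semantics(page), {(who, "a", "Vegetarian")}):
            conclusions.add((who, "a", "Vegetarian"))
    return conclusions


print(apply_vegetarian_rule(FACTS))  # {('Bob', 'a', 'Vegetarian')}
```

Note that Bob's <Genius> claim sits on the same page but is never concluded; the rule only lifts the one statement it asks about.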


> It's not immediately obvious to me how to do this kind of stuff with
> named graphs, but maybe it will come to me.  Perhaps we can show it
> with SPARQL?  Query for all the people whose home pages say they are
> vegetarians?

Yeah, that's how predicated trust is generally done. We can follow the example literally:
[[
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  CONSTRUCT { ?who a <Vegetarian> }
    WHERE {
      ?who foaf:homepage ?pg
      GRAPH ?pg { ?who a <Vegetarian> }
    }
]] and get
{
  <Bob> a <Vegetarian> .
}
or we can cut to the chase and ask for all the vegetarians by changing "CONSTRUCT { … }" to "SELECT ?who".

Perhaps more illustrative is a general truth maintenance operation over TriG data like this (saved as asdf.trig):
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  {
    <Bob> foaf:homepage <bobzpage> .
    <Vegetarian> a <no-reason-to-lie-property> .
    <Genius> a <every-reason-to-lie-property> .
  }
  <bobzpage> {
    <Bob> a <Vegetarian> .
    <Bob> a <Genius> .
  }
sparql -d asdf.trig -e 'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { ?who a ?selfClass }
  WHERE {
    ?who foaf:homepage ?pg .
    ?selfClass a <no-reason-to-lie-property> .
    GRAPH ?pg { ?who a ?selfClass }
  }'
yields the statements you trust Bob to make about himself:
  {
    <Bob> a <Vegetarian> .
  }
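The same filter can be sketched in Python over an in-memory copy of the asdf.trig data, with the default graph as a set of triples and the named graphs in a dict keyed by graph IRI. This is a hypothetical stand-in for what a SPARQL engine does with GRAPH ?pg, not any real tool's API; the function name trusted_self_claims is invented here.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Dataset mirroring asdf.trig: one default graph plus named graphs keyed by IRI.
DEFAULT: Set[Triple] = {
    ("Bob", "foaf:homepage", "bobzpage"),
    ("Vegetarian", "a", "no-reason-to-lie-property"),
    ("Genius", "a", "every-reason-to-lie-property"),
}
NAMED: Dict[str, Set[Triple]] = {
    "bobzpage": {("Bob", "a", "Vegetarian"), ("Bob", "a", "Genius")},
}


def trusted_self_claims(default: Set[Triple],
                        named: Dict[str, Set[Triple]]) -> Set[Triple]:
    """Keep claims ?who makes about itself in its own homepage graph,
    but only for classes the default graph flags as no-reason-to-lie."""
    homepages = [(s, o) for s, p, o in default if p == "foaf:homepage"]
    honest_classes = {s for s, p, o in default
                      if p == "a" and o == "no-reason-to-lie-property"}
    result = set()
    for who, page in homepages:
        # GRAPH ?pg { ?who a ?selfClass }
        for s, p, cls in named.get(page, set()):
            if s == who and p == "a" and cls in honest_classes:
                result.add((who, "a", cls))
    return result


print(trusted_self_claims(DEFAULT, NAMED))  # {('Bob', 'a', 'Vegetarian')}
```

The trust policy lives entirely in the default graph, so flagging <Genius> as no-reason-to-lie would let Bob's second claim through without touching the query.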

As to doing inference over "untrusted data": (A) I think that *all* trust is predicated, and (B) it really doesn't matter whether Bob claims to be a <Genius> directly, or claims to be a <Super-genius> and inference leads me to discover that he believes himself a <Genius>; my trust in the inferential closure is pretty much identical to my trust in his homepage. I believe the only exception is when you don't entirely trust the ontology because it has some ragged edges (as happens with large OWL-ified medical ontologies like SNOMED CT).
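To make point (B) concrete, here's a small Python sketch. It assumes, as Sandro put it above, that a derived statement is no more trustworthy than the least trustworthy of its inputs, and it uses made-up numeric trust scores purely for illustration; the subclass axiom (<Super-genius> is a subclass of <Genius>) and the function names type_closure and trusted_closure are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def type_closure(claims: Set[Triple], subclass_of: Set[Tuple[str, str]]) -> Set[Triple]:
    """Naive forward chaining of one rule: (x a C) and (C subClassOf D) => (x a D)."""
    closure = set(claims)
    changed = True
    while changed:
        changed = False
        for s, p, c in list(closure):  # iterate a snapshot while adding
            if p != "a":
                continue
            for sub, sup in subclass_of:
                if sub == c and (s, "a", sup) not in closure:
                    closure.add((s, "a", sup))
                    changed = True
    return closure


def trusted_closure(claims: Set[Triple], subclass_of: Set[Tuple[str, str]],
                    page_trust: float, ontology_trust: float) -> Dict[Triple, float]:
    """Asserted triples keep the page's trust; inferred ones get the weaker of page and ontology."""
    closure = type_closure(claims, subclass_of)
    return {t: page_trust if t in claims else min(page_trust, ontology_trust)
            for t in closure}


# Hypothetical inputs: Bob's page says <Super-genius>; the ontology says that's a kind of <Genius>.
bobs_page = {("Bob", "a", "Super-genius")}
ontology = {("Super-genius", "Genius")}

clean = trusted_closure(bobs_page, ontology, page_trust=0.3, ontology_trust=1.0)
ragged = trusted_closure(bobs_page, ontology, page_trust=0.3, ontology_trust=0.2)
print(clean[("Bob", "a", "Genius")])   # 0.3: same as trusting the homepage
print(ragged[("Bob", "a", "Genius")])  # 0.2: a ragged ontology lowers it
```

With a fully trusted ontology the inferred <Genius> triple carries exactly the homepage's score; only a ragged ontology drags it lower.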

>    -- Sandro
> 
> [1] http://www.w3.org/2000/10/swap/doc/Reach 
> [2] http://dev.w3.org/cvsweb/~checkout~/2000/10/swap/test/reason/conf_reg_ex.n3?rev=1.1;content-type=text%2Fplain
> 
> 

-- 
-ericP
Received on Saturday, 1 October 2011 23:29:56 UTC