Re: Signalling entailment in queries from Sandro Hawke on 2010-07-20 (public-rdf-dawg@w3.org from July to September 2010)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 20 Jul 2010 16:28:59 -0400
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <1279657739.14023.349.camel@waldron>
On Tue, 2010-07-20 at 20:12 +0100, Birte Glimm wrote:
> On 20 July 2010 19:22, Sandro Hawke <sandro@w3.org> wrote:
> > In discussing the graph naming issue and the related service description
> > issue today, I agreed to take my concern/proposal to e-mail
> > [ACTION-282].
> >
> > The basic problem I have is this: I think people will be very confused
> > if SPARQL end points start to quietly do inference.  This confusion will
> > result in software doing the wrong thing, with possibly serious results.
> > Some of the blame will, rightly, land on "SPARQL inference".
> > Alternatively, for fear of this, public end points will just not do
> > inference.
> 
> I can see that just writing it in the SD might not be sufficient to
> avoid confusion. How do endpoints communicate this currently? There
> are already systems that do RDFS. Is there confusion or are RDFS
> inference systems just not made available via public endpoints? Since
> SD is also a new feature, I would guess that one might also say in the
> interface of a system that it will use RDFS inferences, so that users
> are made aware of this.

I don't really know what folks are doing now, but I don't think there's
any interop or major public deployment yet.

I notice that 4sreasoner (which I was coincidentally using for unrelated
reasons this afternoon) puts its inferences all into a GRAPH named
"http://4sreasoner.ecs.soton.ac.uk/entailedgraph/" 

I've seen other systems that put the justification (proof) there.  I
don't remember how it worked, but, for example, you could construct a
URI which encoded the derivation proof steps, and then have another
(unmaterialized) graph which provided detailed information about those
graphs by (on demand) decoding those URIs.   I think that kind of thing
is pretty much mandatory for a large class of use cases (where you're
accepting lots of sources, but tracking provenance). 

> > What we've got so far is the idea that folks should look at the service
> > description.   I think this is probably too "quiet".  As Souri pointed
> > out today, what happens if you looked at the service description at 5pm
> > and inference gets turned on at 6pm?  Yes, we can probably find
> > workarounds -- like saying you should use a different end point address
> > any time you make this kind of change to the SD -- but I'm thinking it's
> > better to just add something to the query language.    Listening to
> > folks in the meeting talk about it, I thought of this:
> >
> >        SELECT * FROM NAMED <g1>
> >        WHERE { GRAPH <g1> ENTAILS { ?x rdfs:subClassOf ?y } }
> 
> In general, I like the idea of having a query language keyword. This
> suggestion seems, however, suitable for the case of named graphs and
> it is not clear which kind of inference will be performed (RDF, RDFS,
> D-Entailment, ...). I could also imagine something like
> 
> SELECT * FROM NAMED <g1>
> USING <http://www.w3.org/ns/entailment/RDFS>
> WHERE { GRAPH <g1> { ?x rdfs:subClassOf ?y } }
> 
> or shorter with prefix with a query for the default graph:
> 
> PREFIX ent: <http://www.w3.org/ns/entailment/>
> SELECT * USING ent:RDFS WHERE { ?x rdfs:subClassOf ?y }

Those are nice.   I wonder if we can come up with a common syntax for
both the named-graph and default-graph case.    I suppose this could
work:

        SELECT * FROM NAMED <g1>
        WHERE { GRAPH <g1> USING ent:RDFS { ?x rdfs:subClassOf ?y } }

We lose the ability to have unparameterized entailment, where you have
to look at the SD, but maybe that's okay.  

Maybe instead of 
        USING ent-regime-id
it could be
        USING [INFERENCE] [ent-regime-id]
where you have to provide at least one of those optional bits.

> However, this introduces a new keyword into the language and it is
> exactly what was decided to be out of scope (same for your suggestion
> of course).

I wasn't part of the WG at scoping time, but Ivan tells me it was only
the parameterization that was judged out of scope at the time, not the
adding of a keyword.   That's another reason to have an unparameterized
version.  :-)

> > Here, the system making the query is explicitly asking for inference, so
> > no harm can be done by the end-point suddenly turning on inference, as
> > long as it still allows querying of the pre-inference graph.  If the
> > end-point doesn't want to keep the pre-inference graph around, then
> > users will have to modify their queries to include the ENTAILS
> > keyword, ... but that seems right, since those users will need to think
> > about whether the change in results is okay for their application.
> 
> I just want to point out here that there does not have to be an
> inference graph. This is just one way of implementing it. You could
> rewrite your queries and evaluate more complex queries over the
> original graph, e.g., you could rewrite queries using property paths
> to implement the RDFS ent. reg. and you would not need a (partial)
> closure, but just support for path expressions. I am not sure whether
> all RDFS can be done with the currently proposed property paths, but I
> can modify query evaluation instead of materialising triples. I think
> this is important: although building the partial closure according
> to some rules is one option, it is not and should not be the only
> possible implementation technique.

I spent years programming in prolog, so the idea that a graph has to be
materialized to be referred to and used doesn't even occur to me.
Whether queries against an inferred graph are performed via query
rewriting or backward chaining or forward chaining (materialization)
or ... something else (does a hyper tableaux reasoner count as something
else?), doesn't matter.      But, as mentioned, for some ERs there isn't
a single "inference graph", so the concept is perhaps best avoided.
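
To make the "doesn't matter" point concrete, here is a minimal sketch in
Python, not tied to any real SPARQL engine. It covers only the
transitivity of rdfs:subClassOf, not the whole RDFS regime, and the toy
graph and function names are made up for illustration. One function
materializes the closure, the other walks the original graph at query
time, and both give the same answers:

```python
# Toy data: triples as (subject, predicate, object) tuples; all names are made up.
SUBCLASS = "rdfs:subClassOf"
graph = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}


def materialize(triples):
    """Forward chaining: add subClassOf triples until nothing new appears."""
    closed = set(triples)  # copy, so the original graph is left untouched
    while True:
        new = {
            (a, SUBCLASS, d)
            for (a, p1, b) in closed if p1 == SUBCLASS
            for (c, p2, d) in closed if p2 == SUBCLASS and b == c
        }
        if new <= closed:
            return closed
        closed |= new


def superclasses_by_walk(triples, cls):
    """Query-time evaluation: follow subClassOf edges, storing no new triples."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for (s, p, o) in triples:
            if s == current and p == SUBCLASS and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


# Same question ("what are the superclasses of ex:Cat?") answered both ways.
from_closure = {
    o for (s, p, o) in materialize(graph) if s == "ex:Cat" and p == SUBCLASS
}
from_walk = superclasses_by_walk(graph, "ex:Cat")
```

The walk is roughly what a property-path rewrite such as
rdfs:subClassOf+ would ask the engine to do; whether property paths
cover all of RDFS is a separate question that this sketch does not
address.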

> > In this design, the kind of entailment would be specified in the SD.
> >
> > It's tempting to also allow parameterization of the entailment regime,
> > perhaps like this:
> >
> >    PREFIX ent: <http://www.w3.org/ns/entailment/>
> >        SELECT * FROM NAMED <g1>
> >        WHERE { GRAPH <g1> ENTAILS BY ent:RDFS { ?x
> >        rdfs:subClassOf ?y } }
> 
> I think this is an abuse of prefix that we shouldn't allow. Prefix is
> a directive for the parser to enable the expansion of abbreviated IRIs
> into fully qualified ones. I would not want to overload this with a
> specification for entailments, which has nothing to do with parsing or
> IRI expansion. I do like a keyword, but it would have to be one that
> is not yet used in SPARQL I think.

I think you're misunderstanding me here.   I'm using PREFIX exactly as
you say -- as a shorthand allowing abbreviation of IRIs.  I just happen
to be using it for abbreviating the IRI of an entailment regime, just
like you did in an example above. If you don't use PREFIX, it looks like
this: 
        
        SELECT * FROM NAMED <g1>
        WHERE { GRAPH <g1> ENTAILS BY
        <http://www.w3.org/ns/entailment/RDFS> { ?x
        rdfs:subClassOf ?y } }
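
In other words, the prefix is purely lexical. A rough sketch of the
expansion step in Python (the expand function and the prefixes table are
hypothetical helpers for illustration, not part of any SPARQL library):

```python
def expand(name, prefixes):
    """Expand prefix:local to <full IRI>; leave anything else untouched."""
    if name.startswith("<"):
        return name  # already a full IRI in angle brackets
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return "<" + prefixes[prefix] + local + ">"
    return name


# The table a parser would build from "PREFIX ent: <http://www.w3.org/ns/entailment/>".
prefixes = {"ent": "http://www.w3.org/ns/entailment/"}
expanded = expand("ent:RDFS", prefixes)
```

After expansion the two forms of the query are the same thing; the
entailment regime is named by the IRI, not by the prefix.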

  -- Sandro

> > I understand this parameterization is similar to the out-of-scope
> > "Parameterized Inference" feature [1], but perhaps if it's really as
> > simple as this, it's okay to do anyway.   (If not, is there a way to
> > suggest that everyone implement it the same way, even if it's not in the
> > spec?  :-)
> 
> I would be happy for either (having a keyword despite it being out of
> scope or an informal agreement). With an informal agreement it is,
> however, not forbidden to do inferences no matter whether the chosen
> keyword has been used or not.
> 
> 
> Birte
> 
> 
> > Finally, I wanted to thank folks for reminding me of the incorrectness
> > of thinking of the entailments of a graph as another graph.  Under
> > entailment regime E, graph G will entail graphs G0, G1, ... Gn, rather
> > than a single graph GE.  In many simple cases, the merge of G0...Gn is
> > also entailed and can be used as the single graph-of-all-entailments,
> > but in cases with disjunction, such a union is not itself entailed, so
> > there is no graph-of-all-entailments.  (I've learned and forgotten this
> > too many times, sorry.)
> >
> >
> > One more thought -- I'm not sure the word "entails" is the best word
> > here.  Perhaps "IMPLIES" would make more sense to the relevant audience.
> >
> >    -- Sandro
> >
> > [1] http://www.w3.org/2009/sparql/wiki/Feature:ParameterizedInference
> >
> >
> >
> >
> 
> 
>
Received on Tuesday, 20 July 2010 20:29:08 UTC