Signalling entailment in queries from Sandro Hawke on 2010-07-20 (public-rdf-dawg@w3.org from July to September 2010)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 20 Jul 2010 14:22:13 -0400
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <1279650133.14023.295.camel@waldron>
In discussing the graph naming issue and the related service description
issue today, I agreed to take my concern/proposal to e-mail
[ACTION-282].

The basic problem I have is this: I think people will be very confused
if SPARQL endpoints start to quietly do inference.  This confusion will
result in software doing the wrong thing, with possibly serious results.
Some of the blame will, rightly, land on "SPARQL inference".
Alternatively, for fear of this, public endpoints will just not do
inference.

What we've got so far is the idea that folks should look at the service
description.  I think this is probably too "quiet".  As Souri pointed
out today, what happens if you looked at the service description at 5pm
and inference gets turned on at 6pm?  Yes, we can probably find
workarounds -- like saying you should use a different endpoint address
any time you make this kind of change to the SD -- but I'm thinking it's
better to just add something to the query language.  Listening to
folks in the meeting talk about it, I thought of this:

        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT * FROM NAMED <g1>
        WHERE { GRAPH <g1> ENTAILS { ?x rdfs:subClassOf ?y } }

Here, the system making the query is explicitly asking for inference, so
no harm can be done by the endpoint suddenly turning on inference, as
long as it still allows querying of the pre-inference graph.  If the
endpoint doesn't want to keep the pre-inference graph around, then
users will have to modify their queries to include the ENTAILS
keyword, but that seems right, since those users will need to think
about whether the change in results is okay for their application.
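A toy model may make the intended semantics concrete.  This is a sketch
under stated assumptions, not a proposed implementation: the endpoint
keeps the asserted graph, and only a pattern marked ENTAILS is matched
against an entailment closure.  The helper names (rdfs11_closure,
match_subclass_pattern) are hypothetical, and the closure applies only
the rdfs:subClassOf transitivity rule (rdfs11), a small fragment of
real RDFS entailment.

```python
# Toy model of the proposed GRAPH ... ENTAILS semantics (hypothetical helpers).
Triple = tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"


def rdfs11_closure(graph: set[Triple]) -> set[Triple]:
    """Close `graph` under rule rdfs11 (subClassOf transitivity) only.

    This is a small fragment of RDFS entailment, chosen for illustration.
    """
    closed = set(graph)  # copy: the asserted (pre-inference) graph stays intact
    while True:
        pairs = {(s, o) for (s, p, o) in closed if p == SUBCLASS}
        new = {(a, SUBCLASS, d) for (a, b) in pairs for (c, d) in pairs if b == c}
        new -= closed
        if not new:
            return closed
        closed |= new


def match_subclass_pattern(graph: set[Triple], entails: bool = False) -> list[tuple[str, str]]:
    """Bindings (?x, ?y) for { ?x rdfs:subClassOf ?y } in GRAPH <g1> [ENTAILS]."""
    data = rdfs11_closure(graph) if entails else graph
    return sorted((s, o) for (s, p, o) in data if p == SUBCLASS)


g1 = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}

print(match_subclass_pattern(g1))                # asserted triples only
print(match_subclass_pattern(g1, entails=True))  # also ("ex:Dog", "ex:Animal")
```

The first call is the point of the design: a query without ENTAILS
keeps its old answers whatever the endpoint does about inference,
provided the asserted graph is still available.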

In this design, the kind of entailment would be specified in the SD. 

It's tempting to also allow parameterization of the entailment regime,
perhaps like this:
 
        PREFIX ent: <http://www.w3.org/ns/entailment/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT * FROM NAMED <g1>
        WHERE { GRAPH <g1> ENTAILS BY ent:RDFS { ?x rdfs:subClassOf ?y } }

I understand this parameterization is similar to the out-of-scope
"Parameterized Inference" feature [1], but perhaps if it's really as
simple as this, it's okay to do anyway.  (If not, is there a way to
suggest that everyone implement it the same way, even if it's not in the
spec?  :-)

Finally, I wanted to thank folks for reminding me that it's incorrect
to think of the entailments of a graph as another graph.  Under
entailment regime E, graph G will entail graphs G0, G1, ... Gn, rather
than a single graph GE.  In many simple cases, the merge of G0...Gn is
also entailed and can be used as the single graph-of-all-entailments,
but in cases with disjunction, such a union is not itself entailed, so
there is no graph-of-all-entailments.  (I've learned and forgotten this
too many times, sorry.)
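The simplest case can be checked mechanically.  For a ground graph (no
blank nodes) under simple entailment, G entails a ground graph H exactly
when H is a subgraph of G, so the entailed ground graphs are just the
subsets of G, and their merge (for ground graphs, their union) is G
itself.  The Python below is an illustrative sketch with a hypothetical
helper name; it ignores blank nodes, which make the set of entailed
graphs larger, and it does not attempt the disjunction case.

```python
from itertools import combinations

Triple = tuple[str, str, str]


def entailed_ground_subgraphs(graph: set[Triple]) -> list[frozenset[Triple]]:
    """Non-empty ground graphs simply entailed by a ground graph: its non-empty subsets."""
    triples = sorted(graph)
    return [
        frozenset(combo)
        for size in range(1, len(triples) + 1)
        for combo in combinations(triples, size)
    ]


g = {("ex:a", "ex:p", "ex:b"), ("ex:b", "ex:p", "ex:c")}
entailed = entailed_ground_subgraphs(g)  # G0, G1, G2
merge = frozenset().union(*entailed)     # union == merge for ground graphs

print(len(entailed))  # 3
print(merge == g)     # True: the merge is G itself, so it is also entailed
```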

One more thought -- I'm not sure the word "entails" is the best word
here.  Perhaps "IMPLIES" would make more sense to the relevant audience.

    -- Sandro

[1] http://www.w3.org/2009/sparql/wiki/Feature:ParameterizedInference
Received on Tuesday, 20 July 2010 18:22:22 UTC