Re: Signalling entailment in queries from Birte Glimm on 2010-07-20 (public-rdf-dawg@w3.org from July to September 2010)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Tue, 20 Jul 2010 20:12:58 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <AANLkTimzbcxXQIASwVa_bynhoPx05LtTLi9gS5ugodIz@mail.gmail.com>

On 20 July 2010 19:22, Sandro Hawke <sandro@w3.org> wrote:
> In discussing the graph naming issue and the related service description
> issue today, I agreed to take my concern/proposal to e-mail
> [ACTION-282].
>
> The basic problem I have is this: I think people will be very confused
> if SPARQL end points start to quietly do inference.  This confusion will
> result in software doing the wrong thing, with possibly serious results.
> Some of the blame will, rightly, land on "SPARQL inference".
> Alternatively, for fear of this, public end points will just not do
> inference.

I can see that just writing it in the SD might not be sufficient to
avoid confusion. How do endpoints communicate this currently? There
are already systems that do RDFS. Is there confusion or are RDFS
inference systems just not made available via public endpoints? Since
SD is also a new feature, I would guess that one might also say in the
interface of a system that it will use RDFS inferences, so that users
are made aware of this.

> What we've got so far is the idea that folks should look at the service
> description.   I think this is probably too "quiet".  As Souri pointed
> out today, what happens if you looked at the service description at 5pm
> and inference gets turned on at 6pm?  Yes, we can probably find
> workarounds -- like saying you should use a different end point address
> any time you make this kind of change to the SD -- but I'm thinking it's
> better to just add something to the query language.    Listening to
> folks in the meeting talk about it, I thought of this:
>
>        SELECT * FROM NAMED <g1>
>        WHERE { GRAPH <g1> ENTAILS { ?x rdfs:subClassOf ?y } }

In general, I like the idea of having a query language keyword. This
suggestion seems, however, suitable only for the case of named graphs,
and it is not clear which kind of inference will be performed (RDF, RDFS,
D-Entailment, ...). I could also imagine something like

SELECT * FROM NAMED <g1>
USING <http://www.w3.org/ns/entailment/RDFS>
WHERE { GRAPH <g1> { ?x rdfs:subClassOf ?y } }

or, more briefly, using a prefix and querying the default graph:

PREFIX ent: <http://www.w3.org/ns/entailment/>
SELECT * USING ent:RDFS WHERE { ?x rdfs:subClassOf ?y }

However, this introduces a new keyword into the language, and that is
exactly what was decided to be out of scope (the same goes for your
suggestion, of course).

> Here, the system making the query is explicitly asking for inference, so
> no harm can be done by the end-point suddenly turning on inference, as
> long as it still allows querying of the pre-inference graph.  If the
> end-point doesn't want to keep the pre-inference graph around, then
> users will have to modify their queries to include the ENTAILS
> keyword, ... but that seems right, since those users will need to think
> about whether the change in results is okay for their application.

I just want to point out here that there does not have to be an
inference graph. That is just one way of implementing it. You could
instead rewrite your queries and evaluate more complex queries over the
original graph, e.g., you could rewrite queries using property paths
to implement the RDFS entailment regime; you would then not need a
(partial) closure, just support for path expressions. I am not sure
whether all of RDFS can be done with the currently proposed property
paths, but the point is that I can modify query evaluation instead of
materialising triples. I think this is important: although building
the partial closure according to some rules is one option, it is not,
and should not be, the only possible implementation technique.
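
To make the two techniques concrete, here is a minimal sketch in Python
over a toy in-memory triple set. It is not taken from any existing
system: the data, the function names, and the restriction to the
subclass part of RDFS are all assumptions made for illustration. The
rewriting variant corresponds roughly to evaluating a path such as
rdf:type / rdfs:subClassOf* over the original graph.

```python
# Toy data: a set of (subject, predicate, object) triples.
# All names here are hypothetical and only serve the illustration.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

GRAPH = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:tom", TYPE, "ex:Cat"),
}


def superclasses(graph, cls):
    """Return cls plus every class reachable via zero or more subClassOf steps."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_by_materialisation(graph, cls):
    """Technique 1: build a partial closure first, then match the plain pattern."""
    closure = set(graph)
    for s, p, o in graph:
        if p == TYPE:
            # Add the inferred type triples for every superclass.
            for sup in superclasses(graph, o):
                closure.add((s, TYPE, sup))
    return {s for s, p, o in closure if p == TYPE and o == cls}


def instances_by_rewriting(graph, cls):
    """Technique 2: leave the graph untouched and follow the subclass
    chain at query time (no inferred triples are stored)."""
    return {
        s
        for s, p, o in graph
        if p == TYPE and cls in superclasses(graph, o)
    }
```

Both functions give the same answer for a class such as "ex:Animal",
but only the first one ever constructs additional triples. The sketch
covers only subclass reasoning; it says nothing about whether the rest
of RDFS can be handled the same way.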

> In this design, the kind of entailment would be specified in the SD.
>
> It's tempting to also allow parameterization of the entailment regime,
> perhaps like this:
>
>    PREFIX ent <http://www.w3.org/ns/entailment/>
>        SELECT * FROM NAMED <g1>
>        WHERE { GRAPH <g1> ENTAILS BY ent:RDFS { ?x
>        rdfs:subClassOf ?y } }

I think this is an abuse of PREFIX that we shouldn't allow. PREFIX is
a directive for the parser that enables the expansion of abbreviated
IRIs into fully qualified ones. I would not want to overload this with
a specification for entailments, which has nothing to do with parsing
or IRI expansion. I do like a keyword, but I think it would have to be
one that is not yet used in SPARQL.

> I understand this parameterization is similar to the out-of-scope
> "Parameterized Inference" feature [1], but perhaps if it's really as
> simple as this, it's okay to do anyway.   (If not, is there a way to
> suggest that everyone implement it the same way, even if it's not in the
> spec?  :-)

I would be happy with either (having a keyword despite it being out of
scope, or an informal agreement). With an informal agreement, however,
nothing forbids an endpoint from doing inference regardless of whether
the chosen keyword has been used.


Birte


> Finally, I wanted to thank folks for reminding me of the incorrectness
> of thinking of the entailments of a graph as another graph.  Under
> entailment regime E, graph G will entail graphs G0, G1, ... Gn, rather
> than a single graph GE.  In many simple cases, the merge of G0...Gn is
> also entailed and can be used as the single graph-of-all-entailments,
> but in cases with disjunction, such a union is not itself entailed, so
> there is no graph-of-all-entailments.  (I've learned and forgotten this
> too many times, sorry.)
>
>
> One more thought -- I'm not sure the word "entails" is the best word
> here.  Perhaps "IMPLIES" would make more sense to relevant audience.
>
>    -- Sandro
>
> [1] http://www.w3.org/2009/sparql/wiki/Feature:ParameterizedInference

-- 
Dr. Birte Glimm, Room 309
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283520
Received on Tuesday, 20 July 2010 19:13:28 UTC