Re: Signalling entailment in queries from Alexandre Passant on 2010-07-21 (public-rdf-dawg@w3.org from July to September 2010)

From: Alexandre Passant <alexandre.passant@deri.org>
Date: Wed, 21 Jul 2010 11:20:44 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: Birte Glimm <birte.glimm@comlab.ox.ac.uk>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <1BEC37AD-1422-4278-8613-8940060777D6@deri.org>

On 20 Jul 2010, at 21:28, Sandro Hawke wrote:

> On Tue, 2010-07-20 at 20:12 +0100, Birte Glimm wrote:
>> On 20 July 2010 19:22, Sandro Hawke <sandro@w3.org> wrote:
>>> In discussing the graph naming issue and the related service description
>>> issue today, I agreed to take my concern/proposal to e-mail
>>> [ACTION-282].
>>> 
>>> The basic problem I have is this: I think people will be very confused
>>> if SPARQL end points start to quietly do inference.  This confusion will
>>> result in software doing the wrong thing, with possibly serious results.
>>> Some of the blame will, rightly, land on "SPARQL inference".
>>> Alternatively, for fear of this, public end points will just not do
>>> inference.
>> 
>> I can see that just writing it in the SD might not be sufficient to
>> avoid confusion. How do endpoints communicate this currently? There
>> are already systems that do RDFS. Is there confusion or are RDFS
>> inference systems just not made available via public endpoints? Since
>> SD is also a new feature, I would guess that one might also say in the
>> interface of a system that it will use RDFS inferences, so that users
>> are made aware of this.
> 
> I don't really know what folks are doing now, but I don't think there's
> any interop or major public deployment yet.
> 
> I notice that 4sreasoner (which I was coincidentally using for unrelated
> reasons this afternoon) puts its inferences all into a GRAPH named
> "http://4sreasoner.ecs.soton.ac.uk/entailedgraph/" 

IIRC, that was the same with 3store, and it led to the need for some tricks to retrieve inferred data (e.g. using the entailment graph in a GRAPH pattern to retrieve subclasses w/ RDFS inference, but using another NG / the default graph to avoid them).

I think that might be a major issue for compatibility of entailment between stores.
As an end-user, I shouldn't have to care which graphs the new statements are in.
That's imo where an additional keyword is useful, rather than changing the GRAPH pattern used in the query.

E.g.

RULESET RDFS
SELECT * WHERE { ?s rdfs:subClassOf ?o }

and

SELECT * WHERE { ?s rdfs:subClassOf ?o }

Rather than

SELECT * WHERE { GRAPH <http://example.org/entailmentgraph> { ?s rdfs:subClassOf ?o } }

and

SELECT * WHERE { ?s rdfs:subClassOf ?o }

Alex.

> 
> I've seen other systems that put the justification (proof) there.  I
> don't remember how it worked, but, for example, you could construct a
> URI which encoded the derivation proof steps, and then have another
> (unmaterialized) graph which provided detailed information about those
> graphs by (on demand) decoding those URIs.   I think that kind of thing
> is pretty much mandatory for a large class of use cases (where you're
> accepting lots of sources, but tracking provenance). 
> 
>>> What we've got so far is the idea that folks should look at the service
>>> description.   I think this is probably too "quiet".  As Souri pointed
>>> out today, what happens if you looked at the service description at 5pm
>>> and inference gets turned on at 6pm?  Yes, we can probably find
>>> workarounds -- like saying you should use a different end point address
>>> any time you make this kind of change to the SD -- but I'm thinking it's
>>> better to just add something to the query language.    Listening to
>>> folks in the meeting talk about it, I thought of this:
>>> 
>>>       SELECT * FROM NAMED <g1>
>>>       WHERE { GRAPH <g1> ENTAILS { ?x rdfs:subClassOf ?y } }
>> 
>> In general, I like the idea of having a query language keyword. This
>> suggestion seems, however, suitable for the case of named graphs and
>> it is not clear which kind of inference will be performed (RDF, RDFS,
>> D-Entailment, ...). I could also imagine something like
>> 
>> SELECT * FROM NAMED <g1>
>> USING <http://www.w3.org/ns/entailment/RDFS>
>> WHERE { GRAPH <g1> { ?x rdfs:subClassOf ?y } }
>> 
>> or shorter with prefix with a query for the default graph:
>> 
>> PREFIX ent: <http://www.w3.org/ns/entailment/>
>> SELECT * USING ent:RDFS WHERE { ?x rdfs:subClassOf ?y }
> 
> Those are nice.   I wonder if we can come up with a common syntax for
> both the named-graph and default-graph case.    I suppose this could
> work:
> 
>        SELECT * FROM NAMED <g1>
>        WHERE { GRAPH <g1> USING ent:RDFS { ?x rdfs:subClassOf ?y } }
> 
> We lose the ability to have unparameterized entailment, where you have
> to look at the SD, but maybe that's okay.  
> 
> Maybe instead of 
>        USING ent-regime-id
> it could be
>        USING [INFERENCE] [ent-regime-id]
> where you have to provide at least one of those optional bits
> 
>> However, this introduces a new keyword into the language and it is
>> exactly what was decided to be out of scope (same for your suggestion
>> of course).
> 
> I wasn't part of the WG at scoping time, but Ivan tells me it was only
> the parameterization that was judged out of scope at the time, not the
> adding of a keyword.   That's another reason to have an unparameterized
> version.  :-)
> 
>>> Here, the system making the query is explicitly asking for inference, so
>>> no harm can be done by the end-point suddenly turning on inference, as
>>> long as it still allows querying of the pre-inference graph.  If the
>>> end-point doesn't want to keep the pre-inference graph around, then
>>> users will have to modify their queries to include the ENTAILS
>>> keyword, ... but that seems right, since those users will need to think
>>> about whether the change in results is okay for their application.
>> 
>> I just want to point out here that there does not have to be an
>> inference graph. This is just one way of implementing it. You could
>> rewrite your queries and evaluate more complex queries over the
>> original graph, e.g., you could rewrite queries using property paths
>> to implement the RDFS ent. reg. and you would not need a (partial)
>> closure, but just support for path expressions. I am not sure whether
>> all RDFS can be done with the currently proposed property paths, but I
>> can modify query evaluation instead of materialising triples. I think
>> this is important: although building the partial closure according
>> to some rules is one option, it is not and should not be the only
>> possible implementation technique.
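
A minimal sketch of the query-rewriting approach described above, assuming property path syntax along the lines of the current SPARQL 1.1 draft (the `/` and `*` operators). The `ex:` namespace and `ex:Animal` are hypothetical and used for illustration only. It covers only type propagation along subclass links, not the full RDFS entailment regime.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>   # hypothetical namespace

# Evaluated over the original graph only: no entailment graph, no closure.
# rdfs:subClassOf* follows zero or more subclass links, so ?s matches
# direct instances of ex:Animal as well as instances of any subclass.
SELECT ?s
WHERE { ?s rdf:type/rdfs:subClassOf* ex:Animal }
```

The plain pattern { ?s rdf:type ex:Animal } over the same graph would return only the directly asserted instances, so the two forms play the role of the with-inference and without-inference queries.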
> 
> I spent years programming in prolog, so the idea that a graph has to be
> materialized to be referred to and used doesn't even occur to me.
> Whether queries against an inferred graph are performed via query
> rewriting or backward chaining or forward chaining (materialization)
> or ... something else (does a hyper tableaux reasoner count as something
> else?), doesn't matter.      But, as mentioned, for some ERs there isn't
> a single "inference graph", so the concept is perhaps best avoided.
> 
>>> In this design, the kind of entailment would be specified in the SD.
>>> 
>>> It's tempting to also allow parameterization of the entailment regime,
>>> perhaps like this:
>>> 
>>>   PREFIX ent: <http://www.w3.org/ns/entailment/>
>>>       SELECT * FROM NAMED <g1>
>>>       WHERE { GRAPH <g1> ENTAILS BY ent:RDFS { ?x
>>>       rdfs:subClassOf ?y } }
>> 
>> I think this is an abuse of prefix that we shouldn't allow. Prefix is
>> a directive for the parser to enable the expansion of abbreviated IRIs
>> into fully qualified ones. I would not want to overload this with a
>> specification for entailments, which has nothing to do with parsing or
>> IRI expansion. I do like a keyword, but it would have to be one that
>> is not yet used in SPARQL I think.
> 
> I think you're misunderstanding me here.   I'm using PREFIX exactly as
> you say -- as a shorthand allowing abbreviation of IRIs.  I just happen
> to be using it for abbreviating the IRI of an entailment regime, just
> like you did in an example above. If you don't use PREFIX, it looks like
> this: 
> 
>        SELECT * FROM NAMED <g1>
>        WHERE { GRAPH <g1> ENTAILS BY
>        <http://www.w3.org/ns/entailment/RDFS> { ?x
>        rdfs:subClassOf ?y } }
> 
>  -- Sandro
> 
>>> I understand this parameterization is similar to the out-of-scope
>>> "Parameterized Inference" feature [1], but perhaps if it's really as
>>> simple as this, it's okay to do anyway.   (If not, is there a way to
>>> suggest that everyone implement it the same way, even if it's not in the
>>> spec?  :-)
>> 
>> I would be happy for either (having a keyword despite it being out of
>> scope or an informal agreement). With an informal agreement it is,
>> however, not forbidden to do inferences no matter whether the chosen
>> keyword has been used or not.
>> 
>> 
>> Birte
>> 
>> 
>>> Finally, I wanted to thank folks for reminding me of the incorrectness
>>> of thinking of the entailments of a graph as another graph.  Under
>>> entailment regime E, graph G will entail graphs G0, G1, ... Gn, rather
>>> than a single graph GE.  In many simple cases, the merge of G0...Gn is
>>> also entailed and can be used as the single graph-of-all-entailments,
>>> but in cases with disjunction, such a union is not itself entailed, so
>>> there is no graph-of-all-entailments.  (I've learned and forgotten this
>>> too many times, sorry.)
>>> 
>>> 
>>> One more thought -- I'm not sure the word "entails" is the best word
>>> here.  Perhaps "IMPLIES" would make more sense to the relevant audience.
>>> 
>>>   -- Sandro
>>> 
>>> [1] http://www.w3.org/2009/sparql/wiki/Feature:ParameterizedInference

--
Dr. Alexandre Passant
Digital Enterprise Research Institute
National University of Ireland, Galway
:me owl:sameAs <http://apassant.net/alex> .
Received on Wednesday, 21 July 2010 10:21:16 UTC