Re: Signalling entailment in queries

From: Alexandre Passant <alexandre.passant@deri.org>
Date: Wed, 21 Jul 2010 11:20:44 +0100
Cc: Birte Glimm <birte.glimm@comlab.ox.ac.uk>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <1BEC37AD-1422-4278-8613-8940060777D6@deri.org>
To: Sandro Hawke <sandro@w3.org>

On 20 Jul 2010, at 21:28, Sandro Hawke wrote:

> On Tue, 2010-07-20 at 20:12 +0100, Birte Glimm wrote:
>> On 20 July 2010 19:22, Sandro Hawke <sandro@w3.org> wrote:
>>> In discussing the graph naming issue and the related service description
>>> issue today, I agreed to take my concern/proposal to e-mail
>>> [ACTION-282].
>>> The basic problem I have is this: I think people will be very confused
>>> if SPARQL end points start to quietly do inference.  This confusion will
>>> result in software doing the wrong thing, with possibly serious results.
>>> Some of the blame will, rightly, land on "SPARQL inference".
>>> Alternatively, for fear of this, public end points will just not do
>>> inference.
>> I can see that just writing it in the SD might not be sufficient to
>> avoid confusion. How do endpoints communicate this currently? There
>> are already systems that do RDFS. Is there confusion or are RDFS
>> inference systems just not made available via public endpoints? Since
>> SD is also a new feature, I would guess that one might also say in the
>> interface of a system that it will use RDFS inferences, so that users
>> are made aware of this.
> I don't really know what folks are doing now, but I don't think there's
> any interop or major public deployment yet.
> I notice that 4sreasoner (which I was coincidentally using for unrelated
> reasons this afternoon) puts its inferences all into a GRAPH named
> "http://4sreasoner.ecs.soton.ac.uk/entailedgraph/" 

IIRC, that was the same with 3store, and it led to the need for some tricks to retrieve inferred data (e.g. using the entailment graph in a GRAPH pattern to retrieve subclasses with RDFS inference, but using another named graph or the default graph to avoid them).

I think that might be a major issue for compatibility of entailment between stores.
As an end-user, I shouldn't have to care which graphs the new statements are in.
That's IMO where an additional keyword is useful, rather than changing the GRAPH pattern used in the query.


SELECT * WHERE { ?s rdfs:subClassOf ?o }

Rather than

SELECT * WHERE { GRAPH <http://example.org/entailmentgraph> { ?s rdfs:subClassOf ?o } }


SELECT * WHERE { ?s rdfs:subClassOf ?o }
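
To make the contrast concrete, here is a minimal Python sketch, not taken from 3store or 4sreasoner: the caller asks for inference with a flag (standing in for a hypothetical query keyword) instead of naming a store-specific entailment graph. The function name `subclass_pairs`, the `entail` flag, and the `ex:` data are assumptions made up for illustration, and only the transitivity of rdfs:subClassOf is computed, not the full RDFS regime.

```python
# Hypothetical sketch: these names do not come from any store or SPARQL spec.
SUBCLASS = "rdfs:subClassOf"


def subclass_pairs(triples, entail=False):
    """Return (subclass, superclass) pairs found in `triples`.

    With entail=True, also return pairs derived by transitivity of
    rdfs:subClassOf. The caller never names a graph holding derived data.
    """
    pairs = {(s, o) for (s, p, o) in triples if p == SUBCLASS}
    if not entail:
        return pairs

    # Naive fixpoint: keep joining pairs until nothing new is derived.
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Asserted (pre-inference) data, invented for the example.
asserted = [
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
]

plain = subclass_pairs(asserted)
inferred = subclass_pairs(asserted, entail=True)
```

With `entail=False` the caller sees only the asserted pairs; with `entail=True` the derived pair (ex:Cat, ex:Animal) is included too, and the query never mentions where the store keeps derived statements.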


> I've seen other systems that put the justification (proof) there.  I
> don't remember how it worked, but, for example, you could construct a
> URI which encoded the derivation proof steps, and then have another
> (unmaterialized) graph which provided detailed information about those
> graphs by (on demand) decoding those URIs.   I think that kind of thing
> is pretty much mandatory for a large class of use cases (where you're
> accepting lots of sources, but tracking provenance). 
>>> What we've got so far is the idea that folks should look at the service
>>> description.   I think this is probably too "quiet".  As Souri pointed
>>> out today, what happens if you looked at the service description at 5pm
>>> and inference gets turned on at 6pm?  Yes, we can probably find
>>> workarounds -- like saying you should use a different end point address
>>> any time you make this kind of change to the SD -- but I'm thinking it's
>>> better to just add something to the query language.    Listening to
>>> folks in the meeting talk about it, I thought of this:
>>>       SELECT * FROM NAMED <g1>
>>>       WHERE { GRAPH <g1> ENTAILS { ?x rdfs:subClassOf ?y } }
>> In general, I like the idea of having a query language keyword. This
>> suggestion seems, however, suitable for the case of named graphs and
>> it is not clear which kind of inference will be performed (RDF, RDFS,
>> D-Entailment, ...). I could also imagine something like
>> USING <http://www.w3.org/ns/entailment/RDFS>
>> WHERE { GRAPH <g1> { ?x rdfs:subClassOf ?y } }
>> or, shorter, with a prefix and a query over the default graph:
>> PREFIX ent: <http://www.w3.org/ns/entailment/>
>> SELECT * USING ent:RDFS WHERE { ?x rdfs:subClassOf ?y }
> Those are nice.   I wonder if we can come up with a common syntax for
> both the named-graph and default-graph case.    I suppose this could
> work:
>        SELECT * FROM NAMED <g1>
>        WHERE { GRAPH <g1> USING ent:RDFS { ?x rdfs:subClassOf ?y } }
> We lose the ability to have unparameterized entailment, where you have
> to look at the SD, but maybe that's okay.  
> Maybe instead of 
>        USING ent-regime-id
> it could be
>        USING [INFERENCE] [ent-regime-id]
> where you have to provide at least one of those optional bits
>> However, this introduces a new keyword into the language and it is
>> exactly what was decided to be out of scope (same for your suggestion
>> of course).
> I wasn't part of the WG at scoping time, but Ivan tells me it was only
> the parameterization that was judged out of scope at the time, not the
> adding of a keyword.   That's another reason to have an unparameterized
> version.  :-)
>>> Here, the system making the query is explicitly asking for inference, so
>>> no harm can be done by the end-point suddenly turning on inference, as
>>> long as it still allows querying of the pre-inference graph.  If the
>>> end-point doesn't want to keep the pre-inference graph around, then
>>> users will have to modify their queries to include the ENTAILS
>>> keyword, ... but that seems right, since those users will need to think
>>> about whether the change in results is okay for their application.
>> I just want to point out here that there does not have to be an
>> inference graph. This is just one way of implementing it. You could
>> rewrite your queries and evaluate more complex queries over the
>> original graph, e.g., you could rewrite queries using property paths
>> to implement the RDFS ent. reg. and you would not need a (partial)
>> closure, but just support for path expressions. I am not sure whether
>> all RDFS can be done with the currently proposed property paths, but I
>> can modify query evaluation instead of materialising triples. I think
>> this is important: although building the partial closure according
>> to some rules is one option, it is not and should not be the only
>> possible implementation technique.
> I spent years programming in prolog, so the idea that a graph has to be
> materialized to be referred to and used doesn't even occur to me.
> Whether queries against an inferred graph are performed via query
> rewriting or backward chaining or forward chaining (materialization)
> or ... something else (does a hyper tableaux reasoner count as something
> else?), doesn't matter.      But, as mentioned, for some ERs there isn't
> a single "inference graph", so the concept is perhaps best avoided.
>>> In this design, the kind of entailment would be specified in the SD.
>>> It's tempting to also allow parameterization of the entailment regime,
>>> perhaps like this:
>>>   PREFIX ent: <http://www.w3.org/ns/entailment/>
>>>       SELECT * FROM NAMED <g1>
>>>       WHERE { GRAPH <g1> ENTAILS BY ent:RDFS { ?x
>>>       rdfs:subClassOf ?y } }
>> I think this is an abuse of prefix that we shouldn't allow. Prefix is
>> a directive for the parser to enable the expansion of abbreviated IRIs
>> into fully qualified ones. I would not want to overload this with a
>> specification for entailments, which has nothing to do with parsing or
>> IRI expansion. I do like a keyword, but it would have to be one that
>> is not yet used in SPARQL I think.
> I think you're misunderstanding me here.   I'm using PREFIX exactly as
> you say -- as a shorthand allowing abbreviation of IRIs.  I just happen
> to be using it for abbreviating the IRI of an entailment regime, just
> like you did in an example above. If you don't use PREFIX, it looks like
> this: 
>        SELECT * FROM NAMED <g1>
>        WHERE { GRAPH <g1> ENTAILS BY
>        <http://www.w3.org/ns/entailment/RDFS> { ?x
>        rdfs:subClassOf ?y } }
>  -- Sandro
>>> I understand this parameterization is similar to the out-of-scope
>>> "Parameterized Inference" feature [1], but perhaps if it's really as
>>> simple as this, it's okay to do anyway.   (If not, is there a way to
>>> suggest that everyone implement it the same way, even if it's not in the
>>> spec?  :-)
>> I would be happy for either (having a keyword despite it being out of
>> scope or an informal agreement). With an informal agreement it is,
>> however, not forbidden to do inferences no matter whether the chosen
>> keyword has been used or not.
>> Birte
>>> Finally, I wanted to thank folks for reminding me of the incorrectness
>>> of thinking of the entailments of a graph as another graph.  Under
>>> entailment regime E, graph G will entail graphs G0, G1, ... Gn, rather
>>> than a single graph GE.  In many simple cases, the merge of G0...Gn is
>>> also entailed and can be used as the single graph-of-all-entailments,
>>> but in cases with disjunction, such a union is not itself entailed, so
>>> there is no graph-of-all-entailments.  (I've learned and forgotten this
>>> too many times, sorry.)
>>> One more thought -- I'm not sure the word "entails" is the best word
>>> here.  Perhaps "IMPLIES" would make more sense to the relevant audience.
>>>   -- Sandro
>>> [1] http://www.w3.org/2009/sparql/wiki/Feature:ParameterizedInference

Dr. Alexandre Passant
Digital Enterprise Research Institute
National University of Ireland, Galway
:me owl:sameAs <http://apassant.net/alex> .
Received on Wednesday, 21 July 2010 10:21:16 GMT
