Re: Signalling entailment in queries from Chimezie Ogbuji on 2010-07-21 (public-rdf-dawg@w3.org from July to September 2010)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Tue, 20 Jul 2010 21:16:23 -0400
To: "Sandro Hawke" <sandro@w3.org>, "SPARQL Working Group" <public-rdf-dawg@w3.org>
Message-ID: <C86BC2A7.12822%ogbujic@ccf.org>
Sandro, a few comments below (it seems there are at least three threads in
the issue we discussed in today's telecon; I'll try to tease them out).

On 7/20/10 2:22 PM, "Sandro Hawke" <sandro@w3.org> wrote:
> The basic problem I have is this: I think people will be very confused
> if SPARQL end points start to quietly do inference.

I can sympathize with this and (if I understand

> This confusion will
> result in software doing the wrong thing, with possibly serious results.
> Some of the blame will, rightly, land on "SPARQL inference".
> Alternatively, for fear of this, public end points will just not do
> inference.

I completely sympathize with this concern, and I think Souri and others
share it. I hope we are now better informed about why this matters than we
were when we chose not to support letting the user specify the entailment
regime in the query. If we are willing to revisit the conversation about LET
after deciding it was out of scope (for instance), then we should do the
same with this issue if there is critical mass.

At a minimum, a query should be able to specify (by entailment regime URI)
the entailment regime to use. I can imagine scenarios where, even though
the service indicates a particular regime for its data, the user might
prefer a different regime for one reason or another. Since we support an
order of precedence in determining which dataset is used for a query
(either it is specified in the query or in the protocol), I don't see why
we can't do the same for the entailment regime used to determine answers
beyond simple graph matching, via a simple extension to the syntax, some
backward-compatible pragmas, or some other lightweight mechanism.

> What we've got so far is the idea that folks should look at the service
> description.   I think this is probably too "quiet".

Not only is it too quiet, it also (unnecessarily) restricts how entailment
can be used in SPARQL.

> ...snip...
> Listening to
> folks in the meeting talk about it, I thought of this:
> 
>         SELECT * FROM NAMED <g1>
>         WHERE { GRAPH <g1> ENTAILS { ?x rdfs:subClassOf ?y } }
> 
> Here, the system making the query is explicitly asking for inference, so
> no harm can be done by the end-point suddenly turning on inference, as
> long as it still allows querying of the pre-inference graph.

So this is where I think your issues may be tangled up. If issue (1) is the
need to indicate which entailment regime to use for the answers to a query,
then this example also includes an issue (2). I'm not sure what to call it,
but it seems you are interested in explicitly relating the scoping graph
(which exists in the dataset) to another graph that it entails, rather than
simply indicating an appropriate entailment regime.

This seems to be a procedural interpretation of entailment, whereas
entailment is a purely declarative thing: it says *what* should follow from
what you have plus some axioms, but not *how* it is calculated (and there
are a number of ways it can be calculated).

By requiring that entailments be 'written down' in this way, you seem to be
assuming a particular reasoning strategy (forward chaining), which is
completely independent of the entailments themselves.
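To illustrate that the answers are fixed by the entailment and not by the strategy, here is a toy Python sketch. It is an assumption-laden simplification, not anything proposed in the thread: triples are plain string tuples, only RDFS rule rdfs11 (transitivity of rdfs:subClassOf) is modelled, and all function names are hypothetical. It answers the pattern ?x rdfs:subClassOf ?y once by forward chaining (materializing the closure) and once by backward chaining (searching on demand).

```python
from typing import Set, Tuple

# Toy model: triples are (subject, predicate, object) string tuples.
# Only RDFS rule rdfs11 (transitivity of rdfs:subClassOf) is modelled;
# a real RDFS entailment regime has many more rules.
Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"


def forward_chain(graph: Set[Triple]) -> Set[Triple]:
    """Forward chaining: materialize rdfs11 consequences up to a fixpoint."""
    closure = set(graph)
    while True:
        new = {
            (a, SUBCLASS, c)
            for (a, p1, b) in closure if p1 == SUBCLASS
            for (b2, p2, c) in closure if p2 == SUBCLASS and b2 == b
        }
        if new <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= new


def backward_chain(graph: Set[Triple], x: str) -> Set[str]:
    """Backward chaining: find every ?y with x rdfs:subClassOf ?y on demand."""
    found: Set[str] = set()
    stack = [x]
    while stack:
        current = stack.pop()
        for (s, p, o) in graph:
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                stack.append(o)
    return found


def answers_forward(graph: Set[Triple]) -> Set[Tuple[str, str]]:
    """Bindings for ?x rdfs:subClassOf ?y read off the materialized closure."""
    return {(s, o) for (s, p, o) in forward_chain(graph) if p == SUBCLASS}


def answers_backward(graph: Set[Triple]) -> Set[Tuple[str, str]]:
    """Same bindings, computed per subject without materializing anything."""
    subjects = {s for (s, p, _) in graph if p == SUBCLASS}
    return {(x, y) for x in subjects for y in backward_chain(graph, x)}


g1 = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}
print(answers_forward(g1) == answers_backward(g1))  # True
```

Both strategies return the same three bindings for g1; only the cost profile differs (up-front materialization versus per-query search), which is the sense in which the strategy is independent of what is entailed.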

So, I agree that the query should be able to indicate an entailment regime,
but I don't agree with the mechanism you suggest above, which seems to make
assumptions about entailment that run counter to its declarative nature.
The query should instead be:

SELECT * FROM NAMED <g1>
USING <http://www.w3.org/ns/entailment/RDFS>
WHERE { GRAPH <g1> { ?x rdfs:subClassOf ?y } }

> ...snip...
> In this design, the kind of entailment would be specified in the SD.

I think the query should be able to specify the kind of entailment (as can
the SD), and that should be all it needs to specify. So if you think of
querying as the evaluation of a function, the entailment regime is just
another parameter:

queryEvaluation(query, dataset = ..specified in SD..,
                entailmentRegime = ..specified in SD..)

where the default dataset and entailment regime are those specified in the
SD if they aren't given as arguments to the query.
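Read as code, this proposal amounts to ordinary default arguments. The Python sketch below is only an illustration under stated assumptions: `ServiceDescription`, `Evaluation` and `query_evaluation` are hypothetical names, the dataset identifier is a made-up URN, and the function merely resolves which dataset and entailment regime a query would use rather than evaluating it. The RDFS regime URI comes from the thread; the Simple URI is assumed from the same http://www.w3.org/ns/entailment/ namespace.

```python
from dataclasses import dataclass
from typing import Optional

ENT = "http://www.w3.org/ns/entailment/"


@dataclass(frozen=True)
class ServiceDescription:
    # Hypothetical model of the defaults an SD advertises for an endpoint.
    default_dataset: str
    default_entailment_regime: str


@dataclass(frozen=True)
class Evaluation:
    # What the query would be evaluated against (not the answers themselves).
    query: str
    dataset: str
    entailment_regime: str


def query_evaluation(query: str, sd: ServiceDescription,
                     dataset: Optional[str] = None,
                     entailment_regime: Optional[str] = None) -> Evaluation:
    """Resolve the parameters a query would be evaluated with.

    Anything given with the query takes precedence; otherwise the
    default advertised in the service description is used.
    """
    return Evaluation(
        query=query,
        dataset=dataset if dataset is not None else sd.default_dataset,
        entailment_regime=(entailment_regime if entailment_regime is not None
                           else sd.default_entailment_regime),
    )


sd = ServiceDescription("urn:example:default-dataset", ENT + "Simple")
q = "SELECT * FROM NAMED <g1> WHERE { GRAPH <g1> { ?x rdfs:subClassOf ?y } }"

# No regime in the query: the SD default applies.
print(query_evaluation(q, sd).entailment_regime)
# Regime named with the query (e.g. via USING): it overrides the SD.
print(query_evaluation(q, sd, entailment_regime=ENT + "RDFS").entailment_regime)
```

The point of modelling it this way is that the SD stays authoritative for defaults while a query that names a regime is never silently answered under a different one.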
 
> It's tempting to also allow parameterization of the entailment regime,
> perhaps like this:
>  
>     PREFIX ent <http://www.w3.org/ns/entailment/>
>         SELECT * FROM NAMED <g1>
>         WHERE { GRAPH <g1> ENTAILS BY ent:RDFS { ?x
>         rdfs:subClassOf ?y } }

This is what I prefer, but with the difference I mentioned above.
 
> I understand this parameterization is similar to the out-of-scope
> "Parameterized Inference" feature [1], but perhaps if it's really as
> simple as this, it's okay to do anyway.

I think it is rather simple. We have everything in place to take advantage
of a straightforward in-query syntax for indicating the entailment regime.

> Finally, I wanted to thank folks for reminding me of the incorrectness
> of thinking of the entailments of a graph as another graph.  Under
> entailment regime E, graph G will entail graphs G0, G1, ... Gn, rather
> than a single graph GE.  In many simple cases, the merge of G0...Gn is
> also entailed and can be used as the single graph-of-all-entailments,

It still seems that conceptually thinking of *having* a single graph of
all entailments (a procedural interpretation of entailment) is a motivation
underlying your suggestions.

-- Chime


Received on Wednesday, 21 July 2010 01:17:04 UTC