Re: [Fwd: Comments on SPARQL] (entailment, soundness, completeness)

Oops - pressed send button before I had finished writing - sorry. I'll 
try again...

On 9 Sep 2005, at 22:59, Dan Connolly wrote:

> On Fri, 2005-09-09 at 15:39 -0500, Dan Connolly wrote:
>> Copying this to public-rdf-dawg comments for tracking purposes.
>> Follow-up discussion should happen there...
>> email message attachment, "Forwarded message - Comments on SPARQL"
>> On Fri, 2005-09-09 at 15:39 -0500, Dan Connolly wrote:
>>> Two comments/questions:
>>> Firstly, I strongly support the suggestion to define query answers in
>>> terms of entailment rather than subgraph. Using an entailment based
>>> definition has numerous advantages, and no disadvantages that I can
>>> see.
>>> Entailment based definition:
>>> - is widely used and very well understood
>>> - is concise, clear and unambiguous
>>> - builds on existing semantic definitions
>>> - can deal with a wide variety of languages, including RDF, RDFS, 
>>> OWL,
>>> SWRL and FOL, simply by referring to the existing entailment 
>>> semantics
>>> for the relevant language
> I'm not clear on what you mean by that. Do you mean that there's one
> definition of entailment that SPARQL can use that simultaneously
> accommodates RDF, RDF, OWL, SWRL, and FOL?

No, not simultaneously. But if query answers are defined in terms of 
bindings that cause the query to be entailed in all models, then one 
can defer to the model theory for the semantic conditions that apply to 
models. (For languages with different underlying model structures, it 
may also be necessary to adapt the definition of a binding, but this 
should be *much* less work that developing an independent (of the model 
theory) definition of query answering.)

> I understand that each of those languages has a clear and unambiguous
> definition of entailment, but I don't understand how to apply all of
> them to SPARQL.

Querying a given language simply requires deferring to the relevant 
model theory (and possibly adapting the definition of a binding for 
languages that don't share the same model structure as RDF). More 
powerful languages allow more complex conditions to be imposed; more 
models can thus be ruled out, and so more answers may be inferred.

> Are you suggesting that SPARQL should have some parameterized version
> of entailment? I can imagine how that might work, though I'm not
> aware of much implementation experience with designs like that.

I only intended to say that, by using an entailment based definition of 
query answers, it would be *much* easier to extend SPARQL to deal with 
other languages, perhaps requiring little more than referring to the 
relevant MT.

> Bijan provided a quick review of various relevant implementations...
> OWL editors and their underlying toolkits (triples or other)
> I have only begun to look into those.
> If you have any particular designs to recommend, please let us know.
>>> Subgraph based definition:
>>> - is not widely used, not well understood, and not *known* to work at
>>> all (at least not for more expressive languages)
>>> - is verbose, obfuscated (it seems capable of confusing even expert
>>> logicians), and may even be ambiguous
>>> - ignores existing semantic definitions
>>> - cannot easily deal with more expressive languages, and may even be
>>> incapable of doing so
>>> Can someone please explain to me why the subgraph based definition 
>>> has
>>> been preferred?
> The WG as a whole hasn't expressed a preference directly, but in
> drafting the definitions and considering simple test cases, the
> details of subgraph seemed to work out and the details of entailment
> seemed not to. For example, here's part of one message from 09 Jun 2005
> [[[
> The difference is observable from an approved*** test
> input:
>  :x :p :v1 .
>  :x :p :v2 .
> query:
>   SELECT *
> WHERE { :x ?p ?q . }
> By the simple-entailment definition, there are solutions that bind
> ?p to _:foo, but there are no such results in the test results.

I saw in another email that this was a typo and you meant ?q to be 
bound to _:foo.

I believe that a better solution to these kinds of problem is to add 
some kind of minimality condition as suggested by Enrico (and as used 
in OQL), i.e., the answer set should not contain a tuple that is less 
specific than (entailed by) another tuple in the answer set.

Actually, the idea of returning blank nodes in query answers seems a 
little strange to me - better to use non-distinguished variables 
(variables that are not necessarily bound to named elements of a model) 
and perform some kind of "outer union" if combining results with 
different numbers of distinguished variables. I believe that this 
approach would also be more easily extensible (i.e., to  more 
expressive languages).


> I suppose it's possible that the spec could prune the results
> down to the ones in the test suite some other way, but I can't
> think of any other straightforward way just now.
> ]]]
>  -- Re: Restructure definition of Basic Graph Pattern and pattern 
> match (sec 2.4)

>>> Secondly, IMHO, the minimum requirement for a query language standard
>>> is that, independently of the features and capabilities of any given
>>> implementation, it should define what would constitute a sound and
>>> complete answer for a given language, dataset and query. It is not at
>>> all clear that, in its current form, SPARQL satisfies this 
>>> requirement.
>>> Am I missing something?
> I don't believe you are missing anything; the WG has not looked
> very deeply at such a requirement. Yours is one of several
> recent comment that asks that we do so.
> Due to the volume of comments received, it may be some weeks before
> we have a reply in substance. Please stand by.
> -- 
> Dan Connolly, W3C
> D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Received on Sunday, 11 September 2005 21:35:09 UTC