Re: bnode issue in new spec - impact on test cases? from Bijan Parsia on 2006-10-09 (public-rdf-dawg@w3.org from October to December 2006)

From: Bijan Parsia <bparsia@isr.umd.edu>
Date: Mon, 9 Oct 2006 10:11:19 +0100
To: public-rdf-dawg-comments@w3.org
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>, "Ralph R. Swick" <swick@w3.org>
Message-Id: <39026CC4-9ABC-44C9-9BAB-A3879E448E4E@isr.umd.edu>
Well, I wanted very much to have that text struck from the working  
draft because it's a horrible mess that, IMHO, totally fails to  
identify the issue(s) involved, much less articulate them. I asked to  
have it removed but, in the interest of moving forward, did not  
object. It is a product of an Editor (Eric)  
in consultation with W3C team, or so I understand. So, they'll need  
to say what they meant in the end.

However, never let it be said that I will not wander into  
interpretative minefields. So, let us analyze!

The first line contains descriptive text with a link. The descriptive  
text is:

	"""Open Issue: Should blank nodes be treated differently than  
variables in the query pattern?."""

The linked-to text is:

	"""entailmentFramework
	I'm collecting all the issues related to entailment and SPARQL here
	till either a theme emerges or they get broken into separate issues.

	Neatly summarized by FredZ here
	FredZ's request re: entailment framework and bnode scope
		others that are related here but lumped under separate heads  
above..."""

So the first thing to recognize is that this is a catch-all issue  
rather than a specific focused issue, as is indicated by the last  
line. So, it is confusing to have it "distilled" into a single  
question, especially when that question is not broad but exceedingly  
narrow.

============
Bit of speculative background: I *believe* that the real issue is  
that Eric would like us to revert to the LC1 formulation of the  
semantics of basic graph pattern matching (in terms of subgraph  
matching) instead of the LC2 formulation in terms of entailment. Of  
course, Eric should confirm (or deny) this speculation, but that's my  
impression from the last telecon. Also, this line "Alternative  
Proposal: the previous last call working draft treated blank nodes in  
the query pattern as variables." certainly is suggestive.

Since I understand this pink box to be designed to elicit community  
support (or, at least, feedback), I think, if the intent was to raise  
the issue of whether to go back to the prior framework, that saying  
that directly would be clearer. Of course, reversion is *not* a  
raised issue or proposal currently entertained by the working group.  
So this is all a bit confusing.

It's especially confusing in that it conflates, in general, issues  
concerning specifying behavior with issues concerning the specifying  
mechanism. The issue *should* be, primarily, about the desired  
behavior, not about the specifying mechanism unless the specifying  
mechanism cannot correctly specify the desired behavior (or can do so  
only in a horridly confusing manner, etc.). I don't think the  
"cannot" is at issue, though I'm sure some people (including Eric)  
would claim that confusion is the point.

Of course, the *reason* for moving to the current framework was to  
secure the goal of a specification mechanism allowing for a certain  
sort of flexibility, indeed, a flexibility (falsely) claimed for the  
"subgraph" specification. To wit, that SPARQL be parameterizable with  
regard to the semantics of the queried graph, preferably up through  
OWL, and especially covering those tools (such as Pellet and KAON2)  
which already support SPARQL syntax. Note that the subgraph approach  
was intended to be *general* and cover RDF, and RDFS (with various  
datatypes) as well as OWL. Certainly in the form of LC1 it cannot be  
claimed to do that.

The presentation in LC2 is both complex and terse and needs work. The  
*intent* was that, for the case of simple entailment, the two  
specifications would specify the exact same behavior. So, if there is  
a difference, one solution is to *fix* the current specification  
using the LC2 *mechanism*. Thus, the coupling in the pink box is  
extremely tendentious. There are working group members who prefer the  
LC1 mechanism, and those who prefer the LC2 mechanism. One thing that  
cannot be claimed for the LC1 mechanism is that it specifies what to  
do even with graphs under RDFS semantics. For subgraph matching to be  
univocal between implementations, the "closure" designated to be  
constructed (perhaps implicitly) must be unique. Obviously, if we can  
construct alternative graphs we can get different answers to the same  
query. Inherently we have an issue since there can be semantically  
equivalent graphs of varying degrees of redundancy.
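
To make the redundancy point concrete, here is a minimal sketch in
Python. It is not any engine's actual algorithm: the tuple encoding of
triples, the "?" and "_:" prefixes, and the helper name match_pattern
are all assumptions made for illustration. The two graphs are
equivalent under simple entailment (the second merely adds a redundant
blank-node triple), yet naive subgraph matching returns a different
number of answers for each.

```python
# Illustrative encoding (an assumption, not from any spec):
# a triple is a (subject, predicate, object) tuple of strings;
# "?name" is a query variable, "_:name" is a blank node.

def match_pattern(pattern, graph):
    """Naive subgraph matching of one triple pattern against a graph."""
    results = []
    for triple in graph:
        binding = {}
        matched = True
        for p_term, g_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # A variable must bind consistently within the pattern.
                if binding.get(p_term, g_term) != g_term:
                    matched = False
                    break
                binding[p_term] = g_term
            elif p_term != g_term:
                matched = False
                break
        if matched:
            results.append(binding)
    return results

# A lean graph and a redundant but simple-equivalent graph.
lean_graph = {(":a", ":p", ":b")}
redundant_graph = {(":a", ":p", ":b"), (":a", ":p", "_:x")}

pattern = (":a", ":p", "?o")
lean_answers = sorted(b["?o"] for b in match_pattern(pattern, lean_graph))
redundant_answers = sorted(b["?o"] for b in match_pattern(pattern, redundant_graph))

print(lean_answers)
print(redundant_answers)
```

Which of the two graphs an implementation happens to hold (or
construct as a "closure") therefore changes the answer set, which is
the univocality problem described above.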

In general, this is very difficult to make work for OWL. You'd have  
to talk about the deductive closure of the KB, plus a reverse  
transformation to triples, plus constraints on what you can match.  
It's tricky at best. It's extra tricky to capture the existing  
behavior of SPARQL/DL query engines (such as Pellet and KAON2). Such  
behavior is not difficult to capture with the current framework  
(perhaps with some tweaking).
==============

	"""Current specification: yes: blank nodes in the query pattern are  
scoped
	to a basic graph pattern. Their use in FILTERs is unclear."""

There is a real issue about how Filters work in general, and we're  
tackling it. BNodes are tricky in general because sometimes BNode  
labels corefer and sometimes they don't (consider BNode labels in two  
distinct RDF/XML documents...and consider doing a union of the two  
documents rather than a merge!). I don't know any other query  
language that uses BNodes in query patterns (TRIPLE? does N3QL count?  
Ok, at least N3QL postdates the initial development of SPARQL). I've  
heard non-OWL people oppose BNodes in patterns, so it's very  
important to separate them out.
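
The union-versus-merge point can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: triples are
plain tuples, blank node labels start with "_:", and the renaming
scheme in merge (appending a per-document suffix) is a hypothetical
stand-in for proper standardizing apart.

```python
# Illustrative encoding (an assumption): triples are tuples of strings,
# and any term starting with "_:" is a blank node label.

def blank_nodes(graph):
    """Collect the distinct blank node labels used in a graph."""
    return {term for triple in graph for term in triple if term.startswith("_:")}

def union(g1, g2):
    """Set union: identical labels in the two documents are treated as coreferring."""
    return set(g1) | set(g2)

def merge(g1, g2):
    """Merge: blank nodes are renamed apart first, so labels never corefer
    across documents. The suffix scheme is a naive placeholder."""
    def rename(graph, suffix):
        return {
            tuple(term + suffix if term.startswith("_:") else term for term in triple)
            for triple in graph
        }
    return rename(g1, "_doc1") | rename(g2, "_doc2")

# Two documents that happen to use the same label, _:b1.
doc1 = {("_:b1", ":name", '"Alice"')}
doc2 = {("_:b1", ":age", '"42"')}

union_bnodes = blank_nodes(union(doc1, doc2))
merge_bnodes = blank_nodes(merge(doc1, doc2))

print(len(union_bnodes), len(merge_bnodes))
```

Under the union, the graph says that one thing has both a name and an
age; under the merge, it says only that something has a name and
something has an age.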

BNodes in patterns are handy because they give us a syntactically  
distinct form of variable which allows us to have (semantically) more  
kinds of variable available. This holds across the spectrum of queried  
languages. (see:
	<http://www.cs.man.ac.uk/~bparsia/2006/row-tutorial/#slide20>
for the various sorts of variables possible. Most can be simulated  
using isIRI, isBlank functions)

There are examples there.

	"""Costs: Tableau-based reasoners (at least, the Pellet Demo example  
7) rely on the current,
	more expressive semantics to match implications that are not in a  
materializable RDF graph."""

This is just false. Suppose there were no BNodes in query patterns.  
Then Pellet would do what it did before BNodes were in query  
patterns: query variables not in the select clause would be  
non-distinguished. Current behavior:
	Query variables in head (i.e., appearing in results): Distinguished  
(i.e., bound to names and appearing in results)
	Query variables in body-only: Projected away distinguished (i.e.,  
bound to names but not appearing in results)
	BNodes in body: non-distinguished (i.e., bound to anything, not  
appearing in results)

Unlike in the RDF case, no one implements (or has a clear sense how  
to specify, much less implement) semi-distinguished variables. BNodes  
are non-distinguished in all cases (RDF and OWL).

If we dumped BNodes in queries, we'd change Pellet's behavior to:
	Query variables in head (i.e., appearing in results): Distinguished
	Query variables in body-only: Non-distinguished

Which is traditional (and its old behavior, and RDQL behavior).
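
The two classifications above can be written down as a small Python
sketch. It is not Pellet's code or API; the function name classify,
the bnodes_in_patterns flag, and the string encoding of terms are all
hypothetical and exist only to restate the two tables side by side.

```python
def classify(head_vars, body_terms, bnodes_in_patterns=True):
    """Label each variable-like term in a query body.

    head_vars: set of "?var" names appearing in the select clause.
    body_terms: iterable of terms in the pattern ("?var" or "_:label").
    bnodes_in_patterns: True for the current behavior, False for the
    hypothetical behavior if BNodes were dropped from query patterns.
    """
    kinds = {}
    for term in body_terms:
        if term.startswith("?"):
            if term in head_vars:
                kinds[term] = "distinguished"
            elif bnodes_in_patterns:
                # Bound to names, but not appearing in results.
                kinds[term] = "projected-away distinguished"
            else:
                # Traditional behavior: body-only variables bind to anything.
                kinds[term] = "non-distinguished"
        elif term.startswith("_:"):
            if not bnodes_in_patterns:
                raise ValueError("BNodes are not allowed in patterns in this mode")
            kinds[term] = "non-distinguished"
    return kinds

current = classify({"?x"}, ["?x", "?y", "_:z"])
without_bnodes = classify({"?x"}, ["?x", "?y"], bnodes_in_patterns=False)

print(current)
print(without_bnodes)
```

Either way a non-distinguished kind of variable remains available, so
the reasoner does not depend on BNodes in patterns to get it.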

Even putting aside (non)distinguished variables, subgraph matching of  
a materialized deductive closure (even an 'imaginary one') is not a  
good specification method. For one, no one has EVER done it for OWL  
(nor has anyone shown it to be *correct* and complete for RDFS; it's  
all hand waving). There are good reasons to think that it won't work  
very well for OWL. That is, though I see a way to sort of do it, it  
will be brutally awful, complex, brittle, and unintelligible.

And finally:
	"""Benefit: would simplify the current semantics, which are  
difficult to specify and
	allow the expression of counter-intuitive queries."""

Clearly this means simplify the *presentation* of the current  
semantics, which isn't clear to me *at all*, especially when you  
connect the SPARQL spec to other specifications like RDF. But the  
problem with filters is not connected to the current semantic  
framework, at least inherently. And there's plenty of infelicity with  
the current algebra part of the spec (which we've been beating on),  
which has nothing to do with the core framework *per se*.

see:
	http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Jun/0008.html

============

On the working group side, I request that this pink box be removed  
from the current editors' draft. If the working group would like to  
publish a more detailed and accessible bit of text about the panoply  
of issues surrounding the core semantic framework, I shall be happy  
to provide one as suitable as I can make it for a general audience. I  
will, of course, present my take on these issues as we try to make  
decisions on them.

SPARQL is not simple, even in the simple case. This is to my own  
surprise.

Cheers,
Bijan.
Received on Monday, 9 October 2006 09:11:39 UTC