Re: Editorial thread for BGP matching from Patrick J. Hayes on 2006-01-24 (public-rdf-dawg@w3.org from January to March 2006)

From: Patrick J. Hayes <phayes@ihmc.us>
Date: Mon, 23 Jan 2006 23:22:21 -0600
To: franconi@inf.unibz.it
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <web-5621285@ihmc.us>
On 23 Jan 2006, at 22:49, Pat Hayes wrote:
I don't see the scoping set B having a different role 
than, say, the E-entailment.
Both allow us to introduce in the spec a terminology that 
future users have to refer to in order to make their 
choices - if they want to say that they are (backward) 
compatible with SPARQL. Future implementors have to 
declare, for example, what is their kind of entailment 
*and* their scoping set B, since they are in the spec.

Fair enough, but then what we should do is (a) define 
SPARQL, as directly as possible (b) define "SPARQL 
extension", using E- and B, and require the future specs 
to specify E- and B suitably, and perhaps (c) show briefly 
how SPARQL itself fits that definition by suitable choice 
of E and B. But if, as I had understood - perhaps wrongly 
- from Andy's emails, such extensions weren't even going 
to be mentioned in this document, then I see no reason to 
include (b) in it.

And in fact, in the current version of the document, 
edited by Andy himself, there is no mention whatsoever of 
any SPARQL extension. In the spirit of what I was saying 
in the above paragraph - that you fail to understand - 
the SPARQL spec is specifying *only* the simple 
entailment, but using the framework that can be used 
somewhere else to define *smoothly* the SPARQL extensions.

I do *understand* this, but I don't *agree* that the 
current format is the best way to go. I think the result 
is neither appropriate nor effective for the SPARQL 
standard document, that it will produce enormous confusion 
and give rise to extended questions about why all this 
stuff is in the spec when it is not actually used 
anywhere. One man's smoothly is another man's total 
mystification. This is purely a matter of exposition, not 
of any disagreement about formal content. It should be 
possible for someone who wants to read and understand the 
actual SPARQL specification to read just that, and not be 
burdened with trying to make sense of elaborate 
definitions which have no actual content unless given 
flesh by examples, and which do not impinge on the actual 
spec. Putting the fully general definition in the heart of 
the SPARQL document is like requiring the reader to go on 
a long detour and work hard for no purpose. This is purely 
a matter of style, but I think style can be important. I 
agree that the more elaborate extended definitions should 
be published by this WG somewhere, and should be citable 
by those, like yourself, who wish to deal with SPARQL 
extensions, and maybe even given normative weight if we 
can do that reasonably. But the bulk of our intended 
audience will be most interested in SPARQL itself. And to 
repeat, we cannot give two different normative definitions 
of SPARQL basic query graph matching. If we state the 
general definition then we must immediately qualify it by 
imposing the particular case, and declare that to be 
normative, rendering the more general definition otiose 
and pointless. If I were reading the resulting text, I 
would wonder why the hell the definition was phrased in 
such a grossly misleading way, to no apparent purpose. The 
point becomes clear only when the other cases are 
discussed, or at least mentioned; but these other cases, 
to repeat, cannot actually be SPARQL. We can only define 
one language in the spec.

I'm not meaning to suggest that it shouldn't be written up 
and published, I was only following what I thought was 
Andy's suggestion for how to assign material to various 
documents. But this should probably be decided by the WG. 
I'd be happy either way, as long as each document (if 
there is more than one) makes internal sense.

I thought that Andy is in fact doing this:

On 20 Jan 2006, at 12:58, Seaborne, Andy wrote:
After considering what material to put in rq23 and how to 
record the discussions and conclusions in all the email, I 
propose that the interested parties prepare a separate WG 
note (or, alternatively, a member submission).
I will include some text that explains SPARQL at the level 
of simple entailment because the point of rq23 is to say 
what SPARQL/Q is, not what it might be.  It is confusing 
to have all the possible options outlined when they don't 
apply to this version of SPARQL.

and in fact this is what happens in the current version of 
the document. I agree with Andy's point of view of not 
having in the spec any mention of the extensions, and this 
is what can be seen in the current document.

On 20 Jan 2006, at 12:58, Seaborne, Andy wrote:
The only other thing I think we might consider is an 
appendix that expands on the definitions section but again 
I think it should stick to what this SPARQL is.

And we also agree here.

Look, Pat: your "simplified wording" is indeed very 
similar to what we have already in the current document 
(try to read the current version with the option **). And, 
according to your complaint about the current document, 
your "simplified wording" also contains a lot of useless 
material for just defining what happens in the case of 
subgraph matching.

What useless material? The only formal addition is the 
requirement that the scoping graph contain no bnode in 
BGP, which can be expressed using six extra words in the 
definition. In exchange, one avoids the need to define 
"ordered merging", and relieves the reader of the burden 
of trying to understand why it is necessary. I drafted 
some extra explanatory prose about bnode scopes, which is 
up to Andy to use or not, his call: but I haven't yet seen 
a convincing explanation of the rationale for the other 
definition written by anyone, including yourself. Maybe I 
am unusually dense, but if you read through the email 
thread I think you will find that it took you and Sergio 
four cleverly constructed examples to make all the 
necessary points in a full explanation that I could 
follow.

So, from this point of view your wording is no better than 
the current one. It may be true that your wording could be 
easier to understand, and that's why I proposed to have it 
as *explanation* in the current document; and in fact it 
*is* there, and of course it is waiting for you to make 
the explanation better.
And most importantly, I am saying that your "simplified 
wording" is less accurate because it will not scale up 
smoothly to any extension.

Wrong. It is not in any way inaccurate, and it scales up 
smoothly to all the extensions we have contemplated in 
this thread. But perhaps we are talking past each other. 
The issue I am talking about is the distinction between 
the definition using "entails S(G' orderedmerge BGP)" as 
its core, and that using "entails (G' union S(BGP))". Call 
this difference in wording the first issue. The other 
issue is whether or not the SPARQL document should give 
the fully general form of the definition which mentions 
E-entailment and refers to a scoping set B, or the 
simplified version of it which mentions only simple 
entailment and does not bother with the scoping set 
because that is entirely irrelevant for any entailment 
below your restricted version of OWL-DL. Call that the 
second issue.

It is important to see that these issues are orthogonal. 
The first issue is just about how to best describe the 
bnode-scoping restrictions on answers with respect to the 
dataset and the query. This applies uniformly to all 
notions of RDF entailment; and the 'union' style of 
wording, with the appropriate restriction on the bnodes in 
the scoping graph, works with all entailment forms 
uniformly, and can be used in the general (E- plus B) case 
quite correctly. (It does not conflate query bnodes with 
variables, so handles the 'true existential' case for OWL 
in the way you prefer, even though this is irrelevant for 
SPARQL itself.) When I have been referring to the 
"simplified wording" I was referring to that first issue. 
Since (barring told bnodes, which we have rejected) the 
two wording forms define exactly the same actual 
mathematics, equally precisely and unambiguously, the 
choice between them should be based purely on expositional 
clarity. Given that we now have the freedom to define the 
scoping graph bnode vocabulary to be disjoint from BGP 
without loss of generality, there is simply no motivation 
to use the more elaborate and obscure ordered-merge 
construction: it has become obsolete and unnecessary.
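
As an illustration only (not text from the spec or from this thread), the 'union' construction can be sketched in a few lines of Python. The sketch assumes triples are plain tuples of strings, that bnodes are written "_:name" and variables "?name", and that a solution S binds variables to terms of the scoping graph G'. The helper names (scoping_graph, apply_solution) are hypothetical. It only builds G' and (G' union S(BGP)); it does not perform the entailment check itself.

```python
from itertools import count


def is_bnode(term):
    # Assumed convention: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def bnodes(graph):
    # All blank nodes occurring anywhere in a set of triples.
    return {t for triple in graph for t in triple if is_bnode(t)}


def scoping_graph(g, bgp):
    """Return (G', renaming): a copy of g whose bnodes are renamed so that
    G' shares no bnode with bgp."""
    taken = bnodes(g) | bnodes(bgp)
    fresh = (f"_:s{i}" for i in count())
    renaming = {}
    for b in sorted(bnodes(g)):
        # Take the next generated name not already used in g or bgp.
        renaming[b] = next(n for n in fresh if n not in taken)
    g_prime = {tuple(renaming.get(t, t) for t in triple) for triple in g}
    return g_prime, renaming


def apply_solution(solution, bgp):
    # S replaces variables only; bnodes of the BGP are left untouched.
    return {tuple(solution.get(t, t) for t in triple) for triple in bgp}


# The data graph and the query happen to use the same bnode label "_:a".
G = {("_:a", "ex:knows", "ex:bob")}
BGP = {("?x", "ex:knows", "_:a")}

g_prime, renaming = scoping_graph(G, BGP)
solution = {"?x": renaming["_:a"]}  # bind ?x to a term of G'
target = g_prime | apply_solution(solution, BGP)
```

In target, "_:a" can only be the query's own bnode, because the data's bnode has been renamed in G'. A plain set union is therefore enough, and no ordered merge is needed to keep the two bnode scopes apart.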

Now, the more recent debate concerns the second issue. 
Here I agree that we should publish the fully general (E- 
plus B) definition *somewhere*, and give examples and so 
on. The only issue is, where. If the main SPARQL document 
is not going to use this elaborate machinery, I suggest 
that it should not be in that document, is all; 
particularly not as an intrusion in the first, basic, 
definitions in section 2. The appropriate strategy I 
suggest is to mention the more general framework in 
section 2, and refer to its full description elsewhere, 
perhaps later in the document, where it is stated exactly 
and where more space can be devoted to explaining it 
without intruding into the SPARQL definitions themselves.

To sum up, your proposed "simplified wording" will leave 
the issues rdfSemantics and owlDisjunction wide open

No, it would leave it in exactly the same state as if we 
used the other wording, since they express the same actual 
formal content. I do not suggest leaving either of these 
issues open.

Pat

, which is what FUB (and I guess Dan :-) would like to 
avoid.

cheers
--e.
Received on Tuesday, 24 January 2006 05:21:54 UTC