- From: Bijan Parsia <bparsia@isr.umd.edu>
- Date: Mon, 9 Oct 2006 10:11:19 +0100
- To: public-rdf-dawg-comments@w3.org
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>, "Ralph R. Swick" <swick@w3.org>
Well, I wanted very much to have that text struck from the working draft because it's a horrible mess that, IMHO, totally fails to identify the issue(s) involved, much less articulate them. I asked for it to be removed, but in the interest of moving forward, did not object. It is a product of an Editor (Eric) in consultation with W3C team, or so I understand. So, they'll need to say what they meant in the end.

However, never let it be said that I will not wander into interpretative minefields. So, let us analyze!

The first line contains descriptive text with a link. The descriptive text is:

"""Open Issue: Should blank nodes be treated differently than variables in the query pattern?."""

The linked-to text is:

"""entailmentFramework
I'm collecting all the issues related to entailment and SPARQL here till either a theme emerges or they get broken into separate issues.
Neatly summarized by FredZ here
FredZ's request re: entailment framework and bnode scope
others that are related here but lumped under separate heads above..."""

So the first thing to recognize is that this is a catch-all issue rather than a specific, focused issue, as is indicated by the last line. So, it is confusing to have it "distilled" into a single question, especially when that question is not broad but exceedingly narrow.

============

Bit of speculative background: I *believe* that the real issue is that Eric would like us to revert to the LC1 formulation of the semantics of basic graph pattern matching (in terms of subgraph matching) instead of the LC2 formulation in terms of entailment. Of course, Eric should confirm (or deny) this speculation, but that's my impression from the last telecon. Also, this line

"""Alternative Proposal: the previous last call working draft treated blank nodes in the query pattern as variables."""

certainly is suggestive.
Since I understand this pink box to be designed to elicit community support (or, at least, feedback), I think, if the intent was to raise the issue of whether to go back to the prior framework, that saying that directly would be clearer. Of course, reversion is *not* a raised issue or proposal currently entertained by the working group. So this is all a bit confusing.

It's especially confusing in that it conflates, in general, issues concerning specifying behavior with issues concerning the specifying mechanism. The issue *should* be, primarily, about the desired behavior, not about the specifying mechanism, unless the specifying mechanism cannot correctly specify the desired behavior (or can do so only in a horridly confusing manner, etc.). I don't think the "cannot" is at issue, though I'm sure some people (including Eric) would claim that confusion is the point.

Of course, the *reason* for moving to the current framework was to secure the goal of a specification mechanism allowing for a certain sort of flexibility, indeed, a flexibility (falsely) claimed for the "subgraph" specification. To wit, that SPARQL be parameterizable with regard to the semantics of the queried graph, preferably up through OWL, and especially covering those tools (such as Pellet and KAON2) which already support SPARQL syntax. Note that the subgraph approach was intended to be *general* and cover RDF, and RDFS (with various datatypes) as well as OWL. Certainly in the form of LC1 it cannot be claimed to do that.

The presentation in LC2 is both complex and terse and needs work. The *intent* was that, for the case of simple entailment, the two specifications would specify the exact same behavior. So, if there is a difference, one solution is to *fix* the current specification using the LC2 *mechanism*. Thus, the coupling in the pink box is extremely tendentious. There are working group members who prefer the LC1 mechanism, and those who prefer the LC2 mechanism.
One thing that cannot be claimed for the LC1 mechanism is that it specifies what to do even with graphs under RDFS semantics. For subgraph matching to be univocal between implementations, the "closure" designated to be (perhaps implicitly) constructed must be unique. Obviously, if we can construct alternative graphs we can get different answers to the same query. Inherently we have an issue, since there can be semantically equivalent graphs of varying degrees of redundancy.

In general, this is very difficult to make work for OWL. You'd have to talk about the deductive closure of the KB, plus a reverse transformation to triples, plus constraints on what you can match. It's tricky at best. It's extra tricky to capture the existing behavior of SPARQL/DL query engines (such as Pellet and KAON2). Such behavior is not difficult to capture with the current framework (perhaps with some tweaking).

==============

"""Current specification: yes: blank nodes in the query pattern are scoped to a basic graph pattern. Their use in FILTERs is unclear."""

There is a real issue about how Filters work in general, and we're tackling it. BNodes are tricky in general because sometimes BNode labels corefer and sometimes they don't (consider BNode labels in two distinct RDF/XML documents...and consider doing a union of the two documents rather than a merge!).

I don't know any other query language that uses BNodes in query patterns (TRIPLE? does N3QL count? Ok, at least N3QL postdates the initial development of SPARQL). I've heard non-OWL people oppose BNodes in patterns, so it's very important to separate them out.

BNodes in patterns are handy because they give us a syntactically distinct form of variable, which allows us to have (semantically) more kinds of variable available. This holds across the spectrum of queried languages. (see: <http://www.cs.man.ac.uk/~bparsia/2006/row-tutorial/#slide20> for the various sorts of variables possible.
Most can be simulated using isIRI, isBlank functions.) There are examples there.

"""Costs: Tableau-based reasoners (at least, the Pellet Demo example 7) rely on the current, more expressive semantics to match implications that are not in a materializable RDF graph."""

This is just false. Suppose there were no BNodes in query patterns. Then Pellet would do what it did before BNodes were in query patterns: query variables not in the select clause would be non-distinguished.

Current behavior:
    Query variables in head (i.e., appearing in results): Distinguished (i.e., bound to names and appearing in results)
    Query variables in body-only: Projected away distinguished (i.e., bound to names but not appearing in results)
    BNodes in body: Non-distinguished (i.e., bound to anything, not appearing in results)

Unlike in the RDF case, no one implements (or has a clear sense how to specify, much less implement) semi-distinguished variables. BNodes are non-distinguished in all cases (rdf, and owl).

If we dumped BNodes in queries, we'd change Pellet's behavior to:
    Query variables in head (i.e., appearing in results): Distinguished
    Query variables in body-only: Non-distinguished

Which is traditional (and its old behavior, and RDQL behavior).

Even putting aside (non)distinguished variables, subgraph matching of a materialized deductive closure (even an 'imaginary' one) is not a good specification method. For one, no one has EVER done it for OWL (nor has anyone shown it to be *correct* and complete for RDFS; it's all hand waving). There are good reasons to think that it won't work very well for OWL. That is, though I see a way to sort of do it, it will be brutally awful, complex, brittle, and unintelligible.
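
To make the three kinds of variable listed above a bit more concrete, here is a toy Python sketch. It is only an illustration under stated assumptions: it is not Pellet's algorithm, not the SPARQL semantics, and the names NAMED, EXISTENTIAL, and answer are made up for this example. The "KB" is assumed to entail one named fact (alice hasChild bob) and one purely existential fact (carol has *some* child, with no name for it).

```python
# Toy illustration only: hypothetical data and names, not any real engine's API.

# Facts whose object is a named individual.
NAMED = {("alice", "hasChild", "bob")}

# (subject, property) pairs for which the KB is assumed to entail that
# some filler exists, without providing a name for that filler.
EXISTENTIAL = {("carol", "hasChild")}


def answer(mode):
    """Answer the pattern (?x hasChild ?y) with ?y treated according to mode.

    "distinguished":     ?y is bound to a name and appears in the results.
    "projected":         ?y is bound to a name but is projected away.
    "non-distinguished": ?y may be satisfied by anything, named or not,
                         and does not appear in the results.
    """
    named_pairs = {(s, o) for (s, p, o) in NAMED if p == "hasChild"}
    if mode == "distinguished":
        return named_pairs
    if mode == "projected":
        return {(s,) for (s, _) in named_pairs}
    if mode == "non-distinguished":
        unnamed = {(s,) for (s, p) in EXISTENTIAL if p == "hasChild"}
        return {(s,) for (s, _) in named_pairs} | unnamed
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch only the non-distinguished reading returns carol, since her child exists but has no name to bind to; the other two readings differ from each other only in whether the child column is kept.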
And finally:

"""Benefit: would simplify the current semantics, which are difficult to specify and allow the expression of counter-intuitive queries."""

Clearly this means simplify the *presentation* of the current semantics, which isn't clear to me *at all*, especially when you connect the SPARQL spec to other specifications like RDF. But the problem with filters is not connected to the current semantic framework, at least inherently. And there's plenty of infelicity with the current algebra part of the spec (which we've been beating on), which has nothing to do with the core framework *per se*.

See: http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Jun/0008.html

============

On the working group side, I request that this pink box be removed from the current editors' draft. If the working group would like to publish a more detailed and accessible bit of text about the panoply of issues surrounding the core semantic framework, I shall be happy to provide one as suitable as I can make it for a general audience.

I will, of course, present my take on these issues as we try to make decisions on them.

SPARQL is not simple, even in the simple case. This is to my own surprise.

Cheers,
Bijan.
Received on Monday, 9 October 2006 09:11:39 UTC