- From: Fred Zemke <fred.zemke@oracle.com>
- Date: Thu, 08 Jun 2006 22:30:38 +0000
- To: public-rdf-dawg@w3.org
This is a response to Pat Hayes's email in the archive dated 26 Jan 2006 16:50:59 -0600. Thank you for your detailed comments. They have been very helpful to me personally in understanding the draft. The reason for my reply is that I believe we can do a better job in our treatment of blank nodes in SPARQL.

I first came to an earlier draft after reading the RDF Recommendations. I found the SPARQL draft very confusing and frustrating. My essential complaint was that SPARQL uses one term for two concepts:

a) RDF blank nodes, which are nodes in a graph with no label, and
b) SPARQL blank nodes, which are lexical tokens in a SPARQL query.

Pat Hayes's email rejects this interpretation. However, let me give the reasons that I held it, based on my reading of RDF and SPARQL both:

a) According to the "RDF Concepts and abstract syntax" Recommendation, section 6.6 "Blank nodes", the set of RDF blank nodes is distinct from the set of IRIs and "Otherwise, this set of blank nodes is arbitrary. RDF makes no reference to any internal structure of blank nodes". That is, RDF blank nodes have no label.

b) The RDF Primer section 2.3 "Structured property values and blank nodes" Figure 6 "Using a blank node" shows a blank node as having no label. It goes on to describe "blank node identifiers" of which it says "...blank node identifiers are not considered to be actual parts of the RDF graph."

c) In our own working draft Section 2.5.3 "Example of basic graph pattern matching" second sentence under the first box, it says "The label information is not in the graph."

d) Section 2.8.3 "Blank nodes" says "Blank nodes have labels which are scoped to the query". However, RDF blank nodes have no notion of scope (they simply exist, just as IRIs and literals exist, with no notion of scope). Scope is a lexical concept (the portion of a query text in which an identifier has a single referent).

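The reading in a) through d) above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `BNode` class, the `parse` and `same_graph` helpers, and the example triples are hypothetical and come from neither the RDF Recommendations nor the SPARQL draft. Blank nodes are modelled as objects with identity but no label, and the `_:` identifiers exist only in the serialized input.

```python
from itertools import permutations


class BNode:
    """A blank node: no label and no internal structure, only identity."""
    __slots__ = ()


def parse(triples):
    """Build a graph (a set of triples) from serialized triples.

    Every distinct '_:' identifier is replaced by a fresh BNode, so the
    identifier itself never becomes part of the graph.
    """
    nodes = {}

    def term(t):
        if t.startswith("_:"):
            return nodes.setdefault(t, BNode())
        return t

    return {tuple(term(t) for t in triple) for triple in triples}


def blank_nodes(graph):
    return {t for triple in graph for t in triple if isinstance(t, BNode)}


def same_graph(g1, g2):
    """True if g1 and g2 are equal up to a bijection between blank nodes.

    Brute force over all bijections; adequate only for tiny examples.
    """
    b1, b2 = list(blank_nodes(g1)), list(blank_nodes(g2))
    if len(b1) != len(b2) or len(g1) != len(g2):
        return False
    for perm in permutations(b2):
        m = dict(zip(b1, perm))
        if {tuple(m.get(t, t) for t in triple) for triple in g1} == g2:
            return True
    return False


# Two serializations that differ only in their blank node identifiers.
doc_a = [("_:a", ":name", '"Alice"'), ("_:a", ":knows", "_:b")]
doc_b = [("_:x", ":name", '"Alice"'), ("_:x", ":knows", "_:y")]

g_a, g_b = parse(doc_a), parse(doc_b)
print(same_graph(g_a, g_b))
```

Under this model, the choice of `_:a` versus `_:x` leaves no trace in the parsed graph, which is the sense in which the identifiers are not part of the graph.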
My summary is that the consistent stance of the RDF Recommendations is that blank node identifiers are an artefact of serialization. Now if a reader comes to the SPARQL draft with that model, he finds it very confusing (certainly I did).

For example, section 2.4 talks about how to extend a pattern solution S to graph patterns. It says "If v is not in the domain of S, then S(v) is defined to be v." Applied to SPARQL blank nodes such as _:a, this says S(_:a) is _:a. Fine; it is still a lexical token; there has been no mention of creating a blank node corresponding to the label _:a. As a result, the mapping of a triple pattern such as ?x :v _:a is (S(?x), :v, _:a), and there still is no RDF blank node. Consequently, the result of the mapping is not an RDF triple.

Then we come to section 2.5.1 "General framework" and the definition of "basic graph pattern E-matching". This definition posits a basic graph pattern BGP' and a scoping graph G' such that "G' and BGP' do not share any blank node labels". But how can they? BGP' is a basic graph pattern and might contain SPARQL blank nodes; G' is an RDF graph and as such does not contain anything that can be called a blank node label at all (though serializations of G' might).

After studying Pat Hayes's email, my conclusion is that the text is using blank node identifiers as proxies or surrogates for the blank nodes themselves. To clarify our text, my proposed resolution is as follows:

a) We should adopt the term "blank node identifier" for what I have been calling SPARQL blank nodes. This would harmonize with the RDF Recommendations, which use this term when talking about character strings associated with blank nodes for identification purposes. For example, section 2.1.4 would be renamed "Syntax for blank node identifiers". We should scan the document for other occurrences of "blank node", and, as appropriate, change to "blank node identifier".

b) We state explicitly that for each distinct blank node identifier, a distinct blank node is created for the purposes of processing the query, different from any blank node in the graphs in the query's dataset. We can also say that the reader may wish to think of the blank node identifiers as proxies or surrogates for these created blank nodes. Perhaps this might go in Section 2.1.4.

c) In section 2.1.8 "Result descriptions used in this document" in the definition of RDF term, the created blank nodes should be explicitly listed as part of RDF-B. (Note that even if one believed that blank node identifiers were blank nodes all along, this did not put them in RDF-B because they were not part of any graph.)

d) In section 2.4 "Pattern solutions", definition of "pattern solution", we say that the domain of S is extended to include blank node identifiers by mapping each blank node identifier to the blank node that was created for it in item b) above.

e) Somewhere we make the observation that the result of applying a pattern solution S to a triple pattern is an RDF triple. Thus if BGP is a basic graph pattern, then S(BGP) is an RDF graph.

f) Delete the definition of "basic graph pattern equivalence" (changes proposed below make it dispensable).

g) Delete the definition of "scoping graph", also unneeded.

h) Reword the definition of basic graph pattern matching to use the notion of graph merge found in the RDF Recommendations. The revised definition is something like this: "Given an entailment regime E, a basic graph pattern BGP, an RDF graph G and a pattern solution S whose range is a subset of B, then BGP E-matches with pattern solution S on graph G with respect to scoping set B if G E-entails the graph merge of G and S(BGP)." Actually, with the statement that the created blank nodes are distinct from all blank nodes in the dataset, a simple set union will suffice, though we may wish to stick with the RDF notion of merge for consistency with RDF.

i) If we want to keep the technique of renaming blank node identifiers, we move that outside the boxed definition into explanatory text. For example, "The graph merge referred to in the preceding definition can be thought of as using blank node identifiers as proxies for the blank nodes. In that case, care must be taken to ensure that the blank node identifiers of G are different from all blank node identifiers in BGP. Let G' and BGP' be serializations of G and BGP, respectively, such that all blank node identifiers in G' are different from all blank node identifiers in BGP'. Then G' UNION BGP' is the serialization of some graph G2. S is a solution for BGP using E entailment if G E-entails G2."

j) In section 2.5.2 "SPARQL basic graph pattern matching" last paragraph, we can clarify that pattern solutions are unique, not just unique up to blank node renaming. The so-called "blank node renaming" is an artefact of serialization. The last sentence is thus "the serialization of a set of all pattern solutions is unique up to blank node identifiers". We can also delete the phrase "...possibly with blank nodes renamed" earlier in the paragraph, because a pattern solution is not actually concerned with assigning blank node identifiers.

Fred
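
Items b), d), e) and h) of the proposal can be sketched in code. This is a minimal sketch under assumptions that are not in the draft: E is taken to be simple entailment, checked by brute force using the interpolation lemma from RDF Semantics (G simply entails E exactly when some instance of E is a subgraph of G); the scoping set B is not checked; and all names (`BNode`, `Var`, `BNodeId`, `create_blank_nodes`, `apply_solution`, `simply_entails`, `e_matches`) and the example data are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


class BNode:
    """A blank node: identity only, no label."""
    __slots__ = ()


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BNodeId:
    """A blank node identifier: a lexical token in the query text."""
    label: str


def create_blank_nodes(bgp):
    """Item b): one fresh blank node per distinct blank node identifier."""
    created = {}
    for triple in bgp:
        for t in triple:
            if isinstance(t, BNodeId) and t not in created:
                created[t] = BNode()
    return created


def apply_solution(solution, created, bgp):
    """Items d) and e): S extended over identifiers; S(BGP) is an RDF graph.

    Assumes `solution` binds every variable that occurs in `bgp`.
    """
    def s(t):
        if isinstance(t, Var):
            return solution[t]
        if isinstance(t, BNodeId):
            return created[t]
        return t

    return {tuple(s(t) for t in triple) for triple in bgp}


def simply_entails(g, e):
    """Brute-force simple entailment: some instance of e is a subgraph of g."""
    blanks = list({t for triple in e for t in triple if isinstance(t, BNode)})
    terms = list({t for triple in g for t in triple})
    for choice in product(terms, repeat=len(blanks)):
        m = dict(zip(blanks, choice))
        if {tuple(m.get(t, t) for t in triple) for triple in e} <= g:
            return True
    return False


def e_matches(g, bgp, solution):
    """Item h): G entails the merge of G and S(BGP).

    The created blank nodes are fresh objects, so plain set union
    already behaves as a merge here.
    """
    created = create_blank_nodes(bgp)
    return simply_entails(g, g | apply_solution(solution, created, bgp))


# Hypothetical data: the graph says :alice knows someone unnamed.
g = {(":alice", ":knows", BNode())}
bgp = {(Var("x"), ":knows", BNodeId("a"))}

print(e_matches(g, bgp, {Var("x"): ":alice"}))
print(e_matches(g, bgp, {Var("x"): ":bob"}))
```

In this sketch no renaming step is needed, because the blank nodes created for the query cannot coincide with those of G; renaming only becomes a concern once graphs are written out with identifiers, as item i) describes.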
Received on Thursday, 8 June 2006 23:32:01 UTC