- From: Enrico Franconi <franconi@inf.unibz.it>
- Date: Tue, 13 Jun 2006 11:19:18 +0200
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
- Cc: Fred Zemke <fred.zemke@oracle.com>
Hi,

I agree with most of your comments. As a matter of fact, I already pushed the WG to fix this (in the part which concerns FUB, namely 2.5), but this didn't happen: on 28 Feb 2006 <http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0421> I said:

> I agree that we (and I mean the whole WG) forgot about the abstract
> syntax while writing the definitions in rq23 2.5. So, whenever in
> 2.5 we use "blank node names" or "blank node labels", we should
> actually just use "blank node". I understand that everybody agrees
> with this, so this should be done by Jos.

Apparently this never happened :-(

cheers
--e.

On 9 Jun 2006, at 18:39, Pat Hayes wrote:

>> This is a response to Pat Hayes's email in the archive
>> dated 26 Jan 2006 16:50:59 -0600. Thank you for your
>> detailed comments. They have been very helpful to me personally
>> in understanding the draft.
>> The reason for my reply is that I believe we can do a better job
>> in our treatment of blank nodes in SPARQL.
>
> I honestly don't think we can, given the many constraints we need
> to satisfy. What we can do a better job of, however, is
> *explaining* the treatment.
>
>> I first came to an
>> earlier draft after reading the RDF Recommendations. I found
>> the SPARQL draft very confusing and frustrating. My essential
>> complaint was that SPARQL uses one term for two concepts:
>>
>> a) RDF blank nodes, which are nodes in a graph with no label, and
>>
>> b) SPARQL blank nodes, which are lexical tokens in a SPARQL query.
>>
>> Pat Hayes's email rejects this interpretation.
>
> Well, it wasn't the one I had in mind when we were writing the
> spec, put it that way, and it wasn't my intention. Unfortunately I
> can't speak for what was in anyone else's mind.
>
> BTW, my experience on SPARQL has led me to think that we didn't do
> a good enough job of explaining the idea of blank nodes in the RDF
> spec. You seem to have grokked it thoroughly.
>
>> However, let me give the reasons that I held it, based on my
>> reading of RDF and SPARQL both:
>>
>> a) According to the "RDF Concepts and abstract syntax"
>> Recommendation,
>> section 6.6 "Blank nodes", the set of RDF blank nodes is distinct
>> from the set of IRIs and "Otherwise, this set of blank nodes
>> is arbitrary. RDF makes no reference to any internal structure
>> of blank nodes". That is, RDF blank nodes have no label.
>
> They have no label *in an RDF graph*. Other document conventions
> might 'label' them in ways determined by the specs for those
> documents.
>
>> b) The RDF Primer section 2.3 "Structured property values and blank
>> nodes" Figure 6 "Using a blank node" shows a blank node as having
>> no label. It goes on to describe "blank node identifiers" of which
>> it says "...blank node identifiers are not considered to be actual
>> parts of the RDF graph."
>> c) In our own working draft Section 2.5.3 "Example of basic graph
>> pattern matching" second sentence under the first box, it says
>> "The label information is not in the graph."
>>
>> d) Section 2.8.3 "Blank nodes" says "Blank nodes have labels
>> which are scoped to the query". However, RDF blank nodes have
>> no notion of scope (they simply exist, just as IRIs and literals
>> exist, with no notion of scope)
>
> Exactly. But look at that sentence that you quote: the LABEL is
> scoped to the query, not the blank node. Scoping, as you say, is a
> lexical matter.
>
>> . Scope is a lexical concept
>> (the portion of a query text in which an identifier has a single
>> referent).
>>
>> My summary is that the consistent stance of the RDF Recommendations
>> is that blank node identifiers are an artefact of serialization.
>
> Exactly.
> We assumed it would be permissible to be allowed a slight
> abuse of terminology, in using the serialization token to refer to
> the blank node it is a token of (in the context of the document
> under discussion: in the above case, the query document): after
> all, that is what these tokens are FOR, to refer to blank nodes.
> This kind of abuse of terminology is widely used and familiar, and
> it avoids what would otherwise be rather tedious circumlocutions
> like "the blank node whose token is", which is like saying "the
> person whose name is Fred" rather than just saying "Fred".
>
> But given your confusion after what is clearly an extremely careful
> reading, we should perhaps have been more pedantic, indeed.
>
>> Now if a reader comes to the SPARQL draft with that model, he
>> finds it very confusing (certainly I did).
>
> Apologies. SPARQL is indeed based on that model, and so a reader
> making your voyage should find it more transparent.
>
>> For example, section
>> 2.4 talks about how to extend a pattern solution S to graph
>> patterns. It says "If v is not in the domain of S, then S(v)
>> is defined to be v." Applied to SPARQL blank nodes such as
>> _:a, this says S(_:a) is _:a. Fine; it is still a lexical
>> token; there has been no mention of creating a blank node
>> corresponding to the label _:a.
>
> Oh, but come come, surely now you are being a little TOO pedantic.
> If a document claims to be using (near) RDF conventions and uses an
> RDF blank node identifier syntax, surely it is not unreasonable to
> presume that this is intended to indicate a blank node in an RDF
> graph(-like) structure of which the document is a lexicalization.
> We have earlier defined SPARQL patterns as RDF-graph-like things
> containing genuine blank nodes. True, we do not formally
> distinguish patterns from their lexicalizations, but it seems clear
> that the intention here is to continue and slightly extend the RDF
> model to similar structures containing variables. No?
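The reading Pat describes, where `_:a` is the lexicalization of a genuine blank node rather than a token that survives into the solution, can be illustrated with a minimal Python sketch. It is a toy model, not the draft's formalism: the classes `Var`, `IRI` and `BNode`, the functions `resolve_labels` and `apply_solution`, and the sample IRI `:alice` are all hypothetical. Blank nodes are objects with identity and no label, each label is resolved once per query scope, and a pattern solution leaves any term outside its domain unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A query variable such as ?x (hypothetical toy representation)."""
    name: str


@dataclass(frozen=True)
class IRI:
    """An IRI, compared by value."""
    value: str


class BNode:
    """An RDF blank node: it has no label and is compared only by identity."""
    __slots__ = ()


def resolve_labels(labels):
    """Resolve the blank node labels of one query scope to blank nodes.

    Repeated occurrences of a label denote the same node and distinct labels
    denote distinct nodes; after this step the labels play no further role.
    """
    scope = {}
    for label in labels:
        if label not in scope:
            scope[label] = BNode()
    return scope


def apply_solution(solution, triple_pattern):
    """Apply pattern solution S to a triple pattern: S(t) = t if t is not in S's domain."""
    return tuple(solution.get(term, term) for term in triple_pattern)


# The pattern ?x :v _:a, with the label _:a already resolved to a blank node.
nodes = resolve_labels(["_:a", "_:a", "_:b"])
x = Var("x")
pattern = (x, IRI(":v"), nodes["_:a"])
triple = apply_solution({x: IRI(":alice")}, pattern)
assert triple[2] is nodes["_:a"]  # S leaves the blank node itself in place
```

Once ?x is bound, the resulting triple contains only IRIs and a blank node, so in this model S(BGP) is built from RDF terms, which is the observation Fred's item e) asks the text to state.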
>
>> As a result, the mapping of
>> a triple pattern, such as
>>
>> ?x :v _:a
>>
>> is
>>
>> (S(?x), :v, _:a)
>>
>> and there still is no RDF blank node.
>
> What else would _:a be considered to be a lexicalization of?
>
>> Consequently, the result
>> of the mapping is not an RDF triple. Then we come to
>> section 2.5.1 "General framework" and the definition of "basic
>> graph pattern E-matching". This definition posits a basic
>> graph pattern BGP' and a scoping graph G' such that "G' and
>> BGP' do not share any blank node labels".
>
> Whoops. That shouldn't say 'label', indeed.
>
>> But how can they?
>> BGP' is a triple pattern and might contain SPARQL blank nodes;
>> G' is an RDF graph and as such does not contain anything that
>> can be called a blank node label at all (though serializations of G'
>> might).
>>
>> After studying Pat Hayes's email, my conclusion is that the
>> text is using blank node identifiers as proxies or surrogates
>> for the blank nodes themselves.
>
> Yes, a fair diagnosis. That is exactly the 'abuse of notation' I
> mentioned above.
>
>> To clarify our text, my proposed resolution is as follows:
>>
>> a) We should adopt the term "blank node identifier" for what I have
>> been calling SPARQL blank nodes. This would harmonize with RDF
>> Recommendations, which use this term when talking about
>> character strings associated with blank nodes for identification
>> purposes. For example, section 2.1.4 would be renamed "Syntax
>> for blank node identifiers". We should scan the document for
>> other occurrences of "blank node", and, as appropriate, change to
>> "blank node identifier".
>
> Good idea.
>
>>
>> b) We state explicitly that for each distinct blank node identifier,
>> a distinct blank node is created for the purposes of processing
>> the query, different from any blank node in the graphs in the
>> query's dataset.
>
> Er...be careful.
> I don't think we should phrase this in terms of
> *creation* of blank nodes; that's a bit like saying that when you
> write a numeral, you create a number. Documents written using RDF
> lexicalization conventions indicate RDF abstract graph structures
> which might contain blank nodes: OK so far. The only question that
> we have to determine about blank nodes (the only question that can
> be asked about them, in fact) has to do with their identity. If a
> single document scope uses several occurrences of a bnode
> identifier, then they identify the same bnode in whatever structure
> is indicated by the document. Otherwise, all that SPARQL has to say
> is when two bnodes are NOT the same, which is what the 'scoping
> graph' definitions are all about.
>
>> We can also say that the reader may wish to
>> think of the blank node identifiers as proxies or surrogates for
>> these created blank nodes.
>
> Why do we need to say this? This is just part of the RDF graph
> syntax/lexicalization model. At most, I think we might say
> explicitly that SPARQL syntax is an extension of RDF syntax, and
> inherits the RDF distinction between lexical scope of bnodeIDs, and
> the actual occurrence of a bnode in an RDF graph.
>
>> Perhaps this might go in Section 2.1.4.
>>
>> c) In section 2.1.8 "Result descriptions used in this document"
>> in the definition of RDF term, the created blank nodes should be
>> explicitly listed as part of RDF-B. (Note that even if one
>> believed that blank node identifiers were blank nodes all along,
>> this did not put them in RDF-B because they were not part of
>> any graph.)
>
> I agree we should be more explicit about the bnode/id distinction
> here.
>
>>
>> d) In section 2.4 "Pattern solutions", definition of "pattern
>> solution", we say that the domain of S is extended to include
>> blank node identifiers by mapping each blank node identifier to
>> the blank node that was created for it in item b) above.
>
> Both the above are supposed to be handled by the 'scoping graph'
> idea. The scoping graph's sole purpose is to be the source of
> bnodes substituted for pattern variables in the answer document, to
> allow this source to be something different from (but isomorphic
> to) the source graph, and to be unique for each query. So rather
> than 'creating' bnodes, SPARQL technically 'creates' a scoping
> graph and then simply *uses* the bnodes in it. I guess it comes to
> the same thing, but this way of talking about it makes sure that
> the bnodes in the scoping graph fit into a structure isomorphic to
> the target graph, so that the answer document is obliged to treat
> these "bnodes from the (current bnode-substituted version of the)
> target graph" in a way that makes sense across several answers. It
> is hard to express this as a condition on bnodeIDs.
>
>> e) Somewhere we make the observation that the result of
>> applying a pattern solution S to a triple pattern is an RDF triple.
>> Thus if BGP is a basic graph pattern, then S(BGP)
>> is an RDF graph.
>>
>> f) delete the definition of "basic graph pattern equivalence"
>> (changes proposed below make it dispensable).
>>
>> g) delete the definition of "scoping graph", also unneeded.
>
> I think it (or something like it) is needed, see above.
>
>> h) Reword the definition of basic graph pattern matching
>> to use the notion of graph merge found in the RDF Recommendations.
>> The revised definition is something like this: "Given an
>> entailment regime E, a basic graph pattern BGP, an RDF graph
>> G and a pattern solution S whose range is a subset of B, then
>> BGP E-matches with pattern solution S on graph G with
>> respect to scoping set B if G E-entails the graph merge of G
>> and S(BGP)."
>> Actually, with the statement that the created
>> blank nodes are distinct from all blank nodes in the dataset,
>> a simple set union will suffice, though we may wish to stick
>> with the RDF notion of merge for consistency with RDF.
>
> It's not that simple, unfortunately. We went through a huge
> discussion over this, and simply using merging doesn't cut it. The
> scoping graph idea was one result of this long discussion (which
> is in the email record, should you wish to peruse it, though I'm not
> sure it's a good idea :-)
>
>> i) If we want to keep the technique of renaming blank node
>> identifiers, we move that outside the boxed definition into
>> explanatory text. For example, "The graph merge referred to
>> in the preceding definition can be thought of as using
>> blank node identifiers as proxies for the blank nodes. In that
>> case, care must be taken to ensure that the blank node identifiers
>> of G are different from all blank node identifiers in BGP.
>> Let G' and BGP' be serializations of G and BGP, respectively,
>> such that all blank node identifiers in G' are different from
>> all blank node identifiers in BGP'. Then G' UNION BGP'
>> is the serialization of some graph G2. S is a solution for BGP using
>> E entailment if G E-entails G2."
>>
>> j) In section 2.5.2 "SPARQL basic graph pattern matching" last
>> paragraph, we can clarify that pattern solutions are unique,
>> not just unique up to blank node renaming. The so-called "blank
>> node renaming" is an artefact of serialization.
>
> Well, strictly yes: but there is a similar notion which we might
> call blank node substitution, and they aren't proof against that,
> so in fact they aren't *unique*, strictly speaking. The point is
> that blank nodes do have an identity, according to the RDF model,
> so if one takes an RDF graph and a set of blank nodes which do not
> occur in it, and substitutes these for the bnodes in the graph,
> then you do have a *different* graph.
> Isomorphic, true: but
> different, all the same. And this is not an artefact of
> serialization, but is inherent in the idea that blank nodes can be
> distinguished from one another, which if you think about it is
> about all that can be done with them. This is why RDF needed to
> distinguish between merging and taking a simple union when talking
> about graphs (not serializations of a graph).
>
>> The last sentence
>> is thus "the serialization of a set of all pattern solutions
>> is unique up to blank node identifiers". We can also delete the
>> phrase
>> "...possibly with blank nodes renamed" earlier in the paragraph,
>> because a pattern solution is not actually concerned with
>> assigning blank node identifiers.
>
> But I agree we should go through the text carefully and try to
> remove as many traces as possible of the token/node ambiguity.
> There is definitely a serious muddle in the current version of
> 2.5.1, which is why I voted against it.
>
> Pat
>
>
>> Fred
>
>
> --
> ---------------------------------------------------------------------
> IHMC                    (850)434 8903 or (650)494 3973   home
> 40 South Alcaniz St.    (850)202 4416   office
> Pensacola               (850)202 4440   fax
> FL 32502                (850)291 0667   cell
> phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
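Pat's distinction between a simple union and a merge, and his point that substituting fresh blank nodes gives an isomorphic but different graph, can be sketched in Python. This is a toy model under stated assumptions, with hypothetical names (`IRI`, `BNode`, `blank_nodes`, `substitute_fresh`, `union`, `merge`): blank nodes are objects compared only by identity and a graph is a frozenset of triples, so no labels are involved at any point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """An IRI, compared by value."""
    value: str


class BNode:
    """A blank node: no label, compared only by object identity."""
    __slots__ = ()


def blank_nodes(graph):
    """Collect the blank nodes occurring in a graph (a frozenset of triples)."""
    return {term for triple in graph for term in triple if isinstance(term, BNode)}


def substitute_fresh(graph):
    """Replace every blank node by a fresh one, one-to-one.

    The result is isomorphic to `graph` but is a different graph, because
    its blank nodes are new objects.
    """
    mapping = {}

    def fresh(term):
        if isinstance(term, BNode):
            if term not in mapping:
                mapping[term] = BNode()
            return mapping[term]
        return term

    return frozenset(tuple(fresh(term) for term in triple) for triple in graph)


def union(g1, g2):
    """Set union: a blank node shared by g1 and g2 stays a single node."""
    return g1 | g2


def merge(g1, g2):
    """Merge: standardize g2's blank nodes apart from g1's, then take the union."""
    return g1 | substitute_fresh(g2)


b = BNode()
p, q, o = IRI("ex:p"), IRI("ex:q"), IRI("ex:o")
g1 = frozenset({(b, p, o)})
g2 = frozenset({(b, q, o)})

print(len(blank_nodes(union(g1, g2))))  # 1: both triples are about the same node
print(len(blank_nodes(merge(g1, g2))))  # 2: the merge keeps the nodes apart
print(substitute_fresh(g1) == g1)       # False: isomorphic, but a different graph
```

In this toy model one could picture Pat's scoping graph as `substitute_fresh(G)` computed once per query: isomorphic to the active graph, with its own blank nodes. That is only an illustration of his description, not the draft's definition.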
Received on Tuesday, 13 June 2006 09:19:36 UTC