Re: Draft response to: Re: major technical: blank nodes from Enrico Franconi on 2006-06-13 (public-rdf-dawg@w3.org from April to June 2006)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Tue, 13 Jun 2006 11:19:18 +0200
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Cc: Fred Zemke <fred.zemke@oracle.com>
Message-Id: <2FD61A32-B2F1-4C44-A21B-FEC981BEDCB6@inf.unibz.it>

Hi,
I agree with most of your comments.
As a matter of fact, I already pushed the WG to fix this (in the part  
which concerns FUB, namely 2.5), but this didn't happen: on 28 Feb  
2006 <http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0421>
I said:

> I agree that we (and I mean the whole WG) forgot about the abstract  
> syntax while writing the definitions in rq23 2.5. So, whenever in  
> 2.5 we use "blank node names" or "blank node labels", we should  
> actually just use "blank node". I understand that everybody agrees  
> with this, so this should be done by Jos.

Apparently this never happened :-(
cheers
--e.


On 9 Jun 2006, at 18:39, Pat Hayes wrote:

>> This is a response to Pat Hayes's email in the archive
>> dated 26 Jan 2006 16:50:59 -0600.  Thank you for your
>> detailed comments.  They have been very helpful to me personally
>> in understanding the draft.
>> The reason for my reply is that I believe we can do a better job
>> in our treatment of blank nodes in SPARQL.
>
> I honestly don't think we can, given the many constraints we need  
> to satisfy. What we can do a better job of, however, is  
> *explaining* the treatment.
>
>>   I first came to an
>> earlier draft after reading the RDF Recommendations.  I found
>> the SPARQL draft very confusing and frustrating.  My essential
>> complaint was that SPARQL uses one term for two concepts:
>>
>> a) RDF blank nodes, which are nodes in a graph with no label, and
>>
>> b) SPARQL blank nodes, which are lexical tokens in a SPARQL query.
>>
>> Pat Hayes's email rejects this interpretation.
>
> Well, it wasn't the one I had in mind when we were writing the  
> spec, put it that way, and it wasn't my intention. Unfortunately I  
> can't speak for what was in anyone else's mind.
>
> BTW, my experience on SPARQL has led me to think that we didn't do  
> a good enough job of explaining the idea of blank nodes in the RDF  
> spec. You seem to have grokked it thoroughly.
>
>> However, let me give the reasons that I held it, based on my
>> reading of RDF and SPARQL both:
>>
>> a) According to the "RDF Concepts and abstract syntax"  
>> Recommendation,
>> section 6.6 "Blank nodes", the set of RDF blank nodes is distinct
>> from the set of IRIs and "Otherwise, this set of blank nodes
>> is arbitrary.  RDF makes no reference to any internal structure
>> of blank nodes".  That is, RDF blank nodes have no label.
>
> They have no label *in an RDF graph*. Other document conventions  
> might 'label' them in ways determined by the specs for those  
> documents.
>
>> b) The RDF Primer section 2.3 "Structured property values and blank
>> nodes" Figure 6 "Using a blank node" shows a blank node as having
>> no label.  It goes on to describe "blank node identifiers" of which
>> it says "...blank node identifiers are not considered to be actual
>> parts of the RDF graph."
>> c) In our own working draft Section 2.5.3 "Example of basic graph
>> pattern matching" second sentence under the first box, it says
>> "The label information is not in the graph."
>>
>> d) Section 2.8.3 "Blank nodes" says "Blank nodes have labels
>> which are scoped to the query".  However, RDF blank nodes have
>> no notion of scope (they simply exist, just as IRIs and literals
>> exist, with no notion of scope)
>
> Exactly. But look at that sentence that you quote: the LABEL is  
> scoped to the query, not the blank node. Scoping, as you say, is a  
> lexical matter.
>
>> .  Scope is a lexical concept
>> (the portion of a query text in which an identifier has a single
>> referent).
>>
>> My summary is that the consistent stance of the RDF Recommendations
>> is that blank node identifiers are an artefact of serialization.
>
> Exactly. We assumed it would be permissible to be allowed a slight  
> abuse of terminology, in using the serialization token to refer to  
> the blank node it is a token of (in the context of the document  
> under discussion: in the above case, the query document): after  
> all, that is what these tokens are FOR, to refer to blank nodes.  
> This kind of abuse of terminology is widely used and familiar, and  
> it avoids what would otherwise be rather tedious circumlocutions  
> like "the blank node whose token is", which is like saying "the  
> person whose name is Fred" rather than just saying "Fred".
>
> But given your confusion after what is clearly an extremely careful  
> reading, we should perhaps have been more pedantic, indeed.
>
>> Now if a reader comes to the SPARQL draft with that model, he
>> finds it very confusing (certainly I did).
>
> Apologies. SPARQL is indeed based on that model, and so a reader  
> making your voyage should find it more transparent.
>
>>  For example, section
>> 2.4 talks about how to extend a pattern solution S to graph
>> patterns.  It says "If v is not in the domain of S, then S(v)
>> is defined to be v."  Applied to SPARQL blank nodes such as
>> _:a, this says S(_:a) is _:a.  Fine; it is still a lexical
>> token; there has been no mention of creating a blank node
>> corresponding to the label _:a.
>
> Oh, but come come, surely now you are being a little TOO pedantic.  
> If a document claims to be using (near) RDF conventions and uses an  
> RDF blank node identifier syntax, surely it is not unreasonable to  
> presume that this is intended to indicate a blank node in an RDF  
> graph(-like) structure of which the document is a lexicalization.  
> We have earlier defined SPARQL patterns as RDF-graph-like things  
> containing genuine blank nodes. True, we do not formally  
> distinguish patterns from their lexicalizations, but it seems clear  
> that the intention here is to continue and slightly extend the RDF  
> model to similar structures containing variables. No?
>
>>  As a result, the mapping of
>> a triple pattern, such as
>>
>>  ?x :v _:a
>>
>> is
>>
>>  (S(?x), :v, _:a)
>>
>> and there still is no RDF blank node.
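[Editorial sketch, not part of the original message.] The rule quoted from section 2.4 ("If v is not in the domain of S, then S(v) is defined to be v") can be illustrated with a few lines of Python. Everything here is an assumption made for illustration: terms are modelled as plain strings, the function name `apply_solution` is hypothetical, and the binding `:alice` is invented. The point is only that a purely lexical reading leaves the token `_:a` untouched.

```python
def apply_solution(S, pattern):
    """Apply pattern solution S to a triple pattern.

    Terms outside the domain of S are mapped to themselves, as in the
    rule quoted from section 2.4. Terms are plain strings here, which is
    an assumption of this sketch, not something the draft specifies.
    """
    return tuple(S.get(term, term) for term in pattern)


# Hypothetical solution binding only the variable ?x.
S = {"?x": ":alice"}
result = apply_solution(S, ("?x", ":v", "_:a"))
print(result)  # the third component is still the token "_:a"
```

Under this reading the result contains the identifier `_:a` itself, which is the situation the message describes: nothing in the mapping step says which blank node, if any, the token stands for.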
>
> What else would _:a be considered to be a lexicalization of?
>
>> Consequently, the result
>> of the mapping is not an RDF triple.  Then we come to
>> section 2.5.1 "General framework" and the definition of "basic
>> graph pattern E-matching".  This definition posits a basic
>> graph pattern BGP' and a scoping graph G' such that "G' and
>> BGP' do not share any blank node labels".
>
> Whoops. That shouldn't say 'label', indeed.
>
>>  But how can they?
>> BGP' is a triple pattern and might contain SPARQL blank nodes;
>> G' is an RDF graph and as such does not contain anything that
>> can be called a blank node label at all (though serializations of G'
>> might).
>>
>> After studying Pat Hayes's email, my conclusion is that the
>> text is using blank node identifiers as proxies or surrogates
>> for the blank nodes themselves.
>
> Yes, a fair diagnosis. That is exactly the 'abuse of notation' I  
> mentioned above.
>
>> To clarify our text, my proposed resolution is as follows:
>>
>> a) We should adopt the term "blank node identifier" for what I have
>> been calling SPARQL blank nodes.  This would harmonize with RDF
>> Recommendations, which use this term when talking about
>> character strings associated with blank nodes for identification
>> purposes.  For example, section 2.1.4 would be renamed "Syntax
>> for blank node identifiers".  We should scan the document for
>> other occurrences of "blank node", and, as appropriate, change to
>> "blank node identifier".
>
> Good idea.
>
>>
>> b) We state explicitly that for each distinct blank node identifier,
>> a distinct blank node is created for the purposes of processing
>> the query, different from any blank node in the graphs in the
>> query's dataset.
>
> Er...be careful. I don't think we should phrase this in terms of  
> *creation* of blank nodes; that's a bit like saying that when you  
> write a numeral, you create a number. Documents written using RDF  
> lexicalization conventions indicate RDF abstract graph structures  
> which might contain blank nodes: OK so far. The only question that  
> we have to determine about blank nodes (the only question that can  
> be asked about them, in fact) has to do with their identity. If a  
> single document scope uses several occurrences of a bnode  
> identifier, then they identify the same bnode in whatever structure  
> is indicated by the document. Otherwise, all that SPARQL has to say  
> is when two bnodes are NOT the same, which is what the 'scoping  
> graph' definitions are all about.
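[Editorial sketch, not part of the original message.] The identity rule described above (repeated occurrences of one bnode identifier within a single document scope identify the same bnode, while the identifier carries no meaning outside that scope) might be modelled roughly as follows. The class `BlankNode` and the function `read_document` are hypothetical names, and representing a node as a fresh Python object is a modelling convenience only; as the message cautions, the specification itself need not talk about "creating" nodes.

```python
class BlankNode:
    """A blank node: no label and no internal structure, only identity."""
    __slots__ = ()


def read_document(triples):
    """Turn a list of string triples into a set of triples over nodes.

    Tokens starting with "_:" are blank node identifiers. Their scope is
    this one call, standing in for "a single document scope".
    """
    scope = {}

    def term(token):
        if token.startswith("_:"):
            # Same identifier in the same scope -> same blank node.
            return scope.setdefault(token, BlankNode())
        return token

    return {tuple(term(t) for t in triple) for triple in triples}


doc1 = read_document([("_:a", ":knows", "_:a")])
doc2 = read_document([("_:a", ":knows", ":bob")])

s1, _, o1 = next(iter(doc1))
s2, _, _ = next(iter(doc2))
print(s1 is o1)  # both occurrences of _:a in doc1 denote one node
print(s1 is s2)  # _:a in doc2 is a different node
```

Note that the label never ends up in the resulting graph; it is used only while reading the document.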
>
>> We can also say that the reader may wish to
>> think of the blank node identifiers as proxies or surrogates for
>> these created blank nodes.
>
> Why do we need to say this? This is just part of the RDF graph  
> syntax/lexicalization model. At most, I think we might say  
> explicitly that SPARQL syntax is an extension of RDF syntax, and  
> inherits the RDF distinction between lexical scope of bnodeIDs, and  
> the actual occurrence of a bnode in an RDF graph.
>
>>  Perhaps this might go in Section 2.1.4.
>>
>> c) In section 2.1.8 "Result descriptions used in this document"
>> in the definition of RDF term, the created blank nodes  should be
>> explicitly listed as part of RDF-B.  (Note that even if one
>> believed that blank node identifiers were blank nodes all along,
>> this did not put them in RDF-B because they were not part of
>> any graph.)
>
> I agree we should be more explicit about the bnode/id distinction  
> here.
>
>>
>> d) In section 2.4 "Pattern solutions", definition of "pattern
>> solution", we say that the domain of S is extended to include
>> blank node identifiers by mapping each blank node identifier to
>> the blank node that was created for it in item b) above.
>
> Both the above are supposed to be handled by the 'scoping graph'  
> idea. The scoping graph's sole purpose is to be the source of  
> bnodes substituted for pattern variables in the answer document, to  
> allow this source to be something different from (but isomorphic  
> to) the source graph, and to be unique for each query. So rather  
> than 'creating' bnodes, SPARQL technically 'creates' a scoping  
> graph and then simply *uses* the bnodes in it. I guess it comes to  
> the same thing, but this way of talking about it makes sure that  
> the bnodes in the scoping graph fit into a structure isomorphic to  
> the target graph, so that the answer document is obliged to treat  
> these "bnodes from the (current bnode-substituted version of the)  
> target graph" in a way that makes sense across several answers. It  
> is hard to express this as a condition on bnodeIDs.
>
>> e) Somewhere we make the observation that the result of
>> applying a pattern solution S to a triple pattern is an RDF triple.
>> Thus if BGP is a basic graph pattern, then S(BGP)
>> is an RDF graph.
>>
>> f) delete the definition of "basic graph pattern equivalence"
>> (changes proposed below make it dispensable).
>>
>> g) delete the definition of "scoping graph", also unneeded.
>
> I think it (or something like it) is needed, see above.
>
>> h) Reword the definition of basic graph pattern matching
>> to use the notion of graph merge found in the RDF Recommendations.
>> The revised definition is something like this: "Given an
>> entailment regime E, a basic graph pattern BGP, an RDF graph
>> G and a pattern solution S whose range is a subset of B, then
>> BGP E-matches with pattern solution S on graph G with
>> respect to scoping set B if G E-entails the graph merge of G
>> and S(BGP)."  Actually, with the statement that the created
>> blank nodes are distinct from all blank nodes in the dataset,
>> a simple set union will suffice, though we may wish to stick
>> with the RDF notion of merge for consistency with RDF.
>
> It's not that simple, unfortunately. We went through a huge  
> discussion over this, and simply using merging doesn't cut it. The  
> scoping graph idea was one result of this long discussion (which  
> is in the email record, should you wish to peruse it, though I'm not  
> sure it's a good idea :-)
>
>> i) If we want to keep the technique of renaming blank node
>> identifiers, we move that outside the boxed definition into
>> explanatory text.  For example, "The graph merge referred to
>> in the preceding definition can be thought of as using
>> blank node identifiers as proxies for the blank nodes.  In that
>> case, care must be taken to ensure that the blank node identifiers
>> of G are different from all blank node identifiers in BGP.
>> Let G' and BGP' be serializations of G and BGP, respectively,
>> such that all blank node identifiers in G' are different from
>> all blank node identifiers in BGP'.   Then G' UNION BGP'
>> is the serialization of some graph G2.  S is a solution for BGP using
>> E entailment if G E-entails G2."
>>
>> j) In section 2.5.2 "SPARQL basic graph pattern matching" last
>> paragraph, we can clarify that pattern solutions are unique,
>> not just unique up to blank node renaming.  The so-called "blank
>> node renaming" is an artefact of serialization.
>
> Well, strictly yes: but there is a similar notion which we might  
> call blank node substitution, and they aren't proof against that,  
> so in fact they aren't *unique*, strictly speaking. The point is  
> that blank nodes do have an identity, according to the RDF model,  
> so if one takes an RDF graph and a set of blank nodes which do not  
> occur in it, and substitutes these for the bnodes in the graph,  
> then you do have a *different* graph. Isomorphic, true: but  
> different, all the same. And this is not an artefact of  
> serialization, but is inherent in the idea that blank nodes can be  
> distinguished from one another, which if you think about it is  
> about all that can be done with them. This is why RDF needed to  
> distinguish between merging and taking a simple union when talking  
> about graphs (not serializations of a graph).
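[Editorial sketch, not part of the original message.] The distinction drawn above between a simple union and a merge, and the remark that substituting new blank nodes yields a different but isomorphic graph, can be sketched in Python. This is a toy model under stated assumptions: graphs are sets of 3-tuples, blank nodes are bare objects compared by identity, and the helper names (`substitute`, `union`, `merge`) are hypothetical. The `merge` shown always replaces the second graph's blank nodes with new ones, which is one way to obtain graphs that share no blank nodes before taking the union.

```python
class BlankNode:
    """A blank node: distinguishable from other nodes, nothing more."""
    __slots__ = ()


def blank_nodes(graph):
    """Return the set of blank nodes occurring in a graph."""
    return {t for triple in graph for t in triple if isinstance(t, BlankNode)}


def substitute(graph, mapping):
    """Replace nodes according to mapping; unmapped terms stay as they are."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}


def union(g1, g2):
    """Simple union: blank nodes shared by g1 and g2 stay shared."""
    return g1 | g2


def merge(g1, g2):
    """Merge: give g2 blank nodes not used in g1, then take the union."""
    fresh = {b: BlankNode() for b in blank_nodes(g2)}
    return g1 | substitute(g2, fresh)


b = BlankNode()
g1 = {(b, ":name", '"Fred"')}
g2 = {(b, ":age", '"42"')}

print(len(blank_nodes(union(g1, g2))))  # one shared blank node
print(len(blank_nodes(merge(g1, g2))))  # two distinct blank nodes

# Substituting a new blank node gives a different (though isomorphic) graph.
g1_prime = substitute(g1, {b: BlankNode()})
print(g1_prime == g1)
```

In the union, both triples describe the same unnamed thing; in the merge, they no longer do. None of this involves serialization, since no labels appear anywhere in the model.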
>
>>  The last sentence
>> is thus "the serialization of a set of all pattern solutions
>> is unique up to blank node identifiers".  We can also delete the  
>> phrase
>> "...possibly with blank nodes renamed" earlier in the paragraph,
>> because a pattern solution is not actually concerned with
>> assigning blank node identifiers.
>
> But I agree we should go through the text carefully and try to  
> remove as many traces as possible of the token/node ambiguity.  
> There is definitely a serious muddle in the current version of  
> 2.5.1, which is why I voted against it.
>
> Pat
>
>
>> Fred
>
>
> -- 
> ---------------------------------------------------------------------
> IHMC		(850)434 8903 or (650)494 3973   home
> 40 South Alcaniz St.	(850)202 4416   office
> Pensacola			(850)202 4440   fax
> FL 32502			(850)291 0667    cell
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>
>
Received on Tuesday, 13 June 2006 09:19:36 UTC