Re: review of LCC documents as of 26 December 2002 from Peter F. Patel-Schneider on 2003-01-14 (www-rdf-comments@w3.org from January to March 2003)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Tue, 14 Jan 2003 10:41:21 -0500 (EST)
To: dave.beckett@bristol.ac.uk
Cc: www-rdf-comments@w3.org
Message-Id: <20030114.104121.123581652.pfps@research.bell-labs.com>
From: Dave Beckett <dave.beckett@bristol.ac.uk>
Subject: Re: review of LCC documents as of 26 December 2002 
Date: Tue, 07 Jan 2003 12:40:53 +0000

> >>>"Peter F. Patel-Schneider" said:
> > 
> > Integrated Review of the RDF Core WG LCC Documents (as of 26 December 2002)
> > 
> > 
> > This review is the result of reading the RDF Core WG LCC Documents as they
> > existed on 26 December 2002.  

[...]
 
> > The method for ensuring that there are no clashes between generated blank
> > node identifiers and blank node identifiers made from rdf:nodeID attribute
> > values requires a complete pass over the document before any blank node
> > identifier can be generated.  This is the case because these two sets use
> > the same set of identifiers and any element of this set can be made from an
> > rdf:nodeID attribute value.  This problem has been pointed out before but
> > has not yet been fixed.  The only change has been to add a resolution
> > method to this section that is not actually allowable from the grammar
> > rules in Section 7. 
> 
> It has been noted and fixed.  

I disagree.  See below for an extensive comment.

> Generated blank node identifiers are done by the
> generated-blank-node-id() notation which says:
>    "A string value for a new distinct generated Blank Node
>    Identifiers as defined in section 5.2 Identifiers." 

Aside from being grammatically incorrect, this statement is extremely
difficult to understand, and even incorrect.  I don't read any requirement
here that different ``calls'' to this action must result in different
strings being returned.  It would be much better to explicitly state that
generated-blank-node-id() returns a different string each time that it is
called.
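
As a minimal sketch of the behaviour I am asking for (in Python; the class
name, the "genid" prefix, and the counter are illustrative assumptions, not
anything taken from the Syntax document), each call returns a string that no
earlier call has returned:

```python
import itertools


class BlankNodeIdGenerator:
    """Hypothetical helper: hands out a different string on every call."""

    def __init__(self, prefix="genid"):
        # The prefix is an assumption made for illustration only.
        self._prefix = prefix
        self._counter = itertools.count(1)

    def generated_blank_node_id(self):
        # The counter only moves forward, so no value is ever repeated.
        return f"{self._prefix}{next(self._counter)}"


gen = BlankNodeIdGenerator()
first = gen.generated_blank_node_id()
second = gen.generated_blank_node_id()
```

This only guarantees distinctness among generated identifiers; it does
nothing about clashes with rdf:nodeID attribute values.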

> Blank node identifiers from rdf:nodeID are done by the bnodeid()
> notation which says:
>   "bnodeid(identifier := value) Create a new Blank Node Identifier Event."
> which refers to section 5.2
> 
> 5.2 tells you that you do not have to use the exact blank node
> identifier given but can use any method that retains the blank node
> identity.

Well, part of the problem is that there are no blank nodes in XML/RDF, so
you can't rely on preserving blank node identity (really distinctness)
to get what you want.

> The suggested method given here is to "add a constant prefix to all
> the rdf:nodeID attribute values and ensure no generated blank node
> identifiers ever used that prefix." but that is not required.  For
> example, generate-blank-node-id() could make all names "genid"+number
> and bnodeid() could apply some different constant prefix.

This method is not allowed.  See below for more details.

> The more expensive alternative you give would be to keep all blank
> node identifiers around and check for clashes.  There are medium-cost
> alternatives too, such renaming any rdf:nodeID values that start
> with "genid" to a new identifier.

This last method is also not allowed.




		Problems with Blank Nodes and rdf:nodeID
		in the LCC XML/RDF Syntax Document

The handling of blank nodes is still problematic in the LCC version of the
XML/RDF document.  

The intent is clear.  Each nodeElement that does not otherwise get a
subject is given a blank node identifier as a subject.  The string-value of
this blank node identifier is to be different from the string-value of every
other blank node identifier resulting from the parsing of the RDF/XML
document.

However, the document does not follow this intent.  

First, in section 5.2, the document only says that ``generated blank node
identifiers must not clash with any blank node identifiers from rdf:nodeID
attribute values.''  It does not say that generated identifiers must differ
from one another, so it allows

<rdf:RDF xmlns:rdf="..."
         xmlns:ex="...">

<rdf:Description>
  <ex:foo>
   <rdf:Description />
  </ex:foo>
</rdf:Description>

</rdf:RDF>

to generate the following triple

_:x <ex:foo> _:x .
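
The gap can be made concrete with a small check (a Python sketch; both
function names are hypothetical and only encode my reading of the quoted
sentence).  The first function tests the one constraint that 5.2 states,
the second tests the constraint that was presumably intended:

```python
def satisfies_section_5_2_wording(generated_ids, node_id_values):
    """True if no generated id equals any rdf:nodeID attribute value."""
    return not (set(generated_ids) & set(node_id_values))


def all_generated_ids_distinct(generated_ids):
    """True if every generated id differs from every other generated id."""
    return len(set(generated_ids)) == len(generated_ids)


# The document above has two nodeElements without rdf:nodeID,
# and here both are given the generated id "x".
generated = ["x", "x"]
node_ids = []  # the document contains no rdf:nodeID attributes

meets_stated_rule = satisfies_section_5_2_wording(generated, node_ids)
meets_intent = all_generated_ids_distinct(generated)
```

Here meets_stated_rule is True while meets_intent is False, which is exactly
the situation that produces the triple above.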


Second, a blank node identifier in the linear representation of an RDF
Graph is generated from the string-value of the subject of the event.  For
events that come from nodeElements that have an rdf:nodeID attribute, this
value is determined in 7.2.11 as follows
	If there is an attribute a with a.URI=rdf:nodeID,
	then e.subject := bnodeid(identifier:=a.string-value)
From 6.1.7 the string value of this subject is the concatenation of "_:"
and the value of its identifier accessor.

This means that 

<rdf:RDF xmlns:rdf="..."
         xmlns:ex="...">

<rdf:Description rdf:nodeID="HI">
  <ex:foo>
   <rdf:Description rdf:nodeID="BYE" />
  </ex:foo>
</rdf:Description>

</rdf:RDF>

MUST generate the following triple

_:HI <ex:foo> _:BYE .
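
A short Python sketch of the two rules just cited shows why (the function
names and the prefix "n" are assumptions chosen for illustration; only the
"_:" concatenation comes from 6.1.7):

```python
def bnodeid_string_value(identifier):
    """String-value per my reading of 6.1.7: "_:" followed by the identifier."""
    return "_:" + identifier


def prefixed_string_value(identifier, prefix="n"):
    """What a parser would emit if it added a constant prefix (hypothetical)."""
    return "_:" + prefix + identifier


# Triple obtained by following 7.2.11 and 6.1.7 literally.
literal_triple = (
    bnodeid_string_value("HI"),
    "<ex:foo>",
    bnodeid_string_value("BYE"),
)

# Triple obtained if the rdf:nodeID values are prefixed first.
prefixed_triple = (
    prefixed_string_value("HI"),
    "<ex:foo>",
    prefixed_string_value("BYE"),
)
```

The literal reading gives _:HI and _:BYE, whereas any prefixing gives
different strings, so the two outputs cannot both conform to the grammar
rules.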

Therefore the wording in 5.2 ``One method would be to add a constant prefix
to all the rdf:nodeID attribute values'' is not a potential solution to the
blank node identifier clashing problem.
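
Under this literal reading, the complete pass over the document mentioned
in my original review looks roughly like the following Python sketch.  The
helper names, the "genid" naming scheme, and the http://example.org/
namespace are assumptions for illustration; only the RDF namespace URI is
the standard one.

```python
import itertools
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NODE_ID_ATTR = f"{{{RDF_NS}}}nodeID"  # ElementTree's {namespace}local form


def collect_node_ids(root):
    """First pass: gather every rdf:nodeID attribute value in the document."""
    return {
        el.attrib[NODE_ID_ATTR]
        for el in root.iter()
        if NODE_ID_ATTR in el.attrib
    }


def make_generator(reserved):
    """Return a function producing fresh ids that avoid all reserved values."""
    counter = itertools.count(1)

    def generated_blank_node_id():
        while True:
            candidate = f"genid{next(counter)}"
            if candidate not in reserved:
                return candidate

    return generated_blank_node_id


# Example document whose rdf:nodeID value looks like a generated id.
doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
<rdf:Description rdf:nodeID="genid1">
  <ex:foo>
   <rdf:Description />
  </ex:foo>
</rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(doc)
reserved = collect_node_ids(root)       # must finish before any id is generated
fresh = make_generator(reserved)
new_id = fresh()                        # skips "genid1" because it is reserved
```

The rdf:nodeID values are left untouched, and no identifier can be generated
until the whole document has been read, which is the cost I am objecting to.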




Peter F. Patel-Schneider
Bell Labs Research
Lucent Technologies
Received on Tuesday, 14 January 2003 10:41:34 UTC