Re: review of LCC documents as of 26 December 2002 from Dave Beckett on 2003-01-07 (www-rdf-comments@w3.org from January to March 2003)

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Tue, 07 Jan 2003 12:40:53 +0000
To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
cc: www-rdf-comments@w3.org
Message-ID: <23130.1041943253@hoth.ilrt.bris.ac.uk>
>>>"Peter F. Patel-Schneider" said:
> 
> Integrated Review of the RDF Core WG LCC Documents (as of 26 December 2002)
> 
> 
> This review is the result of reading the RDF Core WG LCC Documents as they
> existed on 26 December 2002.  

<snip/> 


> RDF/XML Syntax Specification (Revised) Editor's version of W3C Working
> Draft XX Month YYYY
> 
> Section 2: 
> 
> It would be better to state up front that this section is non-normative,
> instead of making it subordinate to sections 6 and 7.  In particular, what
> happens if section 6 and 7 are silent on some point?  Does this make this
> section normative? 

This section has useful examples that are correct RDF/XML, and it
explains the ideas that the grammar uses, such as node element and
property element.  If there are specific points on which sections 6
and 7 are silent and that you need clarified, please list them.

> Section 5: 
> 
> What does it mean for a namespace to contain a set of names?  How is this
> regulated in RDF?  Can the owner of any namespace close off the namespace?
> How can this be done in RDF?  Without answers to these questions, saying
> that the RDF namespace contains only a certain set of names doesn't make
> sense.

  [Definition:] An XML namespace is a collection of names, identified
  by a URI reference [RFC2396]

  -- http://www.w3.org/TR/1999/REC-xml-names-19990114/

The RDF namespace given here is just such an XML Namespace, a
collection of names identified by a URI reference.  Section 5.1 tells
you what the names are, by listing them along with a rule for the
constructed names of the _n form, and gives you the URI reference.
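As a rough illustration of "a collection of names identified by a URI
reference": in RDF/XML an element or attribute whose namespace name is
the RDF namespace URI stands for the URI reference formed by appending
its local name.  The sketch below assumes only that concatenation
rule; the function name rdf_uri is hypothetical.

```python
# The RDF namespace URI reference.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_uri(local_name: str) -> str:
    """Hypothetical helper: URI reference for a name in the RDF namespace.

    An RDF/XML element or attribute with namespace name RDF_NS and the
    given local name denotes RDF_NS + local_name.
    """
    return RDF_NS + local_name


print(rdf_uri("type"))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
print(rdf_uri("_3"))    # http://www.w3.org/1999/02/22-rdf-syntax-ns#_3
```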

The other questions are better directed at XML namespaces themselves,
since the RDF model has no concept of namespaces.


> The container property names are not of the form _n where n is a positive
> integer, they are of the form _n where n is a  base-10 numeral without
> leading zeros that represents a positive integer. 

Agreed; the wording has to rule out both leading zeros and 0 itself.
How about
  "where n is a decimal integer greater than zero with no leading zeros"
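The proposed wording corresponds to a simple pattern: an underscore,
one non-zero digit, then any number of digits.  A minimal sketch of
that test (the function name is hypothetical, and ASCII digits are
assumed):

```python
import re

# "_" followed by a decimal integer greater than zero with no leading
# zeros.  [0-9] is used instead of \d so only ASCII digits match.
_CONTAINER_NAME = re.compile(r"_[1-9][0-9]*")


def is_container_membership_name(local_name: str) -> bool:
    """True if local_name has the rdf:_n form described above."""
    return _CONTAINER_NAME.fullmatch(local_name) is not None


for name in ["_1", "_10", "_0", "_01", "_foo", "_"]:
    print(name, is_container_membership_name(name))
```

With this rule, rdf:_1 and rdf:_10 are container membership names,
while rdf:_0, rdf:_01 and rdf:_foo are not.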

> The statement that other names from the RDF namespace can be used goes
> against the idea of a closed RDF namespace and also against certain
> comments from the RDF Core WG.  However, it appears to be much closer to
> historical truth than these other statements.  

The WG took into account that new names have been added to the RDF
Namespace since RDF M&S, and that older systems would not expect them
(see the Notes in 5.1).  The WG decided that unrecognised names would
be accepted, but applications should warn that they are unknown to
this specification.  Any such names will most likely match the
property element and property attribute productions in the grammar
rules and then appear in triples.


> The method for ensuring that there are no clashes between generated blank
> node identifiers and blank node identifiers made from rdf:nodeID attribute
> values requires a complete pass over the document before any blank node
> identifier can be generated.  This is the case because these two sets use
> the same set of identifiers and any element of this set can be made from an
> rdf:nodeID attribute value.  This problem has been pointed out before but
> has not yet been fixed.  The only change has been to add a resolution
> method to this section that is not actually allowable from the grammar
> rules in Section 7. 

It has been noted and fixed.  

Generated blank node identifiers are produced by the
generated-blank-node-id() notation, which says:
   "A string value for a new distinct generated Blank Node
   Identifiers as defined in section 5.2 Identifiers." 

Blank node identifiers from rdf:nodeID are produced by the bnodeid()
notation, which says:
  "bnodeid(identifier := value) Create a new Blank Node Identifier Event."
and which refers to section 5.2.

5.2 tells you that you do not have to use the exact blank node
identifier given but can use any method that retains the blank node
identity.

The suggested method given here is to "add a constant prefix to all
the rdf:nodeID attribute values and ensure no generated blank node
identifiers ever used that prefix." but that is not required.  For
example, generated-blank-node-id() could make all names "genid"+number
and bnodeid() could apply some different constant prefix.
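A minimal sketch of that scheme, to show that it needs no pass over the
whole document: generated identifiers are "genid" plus a counter, and
every rdf:nodeID value gets a different constant prefix, so the two
sets can never overlap.  The class name and the "nid" prefix are
hypothetical choices for illustration, not part of the specification.

```python
import itertools


class BlankNodeIds:
    """Hypothetical allocator keeping the two identifier sets disjoint."""

    GENERATED_PREFIX = "genid"  # generated ids: "genid1", "genid2", ...
    NODEID_PREFIX = "nid"       # hypothetical prefix for rdf:nodeID values

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def generated_blank_node_id(self) -> str:
        # A fresh, distinct identifier on every call.
        return f"{self.GENERATED_PREFIX}{next(self._counter)}"

    def bnodeid(self, node_id_value: str) -> str:
        # Deterministic, so the same rdf:nodeID value always gives the
        # same blank node; the prefix starts with "n", never "g", so the
        # result can never equal a generated identifier.
        return self.NODEID_PREFIX + node_id_value


ids = BlankNodeIds()
print(ids.generated_blank_node_id())  # genid1
print(ids.bnodeid("genid1"))          # nidgenid1, no clash
```

Because each identifier is decided as soon as it is seen, this fits a
streaming parser.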

The more expensive alternative you give would be to keep all blank
node identifiers around and check for clashes.  There are medium-cost
alternatives too, such as renaming any rdf:nodeID values that start
with "genid" to a new identifier.
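One possible renaming rule for that medium-cost alternative, sketched
under the assumption that generated identifiers are always "genid"
followed by digits: leave ordinary rdf:nodeID values untouched and
rewrite only those starting with "genid", inserting an "x" so that the
result is never "genid" followed by a digit.  The rule and the function
name are hypothetical.

```python
GENID = "genid"  # assumed reserved prefix: generated ids are "genid" + digits


def map_node_id(value: str) -> str:
    """Hypothetical rdf:nodeID renaming that avoids generated-id clashes."""
    if value.startswith(GENID):
        # Still starts with "genid", so it cannot equal an untouched value,
        # but the next character is "x", never a digit, so it cannot equal
        # a generated id.  Adding a fixed character keeps the map one-to-one.
        return GENID + "x" + value[len(GENID):]
    return value


print(map_node_id("person"))   # person
print(map_node_id("genid7"))   # genidx7
print(map_node_id("genidx7"))  # genidxx7
```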

[more below]


> Section 6: 
> 
> Referring to the next version of a working draft in a version that is
> supposed to be a last call candidate is not appropriate.  If this becomes a
> last call working draft then it would be a reason to delay further progress
> along the recommendation track. 

That reference is in section 6.1.4.  Those words were meant to be
removed, and they will be for this LCWD.

> Section 7: 
> 
> Basing the grammar on an event model makes it more complex than would be
> the case if it was based on a tree model such as the Xquery data model. 

I assume the "Xquery data model" you refer to is the
  XQuery 1.0/XPath 2.0 data model
  http://www.w3.org/TR/query-datamodel/
which is still a Working Draft (the first draft appeared in May 2000)
and has several open issues, so it is not yet ready for use.

This XML syntax WD has been using this very simple infoset-based data
model since December 2001 
  http://www.w3.org/TR/2001/WD-rdf-syntax-grammar-20011218/
primarily because it very closely matches the way that most RDF/XML
parsers have been and are written, using the streaming SAX-style
event model.  It is trivial to consider this as a mapping over the
DOM tree model.  New parsers have been written matching this model
closely and successfully.
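For readers unfamiliar with the SAX style, the sketch below uses
Python's standard xml.sax module, as a stand-in for the parsers
mentioned above, to show what a streaming parser sees: start and end
events in document order, with element and attribute names already
split into namespace-name/local-name pairs and no tree being built.
The recorded tuples are only an illustration, not the WD's event
model.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class EventLogger(xml.sax.ContentHandler):
    """Records element events as they stream past."""

    def __init__(self) -> None:
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace-name, local-name) pair; attribute keys are too.
        self.events.append(("start", name, dict(attrs.items())))

    def endElementNS(self, name, qname):
        self.events.append(("end", name))


doc = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/a">
    <ex:p>v</ex:p>
  </rdf:Description>
</rdf:RDF>"""

handler = EventLogger()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # report names as (ns, local)
parser.setContentHandler(handler)
parser.parse(io.BytesIO(doc))

for event in handler.events:
    print(event)
```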

> The grammar here does not forbid unused terms from the rdf: or rdfs:
> namespaces. 

Namespaces belong to XML, not to RDF, which deals only with URI
references.  So the RDF namespace checking can only apply when the XML
infoset items are turned into data model events in section 6.

I agree there needs to be some pointer added to 6.1.2 Element and
6.1.4 Attribute Events.  I will add something like:
  "If [namespace-name] is the RDF Namespace URI, then [local-name]
   must match the conditions given in 5.1 The RDF Namespace."

which, as you note above, amounts to a warning (this includes names
such as rdf:_0 and rdf:_01).

The removed names aboutEach and aboutEachPrefix are forbidden.

These checks are recorded as test cases:
  http://www.w3.org/2000/10/rdf-tests/rdfcore/rdfms-rdf-names-use/
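A rough sketch of the check described above, applied to each element
or attribute name as it becomes an event: names outside the RDF
namespace pass, the removed names are errors, known names and rdf:_n
pass, and anything else is accepted with a warning.  The KNOWN set is
an illustrative subset only (section 5.1 has the real list), and the
function name is hypothetical.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative subset only; section 5.1 gives the normative list.
KNOWN = {"RDF", "Description", "about", "ID", "resource", "nodeID",
         "parseType", "datatype", "li", "type", "Seq", "Bag", "Alt"}
FORBIDDEN = {"aboutEach", "aboutEachPrefix"}  # removed names
CONTAINER_NAME = re.compile(r"_[1-9][0-9]*")  # _n, n > 0, no leading zeros


def check_rdf_name(namespace_name: str, local_name: str) -> str:
    """Return "ok", "warn" or "error" for one element or attribute name."""
    if namespace_name != RDF_NS:
        return "ok"      # not an RDF namespace name: nothing to check
    if local_name in FORBIDDEN:
        return "error"
    if local_name in KNOWN or CONTAINER_NAME.fullmatch(local_name):
        return "ok"
    return "warn"        # unrecognised: accepted, but warn about it


for local in ["type", "_3", "_0", "_01", "aboutEach", "foo"]:
    print(local, check_rdf_name(RDF_NS, local))
```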


> The grammar here explicitly states that the subject of a nodeElement is
> formed from the attribute value of any rdf:nodeID attribute in the same way
> that the subject is formed from a generated blank node id.  This does not
> allow for the trick of using a prefix on the subject in these cases so as
> to prevent clashes with generated blank node ids. 

As already noted above, bnodeid() and generated-blank-node-id() allow
the identifiers to be modified by referring to 5.2.  The subject is
always generated by using the string-value accessor of the
appropriate event, which can perform any modification that 5.2
allows.

I could add something further to 6.1.7 to indicate that the
string-value may be further modified.


> Section 8: 
> 
> This section explicitly admits that RDF graphs can use any URI from the rdf
> and rdfs namespaces. 

This is already allowed, but such names will give warnings.
 
> The last sentence of the section is wrong, and, moreover, contradicts the
> rest of the section. 

The sentence is not wrong but may be misleading.  Jeremy Carroll, the
author of the paper, proposes changing it to:
  "This describes using the original syntax without the subsequently
  added rdf:nodeID attribute."

As far as I know it remains true that all legal RDF graphs (no errors
or warnings) with blank nodes can be serialized to RDF/XML using
rdf:nodeID, since anywhere you can use a resource URI (rdf:about or
rdf:resource) you can now use a blank node identifier.  If you have a
counterexample, I'd be interested to see it.
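To make the argument concrete, here is a minimal serializer sketch in
which a blank node subject is written with rdf:nodeID where a URI
subject would use rdf:about, and a blank node object with rdf:nodeID
where a URI object would use rdf:resource.  It assumes (hypothetically)
that each predicate is already split into a namespace URI and a local
name usable as an XML name, and that blank node identifiers are valid
XML names; it writes one rdf:Description per triple for simplicity.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def node_attr(term, uri_attr):
    """rdf:nodeID for a blank node, otherwise uri_attr (about/resource)."""
    kind, value = term
    if kind == "bnode":
        return f"rdf:nodeID={quoteattr(value)}"
    return f"{uri_attr}={quoteattr(value)}"


def serialize(triples):
    """triples: (subject, (pred_ns, pred_local), object); terms are
    ("uri", u), ("bnode", id) or, for objects only, ("literal", text)."""
    prefixes = {RDF_NS: "rdf"}
    for _s, (ns, _local), _o in triples:
        if ns not in prefixes:
            prefixes[ns] = f"ns{len(prefixes)}"
    decls = " ".join(f"xmlns:{p}={quoteattr(ns)}" for ns, p in prefixes.items())
    out = [f"<rdf:RDF {decls}>"]
    for subj, (ns, local), obj in triples:
        out.append(f"  <rdf:Description {node_attr(subj, 'rdf:about')}>")
        prop = f"{prefixes[ns]}:{local}"
        if obj[0] == "literal":
            out.append(f"    <{prop}>{escape(obj[1])}</{prop}>")
        else:
            out.append(f"    <{prop} {node_attr(obj, 'rdf:resource')}/>")
        out.append("  </rdf:Description>")
    out.append("</rdf:RDF>")
    return "\n".join(out)


EX = "http://example.org/"
example = serialize([
    (("bnode", "b1"), (EX, "knows"), ("bnode", "b2")),
    (("bnode", "b2"), (EX, "name"), ("literal", "Bob & co")),
])
print(example)
root = ET.fromstring(example)  # parse it back to confirm it is well-formed
```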


> Appendix A: 
> 
> The comment about not being able to forbid things starting with _ is
> misleading.  The problem is that it is not possible to forbid things of the
> form _n where n is a decimal numeral without leading zeros that represents
> a positive integer.  The Schema here is incomplete in other ways, including
> allowing multiple rdf:ID with the same value. 

The rule turning rdf:_n into a URI is applied BEFORE the grammar in
section 6, which is why it doesn't appear there.  Translating that
restriction into RELAX NG would mean adding it to the grammar.

The problem is that RELAX NG Compact (and RELAX NG itself) only has
rdf:*, so it isn't possible to forbid or warn about rdf:_0,
rdf:_0[1-9]*, rdf:_foo etc.; it can only list explicit names that are
forbidden.  The comment could be updated to make this clear.

The rdf:ID restrictions are caught because rdf:ID is defined here with
type xsd:NMTOKEN and the restrictions that implies.  I'm not sure it is
entirely correct, since it might not deal with xml:base.

That is why this section is for information only (non-normative) and
clearly states:

  [[Note: The RNGC schema has been updated to attempt to match the
  grammar but this has not been checked or used to validate RDF/XML.]]

I have considered moving it to a separate document under RDF Core
control (or a W3C Note) but it still seems quite useful here.
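On the xml:base point: assuming the constraint is that an rdf:ID value
may not be reused under the same in-scope base URI, the check depends
on resolving "#" + value against that base, which a per-attribute type
such as xsd:NMTOKEN cannot see.  A minimal sketch of such a check, with
a hypothetical function name and input format:

```python
from urllib.parse import urljoin


def duplicate_rdf_ids(occurrences):
    """occurrences: iterable of (in-scope base URI, rdf:ID value) pairs.

    Returns the resolved URIs that occur more than once.
    """
    seen, duplicates = set(), []
    for base, rdf_id in occurrences:
        uri = urljoin(base, "#" + rdf_id)  # rdf:ID="a" means <base#a>
        if uri in seen:
            duplicates.append(uri)
        seen.add(uri)
    return duplicates


print(duplicate_rdf_ids([
    ("http://example.org/doc", "a"),
    ("http://example.org/other", "a"),  # same value, different xml:base: allowed
    ("http://example.org/doc", "a"),    # same value and base: duplicate
]))
```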

Dave
Received on Tuesday, 7 January 2003 07:43:18 UTC