Review of XQuery/XPath2 data model from Richard Tobin on 2005-04-07 (public-xml-core-wg@w3.org from April 2005)

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Thu, 7 Apr 2005 14:15:23 +0100 (BST)
To: public-xml-core-wg@w3.org
Message-Id: <20050407131523.0B4B32884F3@macintosh.inf.ed.ac.uk>
Here are my comments on the data model document.  The document seems to
be in much better shape than the last time we reviewed it, and most of the
points I raise are minor.

It would probably be best for Norm to look at them and tell us what to
do next.  I see they are using Bugzilla for comments - should we have
a Bugzilla account for the Core WG's official comments?

-- Richard

2.1, sentence beginning "Every node is one of"
"dm:namespaces" should presumably be "dm:namespace-nodes", and a link.

2.1, definition of expanded-QName
(typo) "empyt" should be "empty".

2.6.1
Can an instance of the data model contain documents from the same
namespace that have been validated with different schemas?  In that
case, there may be different types with the same expanded-QName.
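
To make the concern concrete, here is a minimal sketch in Python.  It is
not taken from the data model document: the registry layout, the schema
identifiers and the type definitions are all hypothetical.  It only shows
that once two schemas for the same namespace are in play, an expanded-QName
alone no longer identifies a single type.

```python
def register(registry, schema_id, expanded_qname, definition):
    """Record a type definition under its expanded-QName, per schema.

    registry maps (namespace URI, local name) -> {schema_id: definition}.
    """
    registry.setdefault(expanded_qname, {})[schema_id] = definition


def ambiguous_names(registry):
    """Return the expanded-QNames bound to more than one distinct definition."""
    return sorted(
        name
        for name, definitions in registry.items()
        if len(set(definitions.values())) > 1
    )


# Hypothetical example: two schemas for the same namespace disagree on "size".
registry = {}
ns = "http://example.org/ns"
register(registry, "schema-A", (ns, "size"), "restriction of xs:integer")
register(registry, "schema-B", (ns, "size"), "restriction of xs:string")
register(registry, "schema-A", (ns, "colour"), "restriction of xs:token")
register(registry, "schema-B", (ns, "colour"), "restriction of xs:token")
```

Here ambiguous_names(registry) reports only the "size" name, since the two
"colour" definitions agree.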

2.6.2, xdt:anyAtomicType
xdt:anyAtomicType "is derived from" xs:anySimpleType, but how?
Trivial restriction?  And if xs:string (etc) is derived from
xdt:anyAtomicType, does that mean its {base type definition} is no
longer xs:anySimpleType?  (I understand that the Schema group is
addressing this, but something needs to be said about it here.)

2.6.2, xdt:dayTimeDuration
"six lines" should be "nine lines".

3, sentence beginning "This document describes" (and elsewhere)
(style) Use "an infoset ([Infoset])" the first time and "an infoset"
thereafter, not "an [Infoset]".

3.3.1.3
This section explains what is meant by "consistent with schema 
validation".  It would have been useful to have a link to it when
the term was first used.  And in 2.6.4 the phrase "consistent with
validation" (not "schema validation") was used.

5.5, 5.6
"is an XML ID" is unclear.  Does it mean "has a value of type ID", or
"is an attribute with [attribute type] ID", or what?  Likewise for "is
an XML IDREF or IDREFS".  After reading section 6, it becomes clear
that this depends on what the data model is constructed from, but a
hint here would be useful.

5.12
Why is "parent" in bold?  Is it supposed to mean the parent property?
If so, it is inconsistent with the rest of section 5 which is giving
plain-English descriptions of the accessors, rather than defining them
in terms of node properties.  Contrast 5.1 and 5.3 which don't have
"attributes" or "children" in bold.

6.1.1, constraint 1
(typo) "namespace" should be capitalized.

6.1.5
The [declaration base URI] property should not necessarily be the
document-uri.  For an unparsed entity declared in the external subset,
it should be the URI of the external subset.  It seems that you need a
property in the data model to store this, if you want to get it right.

6.2.1
The namespaces property is described as "possibly empty", but as noted in
constraint 13 it must contain a binding for the XML namespace.

6.2.3
This section is supposed to be about constructing from a plain
infoset, but the attributes and namespaces items refer to things that
only arise for a PSVI: attributes defaulted by schema processing,
values of type xs:QName.  Perhaps these should be moved to 6.2.4.

6.2.3, parent property
What should the value of the parent property be if [parent] is unknown
or has no value?  Obviously this can't happen with the infoset of an
XML document, but it can with an infoset created by (for example) 
converting an XPath data model (as in 6.2.5).  Similarly for attributes
in 6.3.3.

6.2.3, attributes property
"attributes's" should probably be just "attributes".

6.2.4, is-id and is-idrefs
What about types derived from xs:ID etc?  Similarly for attributes 
in 6.3.4.
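
The difference between the two readings can be sketched as follows.  This
is an illustration only: the built-in derivation chain is the one given by
XML Schema 1.0, "my:partId" is a hypothetical user-defined restriction of
xs:ID, and the function names are mine, not the document's.

```python
# Base type of each type, following the XML Schema 1.0 built-in hierarchy.
BASE_TYPE = {
    "xs:ID": "xs:NCName",
    "xs:NCName": "xs:Name",
    "xs:Name": "xs:token",
    "xs:token": "xs:normalizedString",
    "xs:normalizedString": "xs:string",
    "xs:string": "xs:anySimpleType",
    "my:partId": "xs:ID",  # hypothetical user-defined restriction of xs:ID
}


def is_exactly(type_name, target):
    """Narrow reading: only the named type itself counts."""
    return type_name == target


def is_same_or_derived(type_name, target):
    """Wider reading: the type or any type derived from it counts."""
    current = type_name
    while current is not None:
        if current == target:
            return True
        current = BASE_TYPE.get(current)  # None once the chain runs out
    return False
```

Under the narrow reading an attribute of type my:partId would not be an ID;
under the wider reading it would.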

6.2.5, [attributes]
The [attributes] property is an unordered set, not a list.


6.4.2, dm:node-name
"If the prefix is available" - when could it not be available?  Do you
mean if it has a value (i.e. the binding is not for the default
namespace)?

6.5.3, target
Why do you consider the possibility that [target] not be an NCName?
This is required by the Namespaces spec.  One might be able to create
a synthetic infoset in which it were not true, but equally you could
create one with an element whose [local name] was not an NCName, and
you don't consider that case.

6.5.5, 6.6.5, [parent]
Why is the case of being the root considered for comments but not
processing instructions?

6.7.1
"When a Document or Element Node is constructed, Text Nodes that would
be adjacent are combined into a single Text Node. If the resulting
Text Node is empty, it is never placed among the children of its
parent, it is simply discarded."  This does not seem to be a statement
about data models, but rather a constraint on languages that create
them.  Is that the intention?  If so, it should be phrased as a MUST
or SHOULD.  Or is it intended to modify the description of text node
construction in 6.7.3?
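
Read as a constraint on whatever constructs the parent node, the quoted
rule amounts to something like the sketch below.  The tuple representation
of children is hypothetical and chosen only for brevity.

```python
def normalize_children(children):
    """Merge adjacent text nodes and drop empty ones.

    children is a list of (kind, value) pairs, where kind is "text" or any
    other node kind.  Empty text is skipped before merging, which gives the
    same result as merging first and discarding an empty result.
    """
    result = []
    for kind, value in children:
        if kind == "text":
            if value == "":
                continue  # an empty text node is never placed among the children
            if result and result[-1][0] == "text":
                # adjacent text nodes are combined into a single text node
                result[-1] = ("text", result[-1][1] + value)
                continue
        result.append((kind, value))
    return result
```

For example, the children text "a", text "", text "b", element x, text ""
come out as a single text node "ab" followed by the element.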

6.7.3
[element content white space] should be [element content whitespace].

6.7.3, content
Why is the construction of text nodes consisting only of whitespace in
element content not described in terms of the [element content
whitespace] property?  Also note that only validating parsers are
required to return this information.  And it should be mentioned that
this is a big change from XPath 1.0, which did not remove element
content whitespace.
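
A description in terms of the property might look roughly like this
sketch.  It is my suggestion of a shape, not the document's current rule:
each character information item is modelled as a hypothetical pair of its
character and its [element content whitespace] value, with None standing
for "unknown" or "no value".

```python
def text_content(chars):
    """Build text node content from (character, ecw) pairs.

    ecw is True, False, or None (unknown / no value).  The run is treated
    as element content whitespace only if every item is flagged True.
    """
    if chars and all(ecw is True for _, ecw in chars):
        return ""  # the whole run is element content whitespace
    return "".join(character for character, _ in chars)
```

With a processor that does not report the property (None), a
whitespace-only run is retained, which is why the dependence on validating
parsers matters.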

6.7.4
The statement that empty text nodes are discarded should not appear only
under construction from a PSVI, since it applies equally to construction
from an infoset.

6.7.5, [element content whitespace]
"Unknown" should be in italics.

Appendix A
The [unparsed entities] property of Document infoitems is optionally used.
The [element content whitespace] property of Character infoitems is 
optionally used.
The [prefix] property of Element and Attribute infoitems is (optionally?)
required.
The [prefix] property of Namespace infoitems should not be optional.

Appendix D, anonymous type name
The term "static context" is not defined (presumably it is defined
in one of the other specs).

Appendix D, expanded-QName
(typo) "empyt" should be "empty".

Appendix D, instance of the data model
"Every instance of the data model is a sequence."  This is not really
a definition of "instance of the data model".  It makes more sense in
the Terminology section (2.1) where it is immediately followed by a
definition of "sequence", but here it is quite mysterious.

[I have not checked the details of the example in Appendix E.  I hope
it was automatically generated.]

Appendix F.2, item 7
(typo) "depedent" should be "dependent".

Appendix G.11, Namespace Nodes
See comment on 6.4.2, dm:node-name.

Appendices H-J
Duplicating this information from section 6 seems unnecessary.  (Unlike
appendix G, which usefully brings together the information for each accessor.)
Received on Thursday, 7 April 2005 13:15:25 UTC