Comments on the 2000-07-26 WD

Properly distinguishing between lexical and logical information
===============================================================

Section 1 says "As long as the information in the information set is
made available to XML applications in one way or another, the
requirements of this document are satisfied." Section 3, however, says
"An XML processor conforms to the XML Information Set if it documents
the information items and properties that it provides."

This seems inconsistent to the naive reader, and what it implies should
probably be explained explicitly. Personally, I would rather see the
sentence "Conformance to the core is not a requirement for conformance
to the Infoset." from 3.1 have the 'not' removed. In fact, I think
that there would be definite value in removing all the non-core parts
of the infoset, since I am unsure as to what purpose they serve.

To me, the value of the infoset is precisely that it draws a firm line
between necessary and unnecessary information about XML documents, and
the non-core part can only muddy this distinction.


'Parent' properties
===================

In my opinion these should be removed, since I can see no purpose
that they serve. Section 1 says: "This specification
presents the information set as a modified tree for the sake of
clarity and simplicity, but there is no requirement that the XML
Information Set be made available through a tree structure; other
types of interfaces, including (but not limited to) event-based and
query-based interfaces are also capable of providing information
conforming to the XML Information Set."

To me this implies that if the children of nodes are made available by
an API, but parent properties are not explicitly represented, that API
still satisfies the infoset requirements, since parent information is
implicitly available (just as in event-based interfaces). If this is
the case then the parent properties serve no purpose at all and should
be removed.
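To illustrate how an event-based interface makes parent information
implicitly available, here is a minimal sketch using Python's standard
`xml.sax` module (the language and API are my choice for the example;
nothing in the draft prescribes them). The hypothetical `ParentTracker`
handler is never given a 'parent' property; it recovers each element's
parent from a stack of currently open elements.

```python
import xml.sax

class ParentTracker(xml.sax.ContentHandler):
    """Hypothetical handler: derives (element, parent) pairs from SAX events alone."""

    def __init__(self):
        super().__init__()
        self.stack = []  # names of elements whose start-tag has been seen but not yet the end-tag
        self.pairs = []  # (element name, parent name); None stands in for the document itself

    def startElement(self, name, attrs):
        # The parent is simply whatever element is currently open.
        parent = self.stack[-1] if self.stack else None
        self.pairs.append((name, parent))
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

handler = ParentTracker()
xml.sax.parseString(b"<doc><a><b/></a><c/></doc>", handler)
print(handler.pairs)  # [('doc', None), ('a', 'doc'), ('b', 'a'), ('c', 'doc')]
```

The parent relation is fully determined by the nesting of events, which
is why an explicit property adds no information.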

This issue is also 'cosmetic': to me these properties feel far too
much like an API rather than an abstract data model and ought to be
removed even if the above interpretation is not correct, since IMHO it
is no concern of the infoset whether APIs (or other systems)
explicitly provide parent information or not.

The same concern applies to the 'owner element' property of attributes.


Minor editorial issues
======================

The term 'document order' is used, but not defined.
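For elements, one natural reading (an assumption on my part, not
something the draft states) is the order in which start-tags appear,
i.e. a pre-order depth-first traversal. A sketch with Python's
`xml.etree.ElementTree`; `document_order` is a hypothetical helper, and
`Element.iter()` is the library's own document-order iterator used as a
cross-check:

```python
import xml.etree.ElementTree as ET

def document_order(elem):
    """Hypothetical helper: element tags in pre-order, i.e. a node first,
    then each child's whole subtree before the next sibling."""
    order = [elem.tag]
    for child in elem:
        order.extend(document_order(child))
    return order

root = ET.fromstring("<doc><a><b/></a><c/></doc>")
print(document_order(root))          # ['doc', 'a', 'b', 'c']
print([e.tag for e in root.iter()])  # ['doc', 'a', 'b', 'c']
```

A definition would still need to say where attributes, namespace
declarations and other non-element items fall in this order.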

document.standalone can have the values 'yes', 'no' and 'not present',
which IMHO fits badly with document.base URI, which can be null. It
seems better to me to allow standalone to also be null.

Defining and coordinating the use of the terms 'null' and 'not
present' would probably be useful.

The definition of attribute.children mentions 'element content', which
it probably should not. I assume this is a typo.

attribute.specified should probably be defined more precisely if it is
left in.

attribute.attribute type implies that for enumerated attribute types
one should not be told what the enumeration consists of. IMHO that is
inconsistent, and if this property is retained at all it should include
that information.

It should probably be specified that the public identifier properties
should hold strings normalized according to the rules of section 4.2.2
of the XML Recommendation. Ditto for the system identifier properties.
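For public identifiers, the rule in 4.2.2 is that runs of white space
are collapsed to a single space (#x20) and leading and trailing white
space is removed. A minimal sketch in Python; `normalize_public_id` is
a hypothetical name, and the sketch covers public identifiers only:

```python
import re

# White space as defined by XML's S production: space, tab, CR, LF.
_XML_WS = re.compile(r"[\x20\x09\x0D\x0A]+")

def normalize_public_id(pubid: str) -> str:
    """Hypothetical helper: collapse each run of XML white space to one
    space, then strip the leading and trailing space."""
    return _XML_WS.sub(" ", pubid).strip(" ")

print(normalize_public_id("  -//W3C//DTD\n   XHTML 1.0\tStrict//EN "))
# -//W3C//DTD XHTML 1.0 Strict//EN
```

Stating that the property holds the normalized form means two documents
that differ only in this white space would have equal infosets.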

The definition of the namespace.children property seems rather strange
and should probably be reformulated. (It also mentions 'element content',
probably because it was copied from the attribute.children definition.)

Section 3 is immediately followed by a section 4.1, which obviously
should be 3.1.


Re Appendix B, Other things that are not in the infoset:
========================================================

 - distinction between literal text and character references
   - distinction between hex and decimal character references
   - distinction between uppercase and lowercase hex character references

 - whitespace between target and data of PI

 - original character encoding of document

 - prefixes used in namespace names

...and much more, but probably this list is not intended to be
complete, so I'll skip the rest.
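As a quick illustration of the character-reference items in the list
above, a parser that reports only character data (here Python's
`xml.etree.ElementTree`, chosen just for the example) returns identical
text for a literal character, a decimal reference, and upper- and
lowercase hex references to it:

```python
import xml.etree.ElementTree as ET

# Four spellings of 'J' (U+004A): literal, decimal, uppercase hex, lowercase hex.
docs = ["<a>J</a>", "<a>&#74;</a>", "<a>&#x4A;</a>", "<a>&#x4a;</a>"]
texts = [ET.fromstring(d).text for d in docs]
print(texts)  # ['J', 'J', 'J', 'J']
```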

--Lars M.

Received on Monday, 13 November 2000 07:49:33 UTC