Relationship of Canonical XML to the InfoSet

I am confused as to the relationship between Canonical XML and the InfoSet.
There seems to be a need for a stronger policy statement. A simplistic
approach would say "if it's in the core InfoSet, it's present in Canonical
XML; if it isn't, it isn't".

Instead we seem to have a pick-and-choose approach. For example, namespace
prefixes are in the core InfoSet but not in Canonical XML. Unparsed entities
are also in the core InfoSet but seem to be omitted from Canonical XML. This
seems merely to perpetuate the tradition of each standard in the XML family
making its own decisions about which aspects of an XML document are
significant and which are not; the effect is to increase confusion about the
true semantics of XML rather than to reduce it.
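
To make the prefix point concrete, here is a rough sketch using Python's
standard library. It assumes that the optional rewrite_prefixes setting of
xml.etree.ElementTree.canonicalize (which implements the much later C14N 2.0,
not the draft under discussion) is a fair stand-in for a canonical form that
does not preserve prefixes. The sample documents and the urn:example:x
namespace are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Two documents that differ only in the namespace prefix they use.
doc_p = '<p:a xmlns:p="urn:example:x"><p:b/></p:a>'
doc_q = '<q:a xmlns:q="urn:example:x"><q:b/></q:a>'

# With rewrite_prefixes=True the original prefixes are replaced by
# generated ones, so the author's choice of prefix does not survive.
canon_p = canonicalize(doc_p, rewrite_prefixes=True)
canon_q = canonicalize(doc_q, rewrite_prefixes=True)

# Both inputs collapse to the same canonical text.
same_output = canon_p == canon_q
print(canon_p)
print(same_output)
```

A consumer that only ever sees the canonical form cannot recover the prefix,
even though the core InfoSet treats it as information.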

My own preferred approach would be to make the InfoSet and Canonical XML a
single document, with the latter describing a concrete algorithm for
extracting the information items and properties described in the former.

That still leaves the problem that the model differs from the one used in
XSLT and XPath. It seems very unfortunate that an XSLT processor that always
generates Canonical XML, or one that canonicalizes its input before
processing, will not conform to the XSLT standard because, for example, its
handling of comments will not meet the XSLT specification. This also means
that Canonical XML is not useful for conformance testing of XSLT processors.
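
The comment problem can be sketched in the same way. This again uses
Python's xml.etree.ElementTree.canonicalize, which implements C14N 2.0 and
drops comments unless asked to keep them; I am assuming that default is
representative of the behaviour complained about above. The sample document
is hypothetical.

```python
from xml.etree.ElementTree import canonicalize

# A small input containing a comment node, which the XPath/XSLT data
# model exposes to a stylesheet (for example via the comment() node test).
source = '<doc><!-- reviewed --><item>one</item></doc>'

# Default canonicalization: comments are omitted.
without_comments = canonicalize(source)

# Opting in keeps the comment in the canonical output.
with_comments = canonicalize(source, with_comments=True)

print(without_comments)
print(with_comments)
```

A stylesheet that matches comment() nodes would find nothing in the first
form, so a processor that canonicalizes its input this way changes what the
stylesheet can see.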
 
The relationship with the XML Namespaces standard also needs to be spelt out
more clearly. It's not clear whether the Canonical XML and Infoset standards
are intended to apply to any XML document, or only to a document that also
conforms to XML Namespaces.

A more detailed point (for Canonical XML): in 5.6 the condition "When the
element type and the attribute names do not have namespaces" needs to be
spelt out more pedantically: one could argue that every element type has a
namespace.


Mike Kay

Received on Monday, 24 January 2000 07:39:53 UTC