XML infoset: please don't

Dear working group members:

XML Information Set terminology unfortunately seems to be having
adverse effects.  I just started rereading the XML Schema draft and
choked right away on the sentence:

  "An element information item is the component of an infoset which
   corresponds to an element."

No one should be forced to write like that!  Another example:

   "XML Schema: an XML element information item which, along with its
   descendants, satisfies all the Constraints on Schemas in this
   specification"

This should have been: 

   "XML Schema: an element node which satisfies all the Constraints on
   Schemas."

These and many more examples are solid roadblocks to the advancement
of XML; personally, they don't make my blood boil, but among the
public, some are enraged (see recent postings to comp.text.xml).

I then tried to comprehend what an element information item is by
reading the XML Information Set note.  Nothing really deep, it turns
out: it's a node in a tree representation of an XML document.  My
objection is that there are now at least two different tree models
of XML: DOM and XML information sets.  Both are justified, but I
believe they should be unified in what is (or should be) an obvious
way:

* DOM, being the finer model, is the starting point; it is the most
  detailed tree model, and one that any programmer can understand.

* DOM-I trees are obtained from DOM trees by a mapping that converts
  CDATA sections to text and concatenates adjacent text nodes (using
  normalize()), plus a couple of other tricks; it shouldn't be more
  complicated than that.
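
To make the mapping concrete, here is a minimal sketch in Python using
xml.dom.minidom.  The function name to_dom_i is made up for this
illustration, and the sketch covers only the two steps named above
(CDATA to text, then normalize()); the "couple of other tricks" are
left out, so this is not a full definition of DOM-I.

```python
from xml.dom import minidom, Node


def to_dom_i(doc):
    """Map a DOM document toward the proposed "DOM-I" view, in place.

    Hypothetical helper: only CDATA-to-text conversion and text-node
    concatenation are handled here.
    """
    def walk(node):
        # Iterate over a copy, since children are replaced during the loop.
        for child in list(node.childNodes):
            if child.nodeType == Node.CDATA_SECTION_NODE:
                # Replace the CDATA section with an ordinary text node.
                text = doc.createTextNode(child.data)
                node.replaceChild(text, child)
            else:
                walk(child)

    walk(doc)
    # Merge adjacent text nodes throughout the tree.
    doc.normalize()
    return doc


doc = minidom.parseString("<a>one <![CDATA[<two>]]> three</a>")
root = to_dom_i(doc).documentElement
print(root.firstChild.data)  # prints: one <two> three
```

Before the mapping the element <a> has three children (text, CDATA
section, text); afterwards it has a single text node, which is the
view an infoset-style consumer would presumably want.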

Canonical XML can now be explained by a simple transformation from
DOM-I. 

I would urge that the XML Information Set be substantially
simplified.  Please put stakes through verbiage like "XML element
information item."  The XML Information Set should be explainable in
one paragraph, starting from DOM.  Then make this paragraph a part of
DOM2 (along with canonical XML, perhaps).

Thanks,

/Nils

Received on Tuesday, 18 January 2000 11:44:54 UTC