DOM RFEs

Miscellany, including some based on recent XML-DEV discussion.

- Still need a module to hook a DOM up to a parser.  It should
  have a way to report errors, to request validation (by DTD),
  and to continue after reporting nonfatal errors (such as
  validity errors, and warnings).

- That module should offer control over what nodes show up
  in the resulting tree.

  I'll single out the use of CDATA versus normal text, the
  incorporation of "ignorable" whitespace, existence of
  Comment nodes, existence of EntityReference nodes, and the
  existence of DocumentType as things that I've found to be
  troublesome in various DOMs I've used.  Perhaps the flags
  used in the traversal APIs could also serve as flags to the
  parser hookup API -- but that won't address ignorable whitespace.

- Currently there's a single node policy implemented by the
  Element.normalize() method ... the policy being that
  adjacent Text nodes get merged.  And there's a related
  policy that when that TBD parser hookup exists, the
  output tree will in effect be pre-normalized.  (Which I see
  as a strange policy given the lack of parser hookup!)

  I'd like corresponding policies for getting rid of the node
  types mentioned above, without needing to walk the tree.
  (Precedent has been set that such "avoid a tree walk"
  API can be part of the DOM -- Element.normalize and
  Document.importNode are two examples.)
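
To make the node-policy idea concrete, here is a rough sketch using
Python's xml.dom.minidom, which is just one binding and stands in for
"a DOM" here.  The names build_tree and prune and their keep_* flags
are invented for illustration; nothing like them is in the DOM spec.
Treating whitespace-only Text as "ignorable" is a heuristic, since
the parser used here is non-validating and cannot tell element
content from mixed content.  Validation and nonfatal error reporting
are not shown for the same reason.

```python
from xml.dom import Node, minidom


def prune(node, keep_comments=False, keep_cdata=False, keep_blank_text=False):
    """Apply the node policies to everything below `node`, in place."""
    doc = node if node.nodeType == Node.DOCUMENT_NODE else node.ownerDocument
    for child in list(node.childNodes):  # copy: the list is mutated below
        kind = child.nodeType
        if kind == Node.COMMENT_NODE:
            if not keep_comments:
                node.removeChild(child)
        elif kind == Node.CDATA_SECTION_NODE:
            if not keep_cdata:
                # Same characters, but as an ordinary Text node.
                node.replaceChild(doc.createTextNode(child.data), child)
        elif kind == Node.TEXT_NODE:
            # Heuristic (assumption): whitespace-only text is "ignorable".
            if not keep_blank_text and not child.data.strip():
                node.removeChild(child)
        elif kind == Node.ELEMENT_NODE:
            prune(child, keep_comments, keep_cdata, keep_blank_text)
    # Merge Text nodes that the edits above left adjacent.
    node.normalize()
    return node


def build_tree(xml_text, **policy):
    """Hypothetical parser hookup: parse, then return an already-pruned tree."""
    return prune(minidom.parseString(xml_text), **policy)


doc = build_tree("<a> <!-- note --> <b>x<![CDATA[ < y]]></b> </a>")
print(doc.documentElement.toxml())
```

The caller never walks the tree; the walk happens once inside the
hookup, which is the same trade normalize and importNode already make.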

In the two points above, I'll emphasize EntityReference
nodes as troublesome ... even if one tries to use a tree
walker API to present a view without EntityReference nodes,
the illusion falls apart the moment one tries to modify a
node that's readonly.  (And suppose I want to just use the
standard API, anyway?)  Basically, I usually don't care about
the entity references, while I do care about being able to
modify _any_ DOM data structure once I've got it.

- For nodes exposing a SystemId, there are cases where it
  needs to be a fully resolved URI; and there are cases
  where it needs to be unresolved.

  I suggest that there thus be _two_ attributes, one for the
  resolved ID and one for the unresolved one.  (I know DOM
  implementations that take each approach.)

- There should be some way to expose the encoding used for
  a given entity ... and it should _not_ look like a PI node.
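
For the SystemId point, a minimal sketch of the "two attributes" idea,
again in Python.  The ExternalId class, its field names, and the
example URIs are all hypothetical; the only real API used is
urllib.parse.urljoin, and the sketch assumes the node knows the URI of
the entity its declaration appeared in.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass(frozen=True)
class ExternalId:
    """Hypothetical holder for a node's system identifier."""

    system_id: str  # exactly as written in the declaration (unresolved)
    base_uri: str   # URI of the entity containing the declaration (assumed known)

    @property
    def resolved_system_id(self) -> str:
        # Resolve the literal against its base; absolute IDs pass through.
        return urljoin(self.base_uri, self.system_id)


ext = ExternalId("dtds/book.dtd", "http://example.org/docs/a.xml")
print(ext.system_id)
print(ext.resolved_system_id)
```

Keeping both values means a serializer can write back what the author
typed, while code that fetches the entity uses the resolved form.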

OK, that's all that comes to mind right now.

- Dave

Received on Tuesday, 16 November 1999 16:26:34 UTC