Re: How should text be normalized for tests?

Joseph Kesselman writes:
 > That doesn't comply with the expectations which the DOM places upon XML
 > processors. See the description of Text nodes:
...
 > Your choices would seem to be to either (a) accept that Expat is going to
 > fail compliance tests which are based on the above assertion (which
 > requires that your test results explain why this isn't a "first made
 > available" situation), or (b) write your test driver so it calls normalize
 > () before running the tests (ditto) or (c) fix Expat's DOM builder.

I must have missed the first message about this.  As Expat's
(somewhat) active maintainer, I feel compelled to chime in.

The basic Expat parser does not provide a DOM implementation at all,
so I think identifying problems with it is difficult.  Expat is much
closer to a SAX parser than anything else and should be thought of in
those terms.

There are several DOM implementations which have been built on top of
Expat for various languages; I've built one in Python, and have built
a builder for another (also in Python).  Both of these have provided
normallized text nodes.  The only reason I can think of which would
prevent this would be if the text nodes are exposed to client code
before being finished; in that case it makes more sense to start a new
node to avoid surprising the client code by modifying an
already-exposed node.  For synchronous-loading DOM implementations
(such as mine), this should not be an issue.

The Expat-based DOM-builders for Python (I know of three) are all open
source, so anyone writing an Expat-based DOM builder is free to look
at them for techniques if they like.  I'd also be willing to
correspond with anyone to help out if I can, though my availability to
write code would be minimal at best.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

Received on Monday, 25 March 2002 11:07:44 UTC