Re: Using the DOM with Java

At 02:29 PM 12/2/98 -0800, Stephen R. Savitzky wrote:

>Examples of things that don't round-trip include choice of quotes for
>attributes, named vs. numeric character entities, omitted start-tags and
>end-tags in HTML documents, presence of line breaks before and inside of
>tags, and whether an explicit end tag or "/>" was used on an empty tag in
>XML.  

Each of these is an example where there is more than one way to encode the
same document structure. HTML is slated to become XML-conformant in the
future, so omitted start-tags and end-tags will become obsolete, but some
of these others might really matter in an authoring application that
persists the output.
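
To make the "more than one way to encode the same structure" point concrete, here is a minimal sketch. It assumes a modern JDK with JAXP and the DOM Level 3 isEqualNode method, both of which postdate this thread; the class name, helper names, and the address user@example.org are hypothetical. It parses two encodings that differ in quote style, character reference vs. literal character, and empty-tag form, then checks whether the resulting trees are structurally equal.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any DOM specification.
final class RoundTripDemo {

    // Parse an XML string into a DOM tree with the JDK's default parser.
    static Document parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Merge adjacent text nodes so that any splitting the parser
            // does around character references cannot affect comparison.
            doc.getDocumentElement().normalize();
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    // True when both encodings produce structurally equal element trees.
    static boolean sameTree(String a, String b) {
        return parse(a).getDocumentElement()
                .isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) {
        // Single quotes, numeric character reference, "/>" empty tag.
        String one = "<p id='me'>user&#64;example.org<br/></p>";
        // Double quotes, literal character, explicit end tag.
        String two = "<p id=\"me\">user@example.org<br></br></p>";
        System.out.println(sameTree(one, two));
    }
}
```

Once parsed, nothing in the tree records which of the two encodings was used, which is exactly the information an authoring application would need to write the file back unchanged.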

Currently, of course, we can't persist documents, so it doesn't really
cause any problems. These things are primarily important for round-tripping:
the main purpose of the DOM is to represent document structure, not the
keystrokes originally used to encode it. But as soon as we can persist
documents, round-tripping gets pretty important, IMHO.
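
As a rough illustration of what happens once documents can be persisted, the sketch below writes a parsed tree back out and compares it with the original text. It assumes a modern JDK and uses the JAXP identity Transformer as the serializer, which is one option among several and not something DOM Level 1 defines; the class and helper names and the address are hypothetical. The exact output depends on the serializer, so the code only checks whether the text changed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any DOM specification.
final class PersistDemo {

    // Parse an XML string into a DOM tree with the JDK's default parser.
    static Document parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            doc.getDocumentElement().normalize(); // merge adjacent text nodes
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    // Write the tree back to text using an identity transform.
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String original = "<p id='me'>user&#64;example.org<br></br></p>";
        String persisted = serialize(parse(original));
        System.out.println(original);
        System.out.println(persisted);
        // The structure survives the trip; the original keystrokes need not.
        System.out.println(original.equals(persisted));
    }
}
```

The serializer has to pick one quote style, one empty-tag form, and one way of writing each character, so the persisted text generally differs from what the author typed even though reparsing it yields the same tree.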

>And of course, the DTD and any other declarations embedded in the
>document don't get into the tree, either.

Right. We are all very aware of the need to represent the DTD.

>DOM level 1 loses information -- it is not possible to reconstruct the
>original document from the "equivalent" DOM tree.  This is one of the
>most serious problems with it, by the way.  Another is the inability to
>represent generic SGML documents.  They're related.

The DOM isn't *supposed* to represent generic SGML documents; it's not in
our charter. I don't think we should spend our time figuring out how to do
record ends and other obscure things few people are using.

>I ran into one of these recently, when I was contemplating using &#64; for 
>@ in documents containing my e-mail address to foil spammers (at least until
>they start using DOM-based address-suckers).

Now waitaminit, they can't use our DOM to do anything so despicable as
writing address-suckers...

Jonathan
 
jonathan@texcel.no
Texcel Research
http://www.texcel.no

Received on Wednesday, 2 December 1998 22:28:33 UTC