Re: Using the DOM with Java

Mike Champion <mcc@arbortext.com> writes:

> Fair enough ... Another thought I had is that it *should* be possible to
> write a DOM application in Java that serializes an XML or HTML document (or
> subtree) to/from a database using JDBC.  (I forget -- are there some
> limitations on a DOM application's ability to serialize an arbitrary
> document?  Perhaps some of the XML entity/notation stuff won't round-trip,
> my memory is fuzzy ... But the DOM Level 1 *should* be powerful enough to
> serialize an HTML or simple XML document, right?).  Has anyone seen such a
> thing, or tried to do it?

DOM Level 1 loses information -- it is not possible to reconstruct the
original document from the "equivalent" DOM tree.  This is one of the
most serious problems with it, by the way.  Another is the inability to
represent generic SGML documents.  They're related.

I am hoping that these two deficiencies will be addressed in Level 2.

Examples of things that don't round-trip include the choice of quotes for
attributes, named entity references vs. numeric character references,
omitted start-tags and end-tags in HTML documents, the presence of line
breaks before and inside tags, and whether an explicit end-tag or "/>" was
used on an empty element in XML.  And of course, the DTD and any other
declarations embedded in the document don't get into the tree, either.
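
A minimal sketch of the round-trip problem, for anyone who wants to try it.
It assumes the JAXP parser and identity transformer that ship with current
JDKs (these postdate DOM Level 1 and are not part of the DOM spec itself);
the class name, method name, and sample document are made up for
illustration.  The exact output depends on the serializer, but it is not
expected to match the input byte for byte, because the quote style and the
explicit end-tag were never stored in the tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: parse a string into a DOM tree, then serialize it back.
class RoundTripDemo {

    static String roundTrip(String xml) {
        try {
            // Parse: lexical details (quote style, tag form) are discarded here.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Serialize with an identity transform; the serializer picks its own
            // quote style and empty-element form.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // Single-quoted attribute and an explicit end-tag on an empty element.
        String original = "<doc title='x'><br></br></doc>";
        String result = roundTrip(original);
        System.out.println("in:  " + original);
        System.out.println("out: " + result);
        System.out.println("identical: " + original.equals(result));
    }
}
```

The tree is the same either way; only the text differs, which is exactly
the information an application working from the DOM cannot recover.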

I ran into one of these recently, when I was contemplating using &#64; for 
@ in documents containing my e-mail address to foil spammers (at least until
they start using DOM-based address-suckers).
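
To make that concrete, here is a small sketch (again assuming the JAXP
parser bundled with current JDKs and the DOM Level 3 getTextContent()
method, neither of which is in DOM Level 1; the class name and the
example.com address are placeholders).  It parses the same text once with
the numeric character reference and once with a literal "@", and compares
what the tree holds.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: show that a character reference is resolved by the parser.
class CharRefDemo {

    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Text content of the root element, as stored in the tree.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String numeric = textOf("<p>steve&#64;example.com</p>");
        String literal = textOf("<p>steve@example.com</p>");
        // The tree holds the resolved character in both cases.
        System.out.println(numeric.equals(literal)); // prints "true"
    }
}
```

Once the document is parsed, nothing in the tree records which spelling
the author used, so the trick only defeats tools that scan the raw text.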

-- 
 Stephen R. Savitzky   Chief Software Scientist, Ricoh Silicon Valley, Inc., 
<steve@rsv.ricoh.com>                            California Research Center
 voice: 650.496.5710   fax: 650.854.8740    URL: http://rsv.ricoh.com/~steve/
  home: <steve@starport.com> URL: http://www.starport.com/people/steve/

Received on Wednesday, 2 December 1998 17:28:11 UTC