- From: Stephen R. Savitzky <steve@rsv.ricoh.com>
- Date: 03 Dec 1998 10:45:14 -0800
- To: Jonathan Robie <jonathan@texcel.no>
- Cc: Mike Champion <mcc@arbortext.com>, f.cameron@ulst.ac.uk, www-dom@w3.org
Jonathan Robie <jonathan@texcel.no> writes:

> At 02:29 PM 12/2/98 -0800, Stephen R. Savitzky wrote:
>
> >Examples of things that don't round-trip include choice of quotes for
> >attributes, named vs. numeric character entities, omitted start-tags and
> >end-tags in HTML documents, presence of line breaks before and inside of
> >tags, and whether an explicit end tag or "/>" was used on an empty tag in
> >XML.
>
> Each of these is an example where there is more than one way to encode the
> same document structure. HTML is slated to become XML conformant in the
> future, so omitted start-tags and end-tags will become obsolete, but some
> of these others might really matter in an authoring application that
> persists the output.

... or in an application that _generates_ documents from something else,
e.g. a database. Or, in my case, a proxy that reads documents, does some
processing, then passes the result along.

It's not clear to me that HTML is _ever_ going to be XML conformant, unless
you think that nobody is ever going to construct HTML files in a text editor
in the future. I can see things going the other way, with the web being
opened up to full-bore SGML documents.

> Currently, of course, we can't persist documents, so it doesn't really
> cause any problems.

Java object serialization does a pretty good job. For that matter, simply
converting a document to external form, storing it in a file, and parsing it
yields a very robust form of persistence (e.g. robust across changes in
internal representation, which Java serialization is not).

So the inability to reconstruct documents is already a problem for me, and
my DOM implementation contains the necessary extensions.

> >DOM level 1 loses information -- it is not possible to reconstruct the
> >original document from the "equivalent" DOM tree. This is one of the
> >most serious problems with it, by the way. Another is the inability to
> >represent generic SGML documents. They're related.
>
> The DOM isn't *supposed* to represent generic SGML documents, it's not in
> our charter. I don't think we should spend our time figuring out how to do
> record ends and other obscure things few people are using.

I'd settle for the ability to handle omitted start and end tags, character
entities, and full (not XML-restricted) DTD's. What I'd prefer, though, is a
defined way of extending a DOM document to include meta-information, and an
approved method for adding new node types.

> >I ran into one of these recently, when I was contemplating using &#64; for
> >@ in documents containing my e-mail address to foil spammers (at least until
> >they start using DOM-based address-suckers).
>
> Now waitaminit, they can't use our DOM to do anything so despicable as
> writing address-suckers...

What was that about your charter?

--
Stephen R. Savitzky
Chief Software Scientist, Ricoh Silicon Valley, Inc., California Research Center
<steve@rsv.ricoh.com>
voice: 650.496.5710   fax: 650.854.8740
URL: http://rsv.ricoh.com/~steve/
home: <steve@starport.com>   URL: http://www.starport.com/people/steve/
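
Illustration of the round-trip loss discussed in this message. This is a minimal sketch only, and it assumes Python's standard xml.dom.minidom as a stand-in for a DOM implementation; it is not the implementation or the extensions mentioned above. The sample document and the variable names are made up for the example. It parses a small XML string and serializes it again, which is enough to show that quote choice, an explicit end tag on an empty element, and a numeric character reference are not recoverable from the tree.

```python
from xml.dom.minidom import parseString

# Hypothetical input: single-quoted attribute, an empty element written
# with an explicit end tag, and "@" written as a numeric character reference.
source = "<doc a='1'><e></e><p>&#64; &lt;</p></doc>"

# Parse into a DOM tree, then convert the tree back to external form.
tree = parseString(source)
out = tree.documentElement.toxml()

print(out)
# → <doc a="1"><e/><p>@ &lt;</p></doc>

# The serialized form is a different string from the original...
print(out == source)
# → False

# ...but it is stable: parsing and serializing it again changes nothing,
# so the structure persists even though the original encoding does not.
again = parseString(out).documentElement.toxml()
print(again == out)
# → True
```

The second check is the "convert to external form, store, and reparse" style of persistence described above: the structure survives, while the lexical choices made by whoever wrote the source do not.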
Received on Thursday, 3 December 1998 13:44:28 UTC