How to put XML on the Web
I have been thinking quite a lot lately as to how XML might best be put to use
on the Web. The following has been shaped quite heavily by discussions on
this subject with Lauren Wood and Jean Paoli.
Plan A: HTML++
We'd like to use XML to "extend HTML"; i.e., as "HTML++". Let's call this
Plan A. It sounds awfully easy... just teach your local web browser to accept
weird non-HTML tags, load them into the Dynamic HTML data structures so that
Java or *-script can use them; if a CSS directive is attached, try to put it
Plan A has problems. First is the fact that Web browsers, as a basic design
goal, must cope with the largely-undesigned interactions between all the
various HTML constructs, and with a perceived obligation to accept and display
any and all soi-disant HTML, no matter how bad. They are stretched to the max
(and in turn, stretch your poor little PC's memory and disk storage to the
max) in an attempt to do something reasonable with unreasonable combinations
of overlapping FORMs and TABLEs and ALIGN and CENTER. Adding XML processing
to this witches' brew, easy though it sounds, is not something they are going
to be eager to do.
Another Plan A problem is: if you want to "extend HTML", you'd probably,
if you're a sensible SGML kind of person, like to do this in a valid way;
by taking the HTML DTD and dropping in some of your own elements.
Unfortunately, HTML as described by, for example, the 3.2 DTD, is not and
never will be well-formed. Furthermore, these DTDs have become sufficiently
overgrown that they are only at the margin human-comprehensible; not a
really good platform to start extending from.
Plan B: HTML--, +X
Plan B says, let's allow the use of XML to extend HTML, but drop the
requirement that browsers deal with all kinds of crufty horrible bad
practice. So you can have either extended HTML, or bad HTML, but not both.
We could even think about writing a much smaller human-tractable HTML DTD as a
basis for doing this.
Plan B, though, is still hard. The HTML display semantics aren't going to go
away, and they are plenty difficult enough without asking for foreign-tag
wrangling to go on in parallel. Furthermore, are the kind of people who are
going to be providing XML going to want to use all the arcana of HTML display
mechanics? Still, Plan B may be viable.
Plan C: XML, +H
The really good thing about HTML is neither its syntax nor its baroque
ad-hoc display semantics; rather, the value is in the incredibly useful,
proven-in-action hypertext, multimedia, and interaction semantics of <A>,
<IMG>, <FORM> and <TABLE>. For display, I think most XML-capable
information providers would rather use a stylesheet anyhow.
Thus plan C. Instead of a full-feature Web browser, have a miniature XML
browser that has *no display semantics* other than those from a
stylesheet... however, it does recognize (either by hardwired GI or
architecture-like attribute) a small number of semantically loaded useful
elements from HTML. This could be either a standalone thing, or something
embedded in your local Netsplorer, that makes use of the browser's hypertext
semantics and stylesheeting.
Plan C looks good to me.