Re: How to put XML on the Web

In message <> Tim Bray writes:
> I have been thinking quite a lot lately as to how XML might best be put to use
> on the Web.  The following has been shaped quite heavily by discussions on
> this subject with Lauren Wood and Jean Paoli.

This is an exciting topic -
When appropriate, I'm sure xml-dev will be interested in this.

I'll add personal comments, but also emphasise that XML has the potential
to do a lot of NEW things and so any suggestions here should be extensible.

> Plan A: HTML++
> We'd like to use XML to "extend HTML"; i.e., as "HTML++".  Let's call this
> Plan A.  It sounds awfully easy... just teach your local web browser to accept
> weird non-HTML tags, load them into the Dynamic HTML data structures so that

There are two aspects to 'non-HTML tags', syntax and GI.  I suspect that 
teaching a local browser to understand <BR/> is much more challenging 
than <FOO XML-LINK=...>

> Another Plan A problem is: if you want to "extend HTML", you'd probably,
> if you're a sensible SGML kind of person, like to do this in a valid way;
> by taking the HTML DTD and dropping in some of your own elements.  
> Unfortunately, HTML as described by, for example, the 3.2 DTD, is not and
> never will be well-formed.  Furthermore, these DTDs have become sufficiently
> overgrown that they are only at the margin human-comprehensible; not a 
> really good platform to start extending from.

I agree with this.  In good extensible manner I incorporated HTML2.0 (some
time ago) into CML so that I didn't have to write another hypertext language.
It seemed that this was the 'right way to do it'.  However the HTML DTD has
given a lot of sweat, (mainly, but not entirely, with XML and the parameter
entities).  So I'm seriously considering taking a subset of the HTML tags
and creating my own DTD.  

In general you have to write code to process everything in a DTD, whether it
be through stylesheets or Java classes.  The more there is, the more work.

If we are 'extending local browsers', where do people start from.  I downloaded
Amaya (W3C) yesterday and it appears to follow the SGML way by (?) hardcoding
HTML3.2 in and validating in the SGML sense.  Is there synergy here?  Is 
Amaya extensible to other DTDs?  If so it might be a very useful starting
point, although I know nothing about where it is in the W3C's plans.

> Plan B: HTML--, +X
> Plan B says, let's allow the use of XML to extend HTML, but drop the
> requirement that browsers deal with all kinds of crufty horrible bad
> practice.  So you can have either extended HTML, or bad HTML, but not both.
> We could even think about writing a much smaller human-tractable HTML DTD as a
> basis for doing this.

*Anything* that forces/coerces/persuades people to write 'decent' HTML is 
very important.  There is now now reason why most HTML documents cannot be
(nearly) well-formed (i.e. leaving out <BR/>, etc.)
> Plan B, though, is still hard.  The HTML display semantics aren't going to go
> away, and they are plenty difficult enough without asking for foreign-tag
> wrangling to go on in parallel.  Furthermore, are the kind of people who are
> going to be providing XML going to want to use all the arcana of HTML display
> mechanics?  Still, Plan B may be viable.

Agreed.  I have had to write a grotty HTML renderer for JUMBO although it is
neither fun nor easy.  Unfortunately you also have to assume that people will
add more tags as the major browser m'facturers create these...

> Plan C: XML, +H
> The really good thing about HTML is neither its syntax nor its baroque
> ad-hoc display semantics; rather, the value is in the incredibly useful,
> proven-in-action hypertext, multimedia, and interaction semantics of <A>, 
> <IMG>, <FORM> and <TABLE>.  For display, I think most XML-capable 
> information providers would rather use a stylesheet anyhow.  
Or a method-based approach (e.g. through Java classes).  But the principle 
is the same - supply a IMG.class with the *browser* and let the browser
recognise an <IMG> element which it then subclasses to IMGElement.java
(Essentially this is what I do at present, though I'm struggling withthe
Java bits of image manipulation from non-gifs).

Note that we *must* re-use what the rest of the community does - XML developers
should not be in the business of recreating general browser functions if
they can avoid it.  

> Thus plan C.  Instead of a full-feature Web browser, have a miniature XML
> browser that has *no display semantics* other than those from a
> stylesheet... however, it does recognize (either by hardwired GI or
> architecture-like attribute) a small number of semantically loaded useful
> elements from HTML. This could be either a standalone thing, or something
> embedded in your local Netsplorer, that makes use of the browser's hypertext
> semantics and stylesheeting.
> Plan C looks good to me.

This looks both technically and politically achievable, and is the way that I
have been tending (e.g. 'borrowing' <A> and <IMG> semantics, and junking 
<H1>, <UL> (because it can't contain much useful), etc.)

I would strongly support the hardwired tagset.  

There are also Plans D, E, 9, etc...  that don't include browsing as the main
activity.  For example HTML may be carried as XML and retransformed on
arrival for browsing, whilst the XML is also used for calculation, monetary
transactions, etc.  The main thing is to share *information components* where

A lot of this now comes down to software engineering at the browser end.
For example, how can one transform an XML document so that the 
application-specific stuff comes up on a specialist tool, whilst the generic
stuff comes up under Netsplorer *and there are robust 2-way hyperlinks
between the two*?  When that happens, the millenium will have arrived for me.


Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences