XML, HTML, SGML, life, the universe, and everything

The rhetorical cannons in here have been going off in all directions, 
and there are a few points that I thought might benefit from some more
clarification.

1. What is XML For?

Jon has pointed out that by W3C statute, the primary goal of the ERB
and WG work is to enable the delivery of SGML over the Web.  Which is
true.  However, I and several others on both the ERB and the WG 
(including I think Charles Goldfarb) have an entirely un-hidden agenda,
namely to construct a lightweight, lean, mean, easy-to-learn on-ramp
for SGML.  Which involves no conflict (far as I can see) with the 
primary Web-oriented goal.  At a deep level, XML is *nothing new* - people 
have been building successful SGML apps for years and making a point of
leaving out minimization and other tricky stuff.  We're just writing down
a standard way to do this. 

2. Why the HTML Empty Tag Trick?

Primarily, to make it possible for a valid, normalized HTML document 
also to be a valid XML document.  Lee Quinn has raised, and nobody has
addressed here, the claim that this is a political move.  Not speaking
on behalf of the ERB, let me say that I think that Lee is correct; this
is a political move, and one which I approve of 100%.  It is not an
attempt to grandfather existing HTML.  It is an attempt to get 
lightweight SGML off the ground by making it possible to have a dialogue
with the huge, well-funded, energetic Web population.  Consider two
scenarios:

a. web-head: "Can XML process HTML documents?"
   xml-evangelist: "Yes, if they're clean"
   w-h: ...more questions...

b. w-h: "Can XML process HTML documents?"
   x-e: "No"
   w-h: "Huh?"
   x-e: "Well, XML can't handle the syntax of tags like IMG and BR and HR."
   ... no more questions; the conversation just ended.

Now, XML doesn't necessarily win in scenario a; but at least you're still
talking and you have a chance.
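
To make scenario (a) concrete, here is a minimal sketch in Python. It
assumes a present-day XML parser (the standard library's
xml.etree.ElementTree) as a stand-in for an XML processor. The sample
markup and the helper name is_well_formed are made up for illustration
and do not come from any XML draft.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# "Clean" HTML: every element is closed, empty elements use the <e/> form.
clean = '<p>Line one<br/><img src="logo.gif" alt="logo"/></p>'

# Typical HTML: BR and IMG are left open, so </p> does not match.
legacy = '<p>Line one<br><img src="logo.gif" alt="logo"></p>'

print(is_well_formed(clean))   # True
print(is_well_formed(legacy))  # False
```

The only difference between the two strings is the syntax of the empty
tags, which is the whole point of the trick.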

3. XML and existing SGML tools

Existing SGML tools can, if they're compliant, read XML today modulo
*only* overlapping enumerated attribute values and perhaps some mild
inconsistencies as to which RE's (record ends) are where.  Existing SGML tools
can't write XML until they can write empty tags as <e/>.  <hint>I suspect 
that the SGML vendors of the world could, if they started today, be 
demonstrating this at SGML'96</hint>.
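
As a rough illustration of how small the output-side change is, the
sketch below rewrites start tags of elements declared EMPTY into the
<e/> form. It is a sketch only: the EMPTY_ELEMENTS set is an assumption
(the three names are the HTML examples mentioned in this message; a real
tool would take the list from the DTD), write_empty_tags is a
hypothetical name, and a regular expression is not a substitute for a
real SGML parser (it will misfire on attribute values containing ">").

```python
import re

# Assumed list of elements declared EMPTY; a real tool would read the DTD.
EMPTY_ELEMENTS = {"img", "br", "hr"}

# A start tag: name, optional attributes, optional trailing "/".
_START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)((?:\s+[^<>]*?)?)\s*(/?)>")


def write_empty_tags(markup: str) -> str:
    """Rewrite <e ...> as <e .../> for every element in EMPTY_ELEMENTS."""

    def fix(match: re.Match) -> str:
        name, attrs, slash = match.group(1), match.group(2), match.group(3)
        if name.lower() in EMPTY_ELEMENTS and not slash:
            return f"<{name}{attrs}/>"
        # Non-empty elements and tags already in <e/> form pass through.
        return match.group(0)

    return _START_TAG.sub(fix, markup)


print(write_empty_tags('<P>one<BR>two<HR><IMG SRC="a.gif"></P>'))
# <P>one<BR/>two<HR/><IMG SRC="a.gif"/></P>
```

End tags and tags already written as <e/> are left untouched, so running
the function twice gives the same result as running it once.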

Large quantities of SGML that use features such as &-based content
models, inclusions, exclusions, and so on, are *not* valid XML and cannot
be made into valid XML without some real work.  I can't imagine why you'd 
want to do that work - you already have good tools for dealing with them
as SGML, and if you change the end-tag syntax you have well-formed
[for well-formed read network-ready] XML.  Large quantities of SGML
are XML, modulo the syntax of end tags, which I refuse to see as a 
significant obstacle.

It is absolutely the unanimous intent of the ERB that existing SGML
tools be able to deal with XML, and there have been significant design
compromises in XML in order to make this happen.  Paul Grosso's assertion:

 I have sensed a lot of people trying to give
 lip service to the claim that XML is a subset of SGML--at least in
 spirit if not in fact--and it's refreshing to have you admit that
 this is not your intent.  It's not your intention that the base of 
 existing SGML authoring tools will handle XML.

is simply incorrect.  XML is a subset of SGML in spirit and in fact, and
XML is designed so that the adjustments required in existing SGML tools
are trivial.  Paul, any chance of getting you to back off on that a bit?

4. XML and HTML browsers

In order to work with XML, HTML browsers would have to do 
a small number of things:

 1. accept empty tags in the <e/> form
 2. apply stylesheet directives on non-HTML tags

It would also be nice but not essential if they would stop rendering
PI's (processing instructions) on the screen. [grrrrr]

Will this happen?  I don't know.  I think it would be wonderful and a 
huge shot in the arm for the Web, and as a consequence a huge shot in 
the arm for SGML.  I think the chances are not that good, but also not
zero.  If anyone can think of a way to help make this happen, that would
be a major contribution.

I know for sure that all sorts of people cooking up cool web technologies
desperately want to extend HTML, and have damn little chance of getting
such extensions blessed by the current W3C process.  For them, XML is
a gift from above - if the browsers read it.

5. Economic Impacts of XML on the SGML community

Yes, if XML succeeds, this will place unprecedented stress on the
SGML community, as it scrambles to deal with a user base that has
increased by an order of magnitude, and overpowering market pressures
from the big players who have their own axes to grind.  It is achingly
obvious to me that this is an improvement over the present situation.

In particular, this should be a pot of gold for the authoring vendors -
none of the current half-cooked HTML-only tools are going to be exactly
easy to teach self-extending tag sets to.

And I'll tell you something, the end-users are going to love it.  I've
been pitching the XML idea to my consulting practice customers for the
last 2 or 3 months, and the idea of a standard, lightweight, easy way
to get into some descriptive markup and structure management is not 
a hard sell.  Most of these people know about SGML but are not currently
using it, mostly because they think it's too complicated.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

Received on Friday, 8 November 1996 15:17:45 UTC