Re: XHTML

Daniel Hiester wrote:
> 
> I've been interested in the development of XML / XHTML for quite some time,
> and this thread raises one question for me... may or may not be on-topic
> (sorry if it's off)

I don't think anything to do with HTML on a list entitled 'www-html' is 
off-topic. But I hope you can pardon my rather wandering reply.

> Does this represent, in the _opinion_ of people in the working group, 
> "the death of HTML?"

[While I think many on the HTML WG might agree with what I'll say
here, I can only speak for myself. Actually, I'll be speaking on this
topic at Markup Technologies '99 in Philadelphia, so I suppose I have a
few _opinions_...]

If HTML is to die, it'll die of its own accord, not through any plan of
action or negligence on the part of the HTML working group.

Recently on a mailing list it was pointed out that internal subsets
aren't "allowed" in HTML, and that the normative reference to SGML should
therefore be removed from the specification. I can only laugh.

The HTML 4.0 Specification is formally designated as an application of 
SGML (and apart from arguing about the internal subsets issue, its markup 
is syntactically SGML), but I don't need to tell anyone that in usage, 
HTML is not only far from SGML, it's far from being simply a markup language.

It incorporates all manner of kludgey extensions; several different kinds
of incompatible scripting (often encapsulated in SGML comments, which are
stripped by many true SGML applications); commonly-used features whose
application or conformance boundaries have never been formalized in any
specification, even though the markup to 'support' them may have been
(e.g., frames); variant, non-compliant browser implementations; a
yet-to-be-implemented stylesheet syntax that keeps growing like mold;
inline styling and scripting that dynamically alters the document
depending on user agent ability or inability; etc. IOW, HTML's burgeoning
"application conventions" and supplemental "features" are choking it to
death. Try turning off JavaScript and images and browsing the web for a
while, especially sites that rely on very proprietary new "XML" features
(that aren't XML at all).

And while HTML 4.0 is in theory Unicode-based, internationalized and
WAI-compliant, in practice (both in applications and in documents) it is
almost none of those things. And so it works for some people, not for
others. And as the Web moves increasingly into the world of small device
browsers, that fragmentation will only increase. But this is old news to
many.

"HTML" documents in theory should be viewable on any browser that 
implements the specification, but unfortunately HTML 4.0's spec allows
for such wide variance and requires support for CSS (itself an impossibility)
that I hardly blame MS and NS for not having compliant browsers. The dream
of document interoperability died a long time ago, probably somewhere 
between HTML 2.0 and 3.2. What we have now is Frankenstein's markup.
 
> What I mean is, does the effort of the working group to create / promote
> XHTML represent an attempt to bring to an end a winding, twisting history of
> the SGML-based HTML language, and start a brand new era? Is SGML-based HTML
> too limited to continue to grow in a rate comparable with the growth of
> demand being placed upon it?

The short answers: Yes. No.

Because XML must be *at the very least* well-formed (simply to pass
through an XML processor), we hope that this level of compliance will set
a higher threshold for markup quality and enable better baseline
processing. For example, well-formed XML can be fairly faithfully
transformed using an XSLT stylesheet into other forms, e.g., altered for
use on small devices. And because the biggest problem with current HTML
parsers is the enormous amount of error-handling code they carry (most
HTML documents are one big error, IMO), the move to XML should decrease
parser size dramatically. In XML there is no error recovery: on a
well-formedness error the processor simply rejects the document. And
before anyone throws a tantrum, remember that this is the error behaviour
of most word processors and other applications on encountering an error
in a binary data file.
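
To make the difference concrete, here's a contrived pair of fragments,
followed by a trivial XSLT template of the sort I mean (the element
choices are mine, purely for illustration):

   <!-- typical tag soup: misnested and unclosed elements -->
   <p>An <b>important <i>point</b></i> and a break<br>

   <!-- the well-formed XHTML equivalent -->
   <p>An <b>important <i>point</i></b> and a break<br/></p>

   <!-- a minimal XSLT stylesheet demoting h1 headings to plain
        paragraphs, say for a small-screen device -->
   <xsl:stylesheet version="1.0"
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:template match="h1">
       <p><xsl:apply-templates/></p>
     </xsl:template>
   </xsl:stylesheet>

No heroics required in the processor: the first fragment is simply
rejected, while the second can be transformed reliably.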

One of the twisting forces of "nature" that has caused so many problems
for HTML is that every company wanting to "innovate" (not just Microsoft
and Netscape) has pushed all manner of ideas into HTML, so it long ago
lost its uber-ML appeal. I have been an advocate for modularization
of HTML for many years (since probably about early 1996, see [MHTML]),
but mostly in order to *subset* it, while I see around me the desire
to add even more new features. What we need is more simplicity, not more
features.

The modularization of HTML follows the mold of already-modularized SGML
languages like DocBook and TEI. There was never anything particularly
limiting in SGML itself (XML is just a subset of SGML, and therefore
actually less expressive); the limitation was in the mindset of fixed,
monolithic markup languages.
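
A modularized DTD typically switches whole feature sets on or off using
parameter entities and marked sections. A minimal sketch of the
mechanism (the entity and file names here are hypothetical, not taken
from any actual draft):

   <!-- disable a (hypothetical) forms module entirely -->
   <!ENTITY % forms.module "IGNORE">
   <![%forms.module;[
     <!ENTITY % forms.mod SYSTEM "xhtml-forms.mod">
     %forms.mod;
   ]]>

A subset language for, say, a pager browser could IGNORE the modules its
display can't support while remaining a true subset of the full language.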

Some believe that well-formedness alone, or namespaces, will solve this
problem, whereas I believe (as I'm sure some are tired of hearing) that
they're completely unsuitable for the task of creating hybrid-doctype
documents, tipping the balance between interoperability and openness
too far toward the latter. I do agree with Frank that the combination of
WF markup and an XSL stylesheet will provide interoperable presentation.
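
By hybrid-doctype documents I mean the sort of thing namespaces make
syntactically easy while saying nothing about how to validate or process
the result. A hypothetical example (the report vocabulary and its URI are
invented for illustration; only the XHTML namespace is real):

   <report xmlns="http://example.org/ns/report"
           xmlns:html="http://www.w3.org/1999/xhtml">
     <title>Quarterly Summary</title>
     <html:p>A paragraph borrowed from XHTML.</html:p>
   </report>

Nothing in the namespaces spec tells you whether this mixture is
conformant, or what a browser should do with it, which is precisely my
complaint.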

But beyond this, due to politics, inertia and entrenchment, I'm more and
more coming to think the W3C incapable of remedying this problem. Its
solutions are getting ever more complicated, not less so, to the point
where creating even a moderately complex XML Schema (say, one as
complicated as HTML, which isn't very) is an almost impossible task for
non-experts. People made this statement about DTDs; wait until they get
a load of schemas. Perhaps we need to be travelling down a different
road entirely.

As before, there are those wishing to differentiate themselves in the
marketplace by creating new proprietary functionality; this time they're
actively involved in getting a stamp of approval from the W3C by
participating in W3C working groups. And of course there are those who
ignore the W3C when they don't get their way. I've seen companies stamp
their feet like a three-year-old.

Call me an idealist (for I surely am one), but I hope that out of this
confusion arises the idea that a web of documents that all people (in
any country, in any economic class, with varying computing and personal
abilities, on any device) can read is a goal we should all strive
towards. That the marketplace will lose to the "community" who demand
the ability to read documents. That people themselves will stop trying
to write clever web pages and concentrate instead on ones that everyone
can read.

The ability to create many varieties of interoperable markup languages
based on a common framework (XML and its family of specs: XLink, XSL,
etc.) relies on people abandoning proprietary markup (and in this I
include a wide array of non-XML Web "features" such as CSS, JavaScript,
the current HTML linking syntax, etc.) and beginning to use truly
interoperable markup. A new baseline for interoperability, a new era
based on XML, XLink and XSL.
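
As one small illustration of that shift: where HTML hardwires its
linking into the <a> element, an XLink-based language can declare
linking on any element it likes. A sketch only, in the spirit of the
XLink working draft; the attribute names and the namespace URI are
illustrative and may well change before that spec is finished:

   <citation xmlns:xlink="http://www.w3.org/1999/xlink"
             xlink:type="simple"
             xlink:href="http://www.altheim.com/specs/mehitabel/">
     my modularization proposal
   </citation>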

> I guess this is where philosophy meets technicality... sorry if I'm
> off-topic... I'm just very, very curious... I'm pretty sure that I'd agree
> with W3C officials, no matter which stance they take... I'm just all too
> intrested in really knowing what their stance is.

Well, if we're going to venture into philosophy, let me do so as well. I
find your statement that you'd "agree with W3C officials no matter what
stance they take" very curious. Why? They're by no means gods, nor do
they have demonstrably more expertise than the membership of the W3C or
others in industry who've been working with markup since the 1970s. I
realize the propensity of people to look for heroes, and some are often
quite willing to bask in that limelight.

I was rather perturbed, while listening to a recent public radio
interview with a "W3C official", by his complete failure to mention the
role the IETF and NCSA played in the development of the Web. You'd
almost think the Internet didn't exist prior to the Web. No mention of
gopher, of course. But I was not surprised.

One of the problems I see with the W3C is the same one that often occurs
when any new technology arises: people get the feeling it was invented
by one or two people, when in reality it's often the work of an entire
scientific community (Darwin, Edison and the Wright brothers come to
mind). Innovation rests on the shoulders of what came before, and the
Web is no different. Many hundreds of people have been involved in the
evolution of the Web, and many have been active participants in
producing specifications, developing applications and trying out new
related technologies.

As we're seeing with the enormous surge in popularity of Linux, it's not
the opinions of a few people that matter, but those of the community at
large. Rather than concentrate on the opinions of others (which are
often wrong, mine included), think about what you'd like to contribute
to the community.

Murray

[MHTML] http://www.altheim.com/specs/mehitabel/
...........................................................................
Murray Altheim, SGML Grease Monkey            <mailto:altheim@eng.sun.com>
Member of Technical Staff, Tools Development & Support
Sun Microsystems, 901 San Antonio Rd., UMPK17-102, Palo Alto, CA 94303-4900

   the honey bee is sad and cross and wicked as a weasel
   and when she perches on you boss she leaves a little measle -- archy
