Tag Soup vs Generalized Markup (was: I-D ACTION..)

From: Arjun Ray <aray@q2.net>
Date: Wed, 6 Oct 1999 09:39:57 -0400 (EDT)
To: www-html@w3.org
Message-ID: <Pine.LNX.3.95.991006090744.29982B-100000@mail.q2.net>

On Tue, 5 Oct 1999, Larry Masinter wrote:

[This is the third time I'm quoting the same passage, and only now am I
really addressing the meaning of the last clause!  Sorry about that.]
> I don't think this reduces the value of specifying what 'text/html'
> *should* be, although I agree it makes implementation hard.

The hard part is reconciliation of two different paradigms.  Tag soup
processors are not difficult to write.  Nor, for that matter, are (S)GM(L)
processors.  (A *validating* SGML parser, OTOH, is indeed not a task for
mere mortals:))  Mixing the two, however, is a nightmare, because the
paradigms actually reflect a classic tradeoff - the simple contextless 
versus the sophisticated contextual, and thus the stolidly robust versus
the delicately powerful.  Just as the tagsoup processor is too *dumb* to
get into trouble (so it doesn't matter what kind of dog's breakfast you
feed it) the GM processor demands correspondingly greater coherence in
its input for its smarts.
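
To make the tradeoff concrete, here is a minimal sketch in Python.  All
names in it are hypothetical, and neither function models a real browser
or a real SGML parser: the "strict" pass only checks that tags nest, with
no DTD, no tag omission and no attribute handling.  It is meant only to
show how the contextless pass accepts anything while the contextual pass
rejects input that is not coherent.

```python
import re

# Hypothetical illustration only: a crude tag pattern, not a real HTML/SGML lexer.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def soup_tokens(text):
    """Contextless pass: report each tag and text run in order, never complain."""
    out, pos = [], 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            out.append(("text", text[pos:m.start()]))
        out.append(("close" if m.group(1) else "open", m.group(2).lower()))
        pos = m.end()
    if pos < len(text):
        out.append(("text", text[pos:]))
    return out

def strict_tree_check(text):
    """Contextual pass: each close tag must match the innermost open element."""
    stack = []
    for kind, value in soup_tokens(text):
        if kind == "open":
            stack.append(value)
        elif kind == "close":
            if not stack or stack[-1] != value:
                raise ValueError(f"unexpected </{value}>, open elements: {stack}")
            stack.pop()
    if stack:
        raise ValueError(f"unclosed elements: {stack}")
    return True
```

Fed the misnested "<b><i>x</b></i>", soup_tokens() simply returns five
tokens, whereas strict_tree_check() raises ValueError at the "</b>".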

Now, we'd all love to have those smarts working for us, except that Mosaic
and its spawn popularized the dumbs.  Moreover, there's no mystery to the
popularity.  The freedom to toss any random mishmash of tags into a wowser
set a very low bar; this has turned out to be extraordinarily empowering.
People are not going to give it up easily.  ("But it works in Netploder,
and that's good enough for me.") 

HTML is a Humpty Dumpty that toppled a long time ago.  There really isn't a
cause or even a need for a "should", because no one with two braincells to rub
together is ever going to bother to write a "conforming HTML processor" in
relation to the spec as it stands today.  The non-compliance is massive to

The I-D should point to a Tag Soup spec, and a separate SGML-based spec
should probably be written up as a W3C Note.  (Because there may be value
to modularized HTML as a family of architectures.)

Received on Wednesday, 6 October 1999 08:57:33 UTC
