Re: Tag Soup (was: FW: XHTML) from Frank Boumphrey on 1999-12-05 (www-html@w3.org from December 1999)

From: Frank Boumphrey <bckman@ix.netcom.com>
Date: Sun, 5 Dec 1999 10:44:02 -0500
To: "Arjun Ray" <aray@q2.net>, <www-html@w3.org>
Message-ID: <003501bf3f37$913cdf20$04addccf@preferreduser>
<arjun>In fact, this is precisely what Mosaic did, and precisely why Mosaic
seemed so "robust".  It was too *stupid* to get into trouble.  </arjun>

And this is exactly why its successor Netscape got into trouble!! This
paradigm of NOT building a parse tree makes it incredibly difficult to
apply dynamic style, reflow etc.

Netscape have had to rebuild from the ground up! (Now Mozilla looks like a
really good browser.)

frank


----- Original Message -----
From: Arjun Ray <aray@q2.net>
To: <www-html@w3.org>
Sent: Sunday, December 05, 1999 7:02 AM
Subject: RE: Tag Soup (was: FW: XHTML)


>
>
> On Fri, 3 Dec 1999, Dave Raggett wrote:
> > On Fri, 3 Dec 1999, Jelks Cabaniss wrote:
> > > Arjun Ray wrote:
> > >
> > > > The tragedy is that a formal spec for Tag Soup was never written.
> > >
> > > Especially since it's going to be around for a long, long time.
>
> Indeed.  The trouble seems to be that the truth underlying this prognosis
> is an ugly one, and the ugliness prompts a reluctance to acknowledge it.
>
> > > Even if UAs next year onward reject any and all malformed documents
> > > declaring themselves with XHTML DOCTYPEs and namespaces, if they
> > > can't also grok Tag Soup, who in the General Public would want to
> > > use them?
>
> Yep.  Gresham's Law.
>
> > > Content is what is important, and the GP cares less if the content
> > > is encrypted in Tag Soup.
> > >
> > > So UAs with real XML/SGML parsers will still need a TAGSOUP.DLL
> > > for the foreseeable future ...
> >
> > My work on HTML Tidy was motivated by an attempt to deal with this
> > by providing an Open Source solution for converting Tag Soup
> > documents into something easier to process.
>
> Respectfully, this misses the basic point.  Tidy is a wonderful program,
> and no doubt useful - to those who care.  The issue is why many if not
> most will not care, and the reason is that Tag Soup is *not* inherently
> difficult to process - the program has merely to reflect the thought
> process underlying the use.
>
> A Tag Soup renderer - flowing text according to a set of global flags
> modified in "stream" fashion - is actually *easy* to write.  See a tag, do
> something; no tag, no "action".  That's how and why </P> is supposed to
> make a difference, as witness this non-justification of a non-problem:
>
>  http://lists.w3.org/Archives/Public/www-style/1998May/0101.html
>
> In fact, this is precisely what Mosaic did, and precisely why Mosaic
> seemed so "robust".  It was too *stupid* to get into trouble.  Midas and
> Viola would regularly crash on stuff that Mosaic "handled" with aplomb,
> because they tried to do intelligent, often context sensitive things with
> markup.  (Like collapsible lists - but what use was that when UL "meant"
> indent and LI "meant" plunk-a-bullet?)  Mosaic's "innovation" was to
> *reduce* potentially powerful markup to a small set of lo-tech, readily
> apprehensible and "predictable" formatting primitives - skip a line,
> indent/cancel, bold/ital/cancel, font size change/cancel, etc.  That was
> why Andreessen and Bina tossed the libWWW design (which called for a
> separate stylesheet driven rendering widget) in favour of their libhtmlw
> "HTML widget" - a renderer that took *tags* directly as "commands".  As
> long as each tag in isolation expanded macro-like to zero or more of the
> (relatively orthogonal) behavioral toggles "supported" by the widget, it
> didn't matter what dog's breakfast of a mishmash you fed it, it would
> simply and stolidly "do what it was told".  That's the genesis of "HTML Of
> The Month" nominations like this
>
>   <p><br><br><br><p><p><br><br><p>
>
> But the point to appreciate is the *thought process* of authors doing
> stuff like this - what they *expected* and were gratified to see "work":
> the concept is "skip-a-line", so it doesn't matter what one calls it, if
> it takes voodoo incantations like <p> and <br>, so be it.  The Mosaic
> paradigm was to support the thought process faithfully.
>
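A minimal sketch of the macro-like expansion described above, assuming
that <p> and <br> each expand to a single "skip a line" primitive. The
table PRIMITIVES, the primitive name "skip_line" and the function
expand() are hypothetical names for illustration; nothing here is taken
from the actual libhtmlw source.

```python
import re

# Hypothetical table: each tag, taken in isolation, expands to zero or
# more formatting primitives. The entries are assumptions for illustration.
PRIMITIVES = {
    "p": ["skip_line"],
    "br": ["skip_line"],
}

# A token is either <something> or a run of text up to the next '<'.
TOKEN = re.compile(r"<([^<>]*)>|([^<]+)")


def expand(soup):
    """Turn a tag stream into a flat list of primitives and text runs."""
    out = []
    for match in TOKEN.finditer(soup):
        tag, text = match.group(1), match.group(2)
        if tag is not None:
            parts = tag.split()
            name = parts[0].lower() if parts else ""
            # Unknown tag: no primitives, hence no "action".
            out.extend(PRIMITIVES.get(name, []))
        else:
            out.append(("text", text))
    return out


if __name__ == "__main__":
    print(expand("<p><br><br><br><p><p><br><br><p>"))
```

Under these assumptions the "HTML Of The Month" string is just nine
requests to skip a line; no nesting or context is consulted at any point.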
> It was also no surprise that "a lot of tags seem to do the same thing" -
> UL OL and DD all got you nice indents - and I'll hazard the guess that the
> reason why Netscape invented <FONT> and <CENTER> but not <INDENT> is that
> one of their Bright Sparks must have said "They have more than one way to
> do that already: This is not Rocket Science!"  And, indeed, a few seasons
> later, this would be why Netscape Composer *generated* <DD> in response to
> a request for an indent.
>
> Dismissing Tag Soup with a snort does not diminish the fact that it can
> be internally consistent.  The syntax is entirely secondary.  In fact,
> that's why Javascript had document.write() from the beginning - to write a
> stream of commands - tags! - back into the renderer.  More the pity that
> the tags have to be between '<' and '>' - someone might think that SGML
> was involved.
>
> > One problem in writing a formal spec for tag soup is that there
> > are significant differences between Navigator and IE. Microsoft's
> > reverse engineering team got it close, but not close enough.
>
> True.  They also made the mistake of trying to rationalize what at root
> was just beer-and-pizza coding (that is, they found more method to the
> madness than there really was.)
>
> However, the basic features of the spec are not difficult to set down.
>
>   http://lists.w3.org/Archives/Public/www-html/1999Oct/0053.html
>
> The "meta-spec" would appeal to a stream-based processing model, where
> aspects of a global processing state (margins, font size, color, etc.) are
> impacted by commands embedded in flowable text.  These commands are
> syntactically distinguished from data by the marks '<' and '>'.  (Didn't
> TimBL once write a "rant" about this, chiding the "markup person"?)  The
> actual Tag Soup spec could then list expected behaviors.  For historical
> reasons, it becomes necessary to contend with the fact that a lot of these
> commands are utterly inscrutable - UL, /DL, DT, LI, whatnot - when it
> might have been simpler just to have <FONT>, <SKIP>, <INDENT> and so on,
> but them's the breaks.
>
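The stream-based processing model in the paragraph above can be
sketched roughly as follows. This is an illustration under stated
assumptions, not a description of any real browser: the State fields,
the COMMANDS table and render() are all hypothetical, and the handful
of tags listed is arbitrary.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    """Global processing state (an assumed, much reduced set of flags)."""
    indent: int = 0
    bold: bool = False
    size: int = 3


# Hypothetical command table: tag name -> function from old state to new.
COMMANDS = {
    "ul": lambda s: replace(s, indent=s.indent + 1),
    "/ul": lambda s: replace(s, indent=max(0, s.indent - 1)),
    "b": lambda s: replace(s, bold=True),
    "/b": lambda s: replace(s, bold=False),
    "big": lambda s: replace(s, size=s.size + 1),
    "/big": lambda s: replace(s, size=s.size - 1),
}

# Commands are distinguished from data only by the marks '<' and '>'.
TOKEN = re.compile(r"<([^<>]+)>|([^<]+)")


def render(soup):
    """Return (state, text) runs; no tree is built and nesting is ignored."""
    state = State()
    runs = []
    for match in TOKEN.finditer(soup):
        tag, text = match.groups()
        if tag is not None:
            parts = tag.split()
            command = COMMANDS.get(parts[0].lower()) if parts else None
            if command is not None:  # unknown command: no action
                state = command(state)
        else:
            runs.append((state, text))
    return runs


if __name__ == "__main__":
    for state, text in render("<b>one<ul>two</b>three"):
        print(state, repr(text))
```

Note that the misnested <b>...<ul>...</b> in the usage line is not an
error in this model: each command simply flips its flag when it is seen.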
> Differences between the Tweedles would, admittedly, pose a "political"
> problem.  For instance, <FONT><TABLE>...</TABLE></FONT>.  Arguably, IE's
> treatment (to "honor" the font-spec - or is that Navigator) is the more
> "logical" one, in terms of how the Tag Soup *mindset* would expect things
> to work.
>
> > In any event, few people have expressed a common need for such a spec.
>
> On the contrary, a spec that *meaningfully* captures the behavior of the
> popular wowsers is precisely what plenty of people are calling for.  The
> SGML formalism simply does not fit that bill.
>
> > Discussions over time in the various HTML working groups have tended
> > to be prescriptive, focussing on how people should write rather than
> > what they do write in practice. Browser implementers are required to
> > take a more pragmatic view though, and the existing specs are just
> > the tip of the iceberg.
>
> The existing specs are supremely irrelevant.  We knew that a long time
> ago.
>
>   http://www.nyct.net/~aray/htmlwg/stds.html
>   http://www.nyct.net/~aray/htmlwg/rcs.html
>
>
> Arjun
>
>
>
Received on Sunday, 5 December 1999 10:31:28 UTC