- From: Frank Boumphrey <bckman@ix.netcom.com>
- Date: Sun, 5 Dec 1999 10:44:02 -0500
- To: "Arjun Ray" <aray@q2.net>, <www-html@w3.org>
<arjun>In fact, this is precisely what Mosaic did, and precisely why Mosaic
seemed so "robust". It was too *stupid* to get into trouble. </arjun>

And this is exactly why its successor Netscape got into trouble!! This
paradigm of NOT building a parse tree makes it incredibly difficult to apply
dynamic style, reflow, etc. Netscape have had to rebuild from the ground up!
(Now Mozilla looks like a really good browser.)

frank

----- Original Message -----
From: Arjun Ray <aray@q2.net>
To: <www-html@w3.org>
Sent: Sunday, December 05, 1999 7:02 AM
Subject: RE: Tag Soup (was: FW: XHTML)

> On Fri, 3 Dec 1999, Dave Raggett wrote:
> > On Fri, 3 Dec 1999, Jelks Cabaniss wrote:
> > > Arjun Ray wrote:
> > >
> > > > The tragedy is that a formal spec for Tag Soup was never written.
> > >
> > > Especially since it's going to be around for a long, long time.
>
> Indeed. The trouble seems to be that the truth underlying this prognosis
> is an ugly one, and the ugliness prompts a reluctance to acknowledge it.
>
> > > Even if UAs next year onward reject any and all malformed documents
> > > declaring themselves with XHTML DOCTYPEs and namespaces, if they
> > > can't also grok Tag Soup, who in the General Public would want to
> > > use them?
>
> Yep. Gresham's Law.
>
> > > Content is what is important, and the GP cares less if the content
> > > is encrypted in Tag Soup.
> > >
> > > So UAs with real XML/SGML parsers will still need a TAGSOUP.DLL
> > > for the foreseeable future ...
> >
> > My work on HTML Tidy was motivated by an attempt to deal with this
> > by providing an Open Source solution for converting Tag Soup
> > documents into something easier to process.
>
> Respectfully, this misses the basic point. Tidy is a wonderful program,
> and no doubt useful - to those who care. The issue is why many if not
> most will not care, and the reason is that Tag Soup is *not* inherently
> difficult to process - the program has merely to reflect the thought
> process underlying the use.
>
> A Tag Soup renderer - flowing text according to a set of global flags
> modified in "stream" fashion - is actually *easy* to write. See a tag, do
> something; no tag, no "action". That's how and why </P> is supposed to
> make a difference, as witness this non-justification of a non-problem:
>
> http://lists.w3.org/Archives/Public/www-style/1998May/0101.html
>
> In fact, this is precisely what Mosaic did, and precisely why Mosaic
> seemed so "robust". It was too *stupid* to get into trouble. Midas and
> Viola would regularly crash on stuff that Mosaic "handled" with aplomb,
> because they tried to do intelligent, often context sensitive things with
> markup. (Like collapsible lists - but what use was that when UL "meant"
> indent and LI "meant" plunk-a-bullet?) Mosaic's "innovation" was to
> *reduce* potentially powerful markup to a small set of lo-tech, readily
> apprehensible and "predictable" formatting primitives - skip a line,
> indent/cancel, bold/ital/cancel, font size change/cancel, etc. That was
> why Andreessen and Bina tossed the libWWW design (which called for a
> separate stylesheet driven rendering widget) in favour of their libhtmlw
> "HTML widget" - a renderer that took *tags* directly as "commands". As
> long as each tag in isolation expanded macro-like to zero or more of the
> (relatively orthogonal) behavioral toggles "supported" by the widget, it
> didn't matter what dog's breakfast of a mishmash you fed it, it would
> simply and stolidly "do what it was told". That's the genesis of "HTML Of
> The Month" nominations like this
>
> <p><br><br><br><p><p><br><br><p>
>
> But the point to appreciate is the *thought process* of authors doing
> stuff like this - what they *expected* and were gratified to see "work":
> the concept is "skip-a-line", so it doesn't matter what one calls it, if
> it takes voodoo incantations like <p> and <br>, so be it. The Mosaic
> paradigm was to support the thought process faithfully.
>
> It was also no surprise that "a lot of tags seem to do the same thing" -
> UL, OL and DD all got you nice indents - and I'll hazard the guess that the
> reason why Netscape invented <FONT> and <CENTER> but not <INDENT> is that
> one of their Bright Sparks must have said "They have more than one way to
> do that already: This is not Rocket Science!" And, indeed, a few seasons
> later, this would be why Netscape Composer *generated* <DD> in response to
> a request for an indent.
>
> Dismissing Tag Soup with a snort does not diminish the fact that it can
> be internally consistent. The syntax is entirely secondary. In fact,
> that's why Javascript had document.write() from the beginning - to write a
> stream of commands - tags! - back into the renderer. More the pity that
> the tags have to be between '<' and '>' - someone might think that SGML
> was involved.
>
> > One problem in writing a formal spec for tag soup is that there
> > are significant differences between Navigator and IE. Microsoft's
> > reverse engineering team got it close, but not close enough.
>
> True. They also made the mistake of trying to rationalize what at root
> was just beer-and-pizza coding (that is, they found more method to the
> madness than there really was.)
>
> However, the basic features of the spec are not difficult to set down.
>
> http://lists.w3.org/Archives/Public/www-html/1999Oct/0053.html
>
> The "meta-spec" would appeal to a stream-based processing model, where
> aspects of a global processing state (margins, font size, color, etc.) are
> impacted by commands embedded in flowable text. These commands are
> syntactically distinguished from data by the marks '<' and '>'. (Didn't
> TimBL once write a "rant" about this, chiding the "markup person"?) The
> actual Tag Soup spec could then list expected behaviors.
> For historical reasons, it becomes necessary to contend with the fact
> that a lot of these commands are utterly inscrutable - UL, /DL, DT, LI,
> whatnot - when it might have been simpler just to have <FONT>, <SKIP>,
> <INDENT> and so on, but them's the breaks.
>
> Differences between the Tweedles would, admittedly, pose a "political"
> problem. For instance, <FONT><TABLE>...</TABLE></FONT>. Arguably, IE's
> treatment (to "honor" the font-spec - or is that Navigator?) is the more
> "logical" one, in terms of how the Tag Soup *mindset* would expect things
> to work.
>
> > In any event, few people have expressed a common need for such a spec.
>
> On the contrary, a spec that *meaningfully* captures the behavior of the
> popular wowsers is precisely what plenty of people are calling for. The
> SGML formalism simply does not fit that bill.
>
> > Discussions over time in the various HTML working groups have tended
> > to be prescriptive, focussing on how people should write rather than
> > what they do write in practice. Browser implementers are required to
> > take a more pragmatic view though, and the existing specs are just
> > the tip of the iceberg.
>
> The existing specs are supremely irrelevant. We knew that a long time
> ago.
>
> http://www.nyct.net/~aray/htmlwg/stds.html
> http://www.nyct.net/~aray/htmlwg/rcs.html
>
>
> Arjun
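
For readers who want the stream model from the quoted message in concrete form, here is a rough sketch in Python. It is not Mosaic's libhtmlw code or any browser's actual behaviour: the names `TOKEN` and `render`, the choice of tags, and the use of upper case to stand in for bold are all assumptions made for illustration. The only point it tries to show is "see a tag, do something; no tag, no action", with each tag acting on global state and no parse tree anywhere.

```python
import re

# A token is either something shaped like <tag ...> / </tag>, or a run of text.
# (Hypothetical tokenizer for this sketch; a stray '<' is silently skipped.)
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)")


def render(soup):
    """Render tag soup to a list of plain-text lines, one tag at a time."""
    state = {"indent": 0, "bold": False}  # the global processing state
    out = []    # finished output lines
    line = []   # words collected for the current line

    def flush():
        # Emit the pending words at the current indent level, if any.
        if line:
            out.append("  " * state["indent"] + " ".join(line))
            line.clear()

    for m in TOKEN.finditer(soup):
        closing, name, text = m.group(1), m.group(2), m.group(3)
        if text is not None:
            words = text.split()
            if state["bold"]:
                words = [w.upper() for w in words]  # stand-in for bold
            line.extend(words)
            continue
        tag = name.lower()
        if tag == "p":                      # "skip a line", open or close
            flush()
            out.append("")
        elif tag == "br":                   # break; on an empty line, skip one
            if line:
                flush()
            else:
                out.append("")
        elif tag in ("ul", "ol", "dd"):     # all three just mean "indent"
            flush()
            step = -1 if closing else 1
            state["indent"] = max(0, state["indent"] + step)
        elif tag == "li" and not closing:   # "plunk a bullet"
            flush()
            line.append("*")
        elif tag == "b":                    # bold on / cancel
            state["bold"] = not closing
        # any other tag: no action

    flush()
    return out


if __name__ == "__main__":
    for rendered in render("Intro<ul><li>one<li>two</ul>done <b>loud</i>"):
        print(repr(rendered))
```

Note that nothing here checks nesting or balance: the unmatched </i> and the unclosed <li> and <b> in the sample input are simply consumed in order, which is the "robustness" being described, and also why there is no tree to restyle or reflow afterwards.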
Received on Sunday, 5 December 1999 10:31:28 UTC