- From: Arjun Ray <aray@q2.net>
- Date: Sun, 5 Dec 1999 07:02:15 -0500 (EST)
- To: www-html@w3.org
On Fri, 3 Dec 1999, Dave Raggett wrote:
> On Fri, 3 Dec 1999, Jelks Cabaniss wrote:
> > Arjun Ray wrote:
> >
> > > The tragedy is that a formal spec for Tag Soup was never written.
> >
> > Especially since it's going to be around for a long, long time.

Indeed. The trouble seems to be that the truth underlying this prognosis
is an ugly one, and the ugliness prompts a reluctance to acknowledge it.

> > Even if UAs next year onward reject any and all malformed documents
> > declaring themselves with XHTML DOCTYPEs and namespaces, if they
> > can't also grok Tag Soup, who in the General Public would want to
> > use them?

Yep. Gresham's Law.

> > Content is what is important, and the GP cares less if the content
> > is encrypted in Tag Soup.
> >
> > So UAs with real XML/SGML parsers will still need a TAGSOUP.DLL
> > for the foreseeable future ...
>
> My work on HTML Tidy was motivated by an attempt to deal with this
> by providing an Open Source solution for converting Tag Soup
> documents into something easier to process.

Respectfully, this misses the basic point. Tidy is a wonderful program,
and no doubt useful - to those who care. The issue is why many if not
most will not care, and the reason is that Tag Soup is *not* inherently
difficult to process - the program has merely to reflect the thought
process underlying the use.

A Tag Soup renderer - flowing text according to a set of global flags
modified in "stream" fashion - is actually *easy* to write. See a tag,
do something; no tag, no "action". That's how and why </P> is supposed
to make a difference, as witness this non-justification of a
non-problem:

  http://lists.w3.org/Archives/Public/www-style/1998May/0101.html

In fact, this is precisely what Mosaic did, and precisely why Mosaic
seemed so "robust". It was too *stupid* to get into trouble. Midas and
Viola would regularly crash on stuff that Mosaic "handled" with aplomb,
because they tried to do intelligent, often context sensitive things
with markup.
(Like collapsible lists - but what use was that when UL "meant" indent
and LI "meant" plunk-a-bullet?)

Mosaic's "innovation" was to *reduce* potentially powerful markup to a
small set of lo-tech, readily apprehensible and "predictable" formatting
primitives - skip a line, indent/cancel, bold/ital/cancel, font size
change/cancel, etc. That was why Andreessen and Bina tossed the libWWW
design (which called for a separate stylesheet driven rendering widget)
in favour of their libhtmlw "HTML widget" - a renderer that took *tags*
directly as "commands".

As long as each tag in isolation expanded macro-like to zero or more of
the (relatively orthogonal) behavioral toggles "supported" by the
widget, it didn't matter what dog's breakfast of a mishmash you fed it,
it would simply and stolidly "do what it was told". That's the genesis
of "HTML Of The Month" nominations like this

  <p><br><br><br><p><p><br><br><p>

But the point to appreciate is the *thought process* of authors doing
stuff like this - what they *expected* and were gratified to see "work":
the concept is "skip-a-line", so it doesn't matter what one calls it, if
it takes voodoo incantations like <p> and <br>, so be it. The Mosaic
paradigm was to support the thought process faithfully.

It was also no surprise that "a lot of tags seem to do the same thing" -
UL OL and DD all got you nice indents - and I'll hazard the guess that
the reason why Netscape invented <FONT> and <CENTER> but not <INDENT> is
that one of their Bright Sparks must have said "They have more than one
way to do that already: This is not Rocket Science!" And, indeed, a few
seasons later, this would be why Netscape Composer *generated* <DD> in
response to a request for an indent.

Dismissing Tag Soup with a snort does not diminish the fact that it can
be internally consistent. The syntax is entirely secondary. In fact,
that's why Javascript had document.write() from the beginning - to write
a stream of commands - tags! - back into the renderer.
More's the pity that the tags have to be between '<' and '>' - someone
might think that SGML was involved.

> One problem in writing a formal spec for tag soup is that there
> are significant differences between Navigator and IE. Microsoft's
> reverse engineering team got it close, but not close enough.

True. They also made the mistake of trying to rationalize what at root
was just beer-and-pizza coding (that is, they found more method to the
madness than there really was.) However, the basic features of the spec
are not difficult to set down.

  http://lists.w3.org/Archives/Public/www-html/1999Oct/0053.html

The "meta-spec" would appeal to a stream-based processing model, where
aspects of a global processing state (margins, font size, color, etc.)
are impacted by commands embedded in flowable text. These commands are
syntactically distinguished from data by the marks '<' and '>'. (Didn't
TimBL once write a "rant" about this, chiding the "markup person"?)

The actual Tag Soup spec could then list expected behaviors. For
historical reasons, it becomes necessary to contend with the fact that a
lot of these commands are utterly inscrutable - UL, /DL, DT, LI, whatnot
- when it might have been simpler just to have <FONT>, <SKIP>, <INDENT>
and so on, but them's the breaks.

Differences between the Tweedles would, admittedly, pose a "political"
problem. For instance, <FONT><TABLE>...</TABLE></FONT>. Arguably, IE's
treatment (to "honor" the font-spec - or is that Navigator?) is the more
"logical" one, in terms of how the Tag Soup *mindset* would expect
things to work.

> In any event, few people have expressed a common need for such a spec.

On the contrary, a spec that *meaningfully* captures the behavior of the
popular wowsers is precisely what plenty of people are calling for. The
SGML formalism simply does not fit that bill.

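
The stream model described above (global state, tags as commands, no
tag, no action) can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: the command table, the state
fields and the names render(), flow() and COMMANDS are all hypothetical,
and it makes no claim to reproduce what Mosaic, Navigator or IE actually
did.

```python
import re

# Anything between '<' and '>' that starts like a tag name is a "command".
TAG = re.compile(r"<(/?[A-Za-z][A-Za-z0-9]*)[^>]*>")

def skip(state, out):
    out.append("")                      # "skip a line"

def indent(state, out):
    state["indent"] += 1

def outdent(state, out):
    state["indent"] = max(0, state["indent"] - 1)

def bold_on(state, out):
    state["bold"] = True

def bold_off(state, out):
    state["bold"] = False

# Hypothetical table: each tag, in isolation, expands to zero or more
# toggles of the global state. Several tags deliberately share an effect.
COMMANDS = {
    "p": [skip], "br": [skip],
    "ul": [indent], "/ul": [outdent],
    "ol": [indent], "/ol": [outdent],
    "dd": [indent], "/dd": [outdent],
    "b": [bold_on], "/b": [bold_off],
}

def flow(text, state, out):
    """Emit a run of text using whatever the global state says right now."""
    words = text.split()
    if not words:
        return
    line = " ".join(words)
    if state["bold"]:
        line = "*" + line + "*"
    out.append("    " * state["indent"] + line)

def render(soup):
    """Single pass, no tree, no nesting checks: see a tag, do something."""
    state = {"indent": 0, "bold": False}
    out = []
    pos = 0
    for match in TAG.finditer(soup):
        flow(soup[pos:match.start()], state, out)
        pos = match.end()
        # Unknown tags have no table entry, hence no "action".
        for action in COMMANDS.get(match.group(1).lower(), []):
            action(state, out)
    flow(soup[pos:], state, out)
    return out
```

Fed the "HTML Of The Month" string <p><br><br><br><p><p><br><br><p>,
render() returns nine empty lines, one per tag. Misnested input such as
"Hello <b>bold</i> <ul>item</b></ul> done" cannot trip it up, because
nothing is ever matched against anything else: it yields "Hello",
"*bold*", an indented "*item*" and "done".
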
> Discussions over time in the various HTML working groups have tended
> to be prescriptive, focussing on how people should write rather than
> what they do write in practice. Browser implementers are required to
> take a more pragmatic view though, and the existing specs are just
> the tip of the iceberg.

The existing specs are supremely irrelevant. We knew that a long time
ago.

  http://www.nyct.net/~aray/htmlwg/stds.html
  http://www.nyct.net/~aray/htmlwg/rcs.html

Arjun
Received on Sunday, 5 December 1999 06:40:14 UTC