Re: SGML

I'll second David's conclusion and add some information for Surendra's
sake.

HTML 4.01, and every version of HTML for that matter, *is* an SGML
application [1].  If your browser includes support for particular SGML
applications other than HTML, that could be removed, though it's more
likely that you're seeing corrections for quirks.  There are also some
SGML constructs that LOOK like quirks or typos to an average web
developer but that are perfectly valid constructs in the HTML
specifications (omitted end tags, for example).  I wouldn't recommend
removing support for those.

As an indication of how little XHTML 2.0 is used, the Working Draft
itself is published in XHTML 1.0 Strict [2].  I think there are several
pages on the Web using XHTML 2.0, mostly blogs published by web
standards advocates.  At this point in its development, I would say 2.0
really shouldn't be used except for testing.

XHTML 1.0, and to a lesser extent 1.1, are fairly widely deployed.
That doesn't necessarily mean that they're done right: XHTML should be
delivered with the MIME type application/xhtml+xml, but Internet
Explorer (89.85% of the US market [3]) doesn't process this MIME type
at all, so most XHTML pages are delivered as text/html.  See Ian
Hickson's opinion of this situation for some insight [4].  Of course,
replacing the entire web with correct XHTML would be terrible:
backwards compatibility is still an important principle too.  I would
say XHTML won't be adopted correctly until content negotiation becomes
more widely understood.
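
To make the content negotiation point concrete, here is a rough sketch
in Python of how a server might choose between the two MIME types based
on the browser's Accept header.  The function names (parse_accept,
choose_content_type) are made up for illustration, and the parsing is
deliberately simplified; it is not how any particular server does it.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into {media_type: q_value}.

    Simplified: only the q parameter is read, others are ignored.
    """
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header):
    """Return application/xhtml+xml only if the client asks for it by name."""
    prefs = parse_accept(accept_header or "")
    # A wildcard like */* is NOT taken as XHTML support: a browser that
    # cannot process application/xhtml+xml may still send */*.
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html",
                       prefs.get("text/*", prefs.get("*/*", 0.0)))
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"
```

A browser that lists application/xhtml+xml explicitly gets it; one that
only sends something like "image/gif, image/jpeg, */*" falls back to
text/html.  A server doing this should also send "Vary: Accept" so that
caches keep the two variants apart.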

If you're looking for opinions about what makes a good rendering
engine, David is absolutely correct: handling tag soup is a necessity.
You'll often see the related browser behaviour called Quirks Mode [5].
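
For a feel of the difference between a strict XML parser and a lenient
one, here is a small Python sketch using only the standard library.
The sample markup and the helper names (TagCollector, lenient_events,
is_well_formed_xml) are my own, purely for illustration.  Note that
html.parser only tokenizes: it reports tags as it meets them and does
not repair the tree the way a browser has to.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Misnested <b>/<i> is an error in both HTML and XHTML; the omitted
# </p> end tags, on the other hand, are valid HTML 4.01.
SOUP = "<p><b>bold <i>both</b> italic</i><p>next"


class TagCollector(HTMLParser):
    """Record start and end tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def lenient_events(markup):
    """Tokenize markup without rejecting misnested or unclosed tags."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


def is_well_formed_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed_xml(SOUP))   # the XML parser refuses the soup
print(lenient_events(SOUP))       # the lenient tokenizer keeps going
```

The XML parser stops at the first mismatched tag, while the lenient
tokenizer hands back every tag and leaves it to the caller to decide
what tree to build from them.  That decision is where most of the work
in a real rendering engine goes.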

- Ed.

[1] http://www.w3.org/TR/html4/intro/sgmltut.html 
[2] view-source:http://www.w3.org/TR/xhtml2/
[3] http://www.websidestory.com/services-solutions/datainsights/spotlight.html
[4] http://www.hixie.ch/advocacy/xhtml 
[5] http://www.mozilla.org/docs/web-developer/faq.html#layoutmode 

>>> david@dorward.me.uk 2/28/2005 3:47:35 PM >>>

On Sun, Feb 27, 2005 at 02:10:48AM +0000, Surendra Singhi wrote:

> Are there still lot of legacy web pages out there which uses SGML?

There are three common types of webpage.

* XHTML - an XML application
* HTML - an SGML application
* Tag soup - a mishmash of HTML and/or XHTML code that "works" thanks
  to error correction in browsers.
 
I suspect that most of the hacks you've found are for dealing with
webpages in the last category.

> At one point of time I was also contemplating making a parser just good
> enough for eating XHTML 2.0, but then thought XHTML 2.0 is hardly used
> by anyone

It is rather difficult to use XHTML 2.0, given that it is currently
only a working draft.

> , and so I should support HTML 4.1.

No such language; I assume you mean HTML 4.01.

> Any opinions on this are  also welcome.

If you need the browser to work on the web, then you need to support
the tag soup that the majority of webpages are written in. Sad, but
true.

Received on Monday, 28 February 2005 21:23:10 UTC