HTML, XHTML, text/html, SGML, and XML

Christoph Schneegans wrote to www-validator:

> Today's so-called HTML user agents don't support arbitrary HTML.

If you mean that the user agents don't support SGML, you're not breaking 
any news.

> For example, this perfectly valid HTML 4.01 document is rendered in a
> different way in IE, Firefox and Opera:
> 
>   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
>   <html>
>   <head><title></title></head>
>   <body>
>   <p><a href="http://www.w3.org/"><img alt="W3C" src="http://www.w3.org/Icons/w3c_main"</a> Should this text be underlined?</p>
>   </body>
>   </html>

Yes, that's perfectly valid HTML 4.01. More accurately, that's perfectly
valid SGML whose document type declaration identifies its external
subset by a formal public identifier. Both the identifier and the
external subset originate in the specification of HTML 4.01.

SGML and the Internet media type "text/html" are not the same. They are 
not equivalent. They are not even particularly compatible.

One can make the case that HTML 4.01 is an application of SGML, in spite 
of sins by the former in the sight of the latter. If that is the case, 
then the degree to which one treats HTML 4.01 as SGML is the degree to 
which one is incompatible with Internet media type "text/html" in the 
World Wide Web.

> On the other hand, existing HTML user agents do support XHTML that
> complies with Appendix C.

I see what you mean, but your statement is misleading. Existing HTML
user agents either do not support XHTML or support it poorly. What the
user agents support is Internet media type "text/html". And it just so
happens that some XHTML can pass as Internet media type "text/html".

The degree to which one treats a document as XHTML, that is, as XML, is 
the degree to which one is incompatible with Internet media type 
"text/html" in the World Wide Web.
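
To illustrate one direction of that incompatibility, here is a minimal
sketch in Python. It assumes the standard library's
xml.etree.ElementTree as a stand-in for "treating a document as XML";
the helper name is_well_formed_xml and the two sample strings are made
up for this example. It shows only that an XML parser rejects markup
that is routine in "text/html", not how any browser behaves.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Ordinary text/html markup: the br start tag has no end tag.
html_style = "<p>one<br>two</p>"
# Appendix C style: the empty element is closed with " />".
xhtml_style = "<p>one<br />two</p>"

print(is_well_formed_xml(html_style))   # False
print(is_well_formed_xml(xhtml_style))  # True
```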

I have for a while been mulling a rant to explain the incompatibility of 
XHTML and Internet media type "text/html". If anybody agrees with my 
position and has a spare cycle, I would welcome such a person to write 
such a rant in my stead. The ESW wiki awaits.

>> so the benefits can be gained for writing XHTML and for storing data
>> in that format, and it can be transformed to HTML before serving to
>> clients.
> 
> Waste of resources.

Whether transforming server-side XHTML to HTML for publication is a 
waste of resources depends on what one does with the XHTML server-side. 
If one is using run-of-the-mill XHTML 1.0, I agree that transforming to 
HTML is a waste of resources. If one is using ruby markup and embedding 
MathML (or what have you), then the transformation is necessary for 
interoperation with a wide audience.
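
For the run-of-the-mill case, the transformation might look like the
following sketch. It is an assumption-laden illustration, not a
recommended pipeline: it uses Python's xml.etree.ElementTree, the
function name xhtml_to_html is hypothetical, it emits no document type
declaration, and it does nothing for ruby markup or embedded MathML,
which would need a real transformation step of their own.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix that ElementTree puts on XHTML element names.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def xhtml_to_html(xhtml: str) -> str:
    """Parse XHTML as XML and serialize it with HTML rules."""
    root = ET.fromstring(xhtml)
    # Drop the XHTML namespace so elements serialize as plain HTML names.
    for element in root.iter():
        if isinstance(element.tag, str) and element.tag.startswith(XHTML_NS):
            element.tag = element.tag[len(XHTML_NS):]
    # method="html" writes empty elements such as br without an end tag.
    return ET.tostring(root, encoding="unicode", method="html")


source = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>one<br />two</p></body></html>"
)
print(xhtml_to_html(source))
# <html><body><p>one<br>two</p></body></html>
```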

This message is, I admit, off-topic for www-validator. I wrote to
www-validator to maintain the thread. Which forum is best for further
discussion?

Received on Monday, 9 October 2006 13:08:21 UTC