Re: Detecting XHTML from Henri Sivonen on 2011-01-05 (public-html-xml@w3.org from January 2011)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 5 Jan 2011 10:42:35 +0200
To: public-html-xml@w3.org
Message-Id: <FA63E221-3010-4338-9577-8BCBD79606A7@iki.fi>

On Jan 4, 2011, at 21:55, John Cowan wrote:

> Henri Sivonen scripsit:
> 
>> What about an HTML.next that's 100% convergent with XML
>> and has a mode switch for opting into? It turns out that we
>> already have that! It's called XHTML5 and the mode switch is the
>> Content-Type: application/xhtml+xml HTTP header. Even better than some
>> yet-to-be-defined HTML.next mode, it's already supported by the latest
>> versions of the top browsers (if you count IE9 as the latest version
>> of IE).
> 
> This troubles me, because it means that in order for XHTML5 to be viewed
> in a browser as the author intended, it must be:
> 
> 1) served from an HTTP server

Or if you use a file: URL, the file must have a .xhtml file name extension.

> 2) on which the author can control the Content-Type: settings.

Or on which there's a default mapping for .xhtml and the author can name the file with a .xhtml extension.
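
As a rough illustration of what such an extension mapping amounts to, here is a minimal Python sketch. The `EXTENSION_TYPES` table and the `content_type_for` helper are hypothetical names made up for this example, not any particular server's configuration; the only point is that a .xhtml name selects application/xhtml+xml while .html selects text/html.

```python
import posixpath

# Hypothetical mapping table for illustration; real servers ship their own.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
}


def content_type_for(path, default="application/octet-stream"):
    """Return the Content-Type a server using this table would send."""
    _, ext = posixpath.splitext(path)
    # Compare case-insensitively so "PAGE.XHTML" maps like "page.xhtml".
    return EXTENSION_TYPES.get(ext.lower(), default)
```

With this table, renaming a file from page.html to page.xhtml is all it takes to change the type the sketch reports.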

(Note that if the content isn't in a flat file but is produced by a server-side program that gets to write HTTP responses, such programs typically can write the Content-Type header, too.)
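
For the server-side program case, a minimal sketch in Python using the standard WSGI calling convention might look like the following. The `app` name and the sample document are assumptions made for this example; any environment that lets the program set response headers would do the same job.

```python
# Sample document, assumed for illustration only.
XHTML_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head><title>Example</title></head>\n'
    '<body><p>Hello</p></body>\n'
    '</html>\n'
).encode("utf-8")


def app(environ, start_response):
    """Minimal WSGI application that labels its output as XHTML."""
    headers = [
        # This header, not anything inside the document, is the switch.
        ("Content-Type", "application/xhtml+xml; charset=utf-8"),
        ("Content-Length", str(len(XHTML_BODY))),
    ]
    start_response("200 OK", headers)
    return [XHTML_BODY]
```

For local experiments, such an application can be run with `wsgiref.simple_server.make_server` from the Python standard library.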

> If either of these conditions is violated, the XHTML will be processed
> as HTML.

These days, considering that Apache comes with a default mapping for .xhtml, I think the harder part is making the file contents well-formed, not naming the file with a .xhtml extension (or calling whatever you need to call to write the Content-Type header in your server-side programming environment).
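
Since well-formedness is the harder part, a quick check before publishing can help. The following is a minimal sketch using Python's standard-library XML parser; `is_well_formed` is a hypothetical helper name, and the check covers XML well-formedness only, not validity as XHTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # Raised for mismatched tags, unclosed elements, and similar errors.
        return False
    return True
```

Markup that HTML parsers accept without complaint, such as an unclosed `<br>`, fails this check, which is exactly the kind of content that breaks on the XML code path.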

> That's bad, and there should be a document-internal flag that
> forces the HTML parser to use the XHTML parser instead.  The obvious
> candidate is an XML declaration, but I suppose you will tell me that
> there are N tag-soup documents with XML declarations on them.

This was considered back in 2000. However, by the time Mozilla/Netscape got around to considering it in August 2000 (http://groups.google.com/group/netscape.public.mozilla.layout/browse_thread/thread/eec76019d8e365f5/56526a09488d6e9e), the infamous Appendix C (which had already been published as part of a REC in January 2000) had started having its corrosive effect, and even http://www.xml.com/ was serving text/html content that started with an XML declaration but wouldn't have worked if processed through the XML code path...

Obviously, the occurrences of such content have exploded on the Web since 2000.

Furthermore, in September 2000, the XHTML2 WG (back then called the HTML WG) explicitly decided to require browsers to use the HTML code path for text/html content that has XHTML traits (http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html). Regardless of whether you think the WG/W3C was right back then, it's way too late to second-guess that decision now: for a decade, software has been written so that the HTTP-level Content-Type is the switch.
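
To make the "Content-Type is the switch" point concrete, here is a deliberately simplified Python sketch. It is not any browser's actual algorithm, and `choose_code_path` is a hypothetical name; it covers only the two types discussed in this message. Note that the function never looks at the document body, so an XML declaration at the start of the content has no effect on the outcome.

```python
def choose_code_path(content_type):
    """Pick a parser from the Content-Type header value alone."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/xhtml+xml":
        return "xml"
    if media_type == "text/html":
        return "html"
    # Other types are out of scope for this sketch.
    return "other"
```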

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 5 January 2011 08:43:39 UTC