Re: Identifying (X)HTML without MIME

From: Lachlan Hunt <lachlan.hunt@iinet.net.au>
Date: Tue, 09 Nov 2004 09:23:35 +1100
Message-ID: <418FF1E7.1010609@iinet.net.au>
To: trejkaz@xaoza.net
CC: James Cerra <jfcst24_public@yahoo.com>, www-html@w3.org

Trejkaz Xaoza wrote:
> On Sun, 7 Nov 2004 09:48, James Cerra wrote:
>> What are the recommendations for
>> identifying the document's type when MIME or HTTP is 
>> not available?
> If it starts with "<?xml" it's an XML document.  If it is then in the XHTML1 
> namespace, it's XHTML1.  If it's in the XHTML2 namespace, it's XHTML2.

That is not always reliable.  Hixie has explained in detail [1] the
cases where it will not work.  Strictly speaking, his description is
about sniffing documents that were sent as text/html, but similar rules
should apply when no MIME information is available at all.  I'd
recommend doing as Anne already mentioned and using the file extension,
as Mozilla does.
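
To make the file-extension approach concrete, here is a minimal sketch in Python.  The mapping table and the function name are my own assumptions for illustration; they are not taken from Mozilla's source, and a real UA's table may well differ.

```python
import os

# Assumed extension-to-type table, for illustration only.
# A real user agent's mapping may contain different entries.
EXTENSION_TO_MIME = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".xht": "application/xhtml+xml",
    ".xml": "application/xml",
}


def guess_mime_from_extension(path, default=None):
    """Guess a MIME type from the file name alone, ignoring the content."""
    _, ext = os.path.splitext(path)
    # Extensions are compared case-insensitively.
    return EXTENSION_TO_MIME.get(ext.lower(), default)
```

The point of the design is that the decision never looks at the bytes of the document, so it cannot be fooled by content that merely resembles XML.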

     + You can't sniff for the five characters "<?xml" because:

       - The <?xml ... ?> header is optional per Appendix C, and it is
         recommended not to include it as it causes IE6 to trigger
         quirks mode.

       - SGML can also contain PIs (see the example below).


    e.g. what language is this text/html document in?:

       <?xml this is not?>
       <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
           [ <!-- SYSTEM "not XHTML" --> ]>
       <!-- -- -->
         This is a comment. This document is not XHTML.
         <html xmlns="http://www.w3.org/1999/xhtml"/>
         Ok, I'm done now. -->
        <title> Need a title in HTML4! </title>
        <p> This is a valid HTML4 document.
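
As a rough illustration of why the "<?xml" rule fails, the sketch below (Python) applies that rule literally to the example above and to an XHTML document that omits the XML declaration.  The function name and the second sample document are hypothetical; only the first sample comes from the quoted text.

```python
XHTML1_NS = "http://www.w3.org/1999/xhtml"

# The example document quoted above, which the quoted text says is HTML4.
HIXIE_EXAMPLE = '''<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
    [ <!-- SYSTEM "not XHTML" --> ]>
<!-- -- -->
  This is a comment. This document is not XHTML.
  <html xmlns="http://www.w3.org/1999/xhtml"/>
  Ok, I'm done now. -->
 <title> Need a title in HTML4! </title>
 <p> This is a valid HTML4 document.
'''

# Hypothetical XHTML 1.0 document written without an XML declaration.
XHTML_NO_DECLARATION = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>No declaration</title></head>
<body><p>This is XHTML.</p></body>
</html>
'''


def naive_sniff(text):
    """Apply the naive rule: '<?xml' prefix plus the XHTML1 namespace."""
    if text.startswith("<?xml") and XHTML1_NS in text:
        return "application/xhtml+xml"
    return "text/html"


# Both guesses are wrong: the first is labelled XHTML, the second HTML.
print(naive_sniff(HIXIE_EXAMPLE))
print(naive_sniff(XHTML_NO_DECLARATION))
```

A smarter sniffer could avoid these two cases, but only by implementing enough of an SGML and XML parser to tell comments and PIs apart, which is the argument for not sniffing at all.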


  * The HTML working group said that UAs should not do this:

[1] http://hixie.ch/advocacy/xhtml

Lachlan Hunt
http://GetFirefox.com/    Rediscover the Web
http://SpreadFirefox.com/   Igniting the Web
Received on Monday, 8 November 2004 22:24:16 UTC
