Re: Identifying (X)HTML without MIME

Trejkaz Xaoza wrote:
> On Sun, 7 Nov 2004 09:48, James Cerra wrote:
>> What are the recommendations for
>> identifying the document's type when MIME or HTTP is 
>> not available?
> 
> If it starts with "<?xml" it's an XML document.  If it is then in the XHTML1 
> namespace, it's XHTML1.  If it's in the XHTML2 namespace, it's XHTML2.

That is not always reliable.  Hixie has explained [1] in detail the cases 
where that will not work.  Technically, the description quoted below 
concerns sniffing documents that were sent as text/html, but similar 
rules should apply whenever MIME information is not available from 
elsewhere.  I'd recommend you do as Anne already mentioned and use the 
file extension, like Mozilla does.
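For illustration, extension-based detection can be as simple as a lookup 
table.  The Python sketch below is only a sketch: the table 
EXTENSION_TYPES and the function guess_type_from_extension are 
hypothetical names, and the mapping is an assumption, not Mozilla's 
actual list.  Python's standard mimetypes.guess_type does something 
similar, but its results depend on the platform's MIME tables, so a 
fixed table is used here to keep the behaviour predictable.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical extension table (an assumption, not Mozilla's real list).
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".xht": "application/xhtml+xml",
    ".xml": "application/xml",
}


def guess_type_from_extension(filename: str) -> Optional[str]:
    """Return a media type based only on the file extension, or None if unknown."""
    # Compare case-insensitively so "page.XHTML" is treated like "page.xhtml".
    suffix = PurePath(filename).suffix.lower()
    return EXTENSION_TYPES.get(suffix)


if __name__ == "__main__":
    for name in ["index.html", "page.XHTML", "feed.xml", "notes.txt"]:
        print(name, "->", guess_type_from_extension(name))
```

The content of the file is never inspected, which is the point: the 
author's choice of extension stands in for the missing Content-Type.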

----
     + You can't sniff for the five characters "<?xml" because:

       - The <?xml ... ?> header is optional per Appendix C, and it is
         recommended not to include it as it causes IE6 to trigger
         quirks mode.

       - SGML can also contain PIs (see the example below).

    ...

    e.g. what language is this text/html document in?:

       <?xml this is not?>
       <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
           [ <!-- SYSTEM "not XHTML" --> ]>
       <!-- -- -->
         This is a comment. This document is not XHTML.
         <html xmlns="http://www.w3.org/1999/xhtml"/>
         Ok, I'm done now. -->
       <html>
        <title> Need a title in HTML4! </title>
        <p> This is a valid HTML4 document.
       </html>

  ...

  * The HTML working group said that UAs should not do this:
       http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html
----
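To make the failure modes concrete, here is a small Python sketch that 
applies the two naive heuristics from the quoted message (starts with 
"<?xml", and contains the XHTML namespace) to Hixie's example above.  
The helper names naive_is_xml and naive_in_xhtml_namespace are 
hypothetical, and the second sample document is my own Appendix C style 
XHTML 1.0 page that omits the optional XML declaration.

```python
# Hixie's example: a valid HTML4 document that begins with an SGML PI
# and mentions the XHTML namespace only inside a comment.
HTML4_WITH_PI = '''<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
    [ <!-- SYSTEM "not XHTML" --> ]>
<!-- -- -->
  This is a comment. This document is not XHTML.
  <html xmlns="http://www.w3.org/1999/xhtml"/>
  Ok, I'm done now. -->
<html>
 <title> Need a title in HTML4! </title>
 <p> This is a valid HTML4 document.
</html>
'''

# Illustrative Appendix C style XHTML 1.0 document without an XML declaration.
XHTML_NO_DECL = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>XHTML</title></head>
<body><p>No XML declaration here.</p></body>
</html>
'''

XHTML_NS_ATTR = 'xmlns="http://www.w3.org/1999/xhtml"'


def naive_is_xml(text: str) -> bool:
    """Heuristic under discussion: anything starting with '<?xml' is XML."""
    return text.startswith("<?xml")


def naive_in_xhtml_namespace(text: str) -> bool:
    """Heuristic under discussion: a plain substring search for the XHTML namespace."""
    return XHTML_NS_ATTR in text


if __name__ == "__main__":
    # Both heuristics wrongly flag the HTML4 document as XHTML...
    print("HTML4: starts with <?xml ->", naive_is_xml(HTML4_WITH_PI))
    print("HTML4: has XHTML ns      ->", naive_in_xhtml_namespace(HTML4_WITH_PI))
    # ...and the first one misses a real XHTML document with no declaration.
    print("XHTML: starts with <?xml ->", naive_is_xml(XHTML_NO_DECL))
```

A false positive on one document and a false negative on the other: 
neither heuristic can be trusted without actually parsing the markup, 
and parsing requires already knowing whether to use SGML/HTML or XML 
rules.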

[1] http://hixie.ch/advocacy/xhtml
-- 
Lachlan Hunt
http://lachy.id.au/

Received on Monday, 8 November 2004 22:24:16 UTC