- From: Lachlan Hunt <lachlan.hunt@iinet.net.au>
- Date: Tue, 09 Nov 2004 09:23:35 +1100
- To: trejkaz@xaoza.net
- CC: James Cerra <jfcst24_public@yahoo.com>, www-html@w3.org
Trejkaz Xaoza wrote:
> On Sun, 7 Nov 2004 09:48, James Cerra wrote:
>> What are the recommendations for
>> identifying the document's type when MIME or HTTP is
>> not available?
>
> If it starts with "<?xml" it's an XML document. If it is then in the XHTML1
> namespace, it's XHTML1. If it's in the XHTML2 namespace, it's XHTML2.
That is not always reliable. Hixie has explained [1] in detail the cases
where that will not work. Although the description quoted below was
technically about sniffing documents sent as text/html, similar rules
should apply wherever MIME information is not available. I'd recommend
doing as Anne already mentioned: use the file extension, as Mozilla does.
----
+ You can't sniff for the five characters "<?xml" because:
- The <?xml ... ?> header is optional per Appendix C, and it is
recommended not to include it as it causes IE6 to trigger
quirks mode.
- SGML can also contain PIs (see the example below).
...
e.g. what language is this text/html document in?:
<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
[ <!-- SYSTEM "not XHTML" --> ]>
<!-- -- -->
This is a comment. This document is not XHTML.
<html xmlns="http://www.w3.org/1999/xhtml"/>
Ok, I'm done now. -->
<html>
<title> Need a title in HTML4! </title>
<p> This is a valid HTML4 document.
</html>
...
* The HTML working group said that UAs should not do this:
http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html
----
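Here is a rough Python sketch of both points. A naive "<?xml" prefix check misreads the HTML4 example above as XML, and it also misses an XHTML document that omits the optional declaration. An extension lookup avoids content sniffing altogether. The names `naive_is_xml`, `type_from_extension` and `EXTENSION_TYPES` are hypothetical. The extension table is an assumed mapping for illustration, not Mozilla's actual one.

```python
import os
from typing import Optional

# Hypothetical naive rule from the quoted suggestion:
# anything that starts with "<?xml" is treated as XML.
def naive_is_xml(text: str) -> bool:
    return text.startswith("<?xml")

# Hixie's HTML4 example: it starts with an SGML PI, so the naive rule is fooled.
HTML4_EXAMPLE = """<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
[ <!-- SYSTEM "not XHTML" --> ]>
<!-- -- -->
This is a comment. This document is not XHTML.
<html xmlns="http://www.w3.org/1999/xhtml"/>
Ok, I'm done now. -->
<html>
<title> Need a title in HTML4! </title>
<p> This is a valid HTML4 document.
</html>
"""

# An XHTML document that omits the optional XML declaration (as Appendix C
# suggests); the naive rule misses it.
XHTML_WITHOUT_DECL = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body></body></html>"
)

# Assumed extension-to-type table (illustrative only, not Mozilla's code).
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".xht": "application/xhtml+xml",
    ".xml": "application/xml",
}

def type_from_extension(path: str) -> Optional[str]:
    # Compare extensions case-insensitively; unknown extensions give None.
    ext = os.path.splitext(path)[1].lower()
    return EXTENSION_TYPES.get(ext)

if __name__ == "__main__":
    print(naive_is_xml(HTML4_EXAMPLE))        # wrongly reports XML
    print(naive_is_xml(XHTML_WITHOUT_DECL))   # misses real XHTML
    print(type_from_extension("page.xhtml"))
```

The extension approach only shifts the question to whoever named the file. It does, however, keep the decision out of the document's content, which is exactly where the example above shows sniffing going wrong.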
[1] http://hixie.ch/advocacy/xhtml
--
Lachlan Hunt
http://lachy.id.au/
Received on Monday, 8 November 2004 22:24:16 UTC