Re: Sniffing XHTML sent as text/html from Steven Pemberton on 2000-09-13 (www-html@w3.org from September 2000)

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Wed, 13 Sep 2000 23:11:07 +0200
To: "L. David Baron" <dbaron@fas.harvard.edu>
Cc: <www-html@w3.org>
Message-ID: <009501c01dc7$42e6e720$0200a8c0@steven>
David,

The HTML WG has discussed this issue: the intention was to allow old
(HTML-only) browsers to accept XHTML 1.0 documents whose authors follow the
HTML compatibility guidelines (Appendix C) and serve them as text/html.
Therefore, documents served as text/html should be treated as HTML and not
as XHTML. There should be no sniffing of text/html documents to see if they
are really XHTML.
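
A minimal sketch of this rule, assuming a user agent that picks its parser from the Content-Type header alone. The function name choose_parser and the set of XML media types it recognizes are illustrative assumptions, not taken from any specification or browser:

```python
# Hypothetical helper illustrating the HTML WG position: the parser is chosen
# from the media type alone, and text/html content is never sniffed.
# The set of XML media types below is an illustrative assumption.
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def choose_parser(content_type: str) -> str:
    """Return "html", "xml" or "unknown" for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8"; media types are case-insensitive.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        # Even a document with an XML declaration or XHTML namespace goes here.
        return "html"
    if media_type in XML_MEDIA_TYPES:
        return "xml"
    return "unknown"
```

Because the decision never looks at the document body, an XHTML 1.0 document served as text/html is handled exactly like legacy HTML.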

Note that there are some semantic differences between HTML documents and
XHTML documents: there are specific CSS rules that apply only to HTML (and
not XHTML), and the DOM behaves differently (for instance, element names
are returned in uppercase for HTML and in lowercase for XHTML).

Best wishes,

Steven Pemberton
Chair, W3C HTML WG

----- Original Message -----
From: "L. David Baron" <dbaron@fas.harvard.edu>
To: <w3c-html-wg@w3.org>
Cc: <dbaron@fas.harvard.edu>
Sent: Saturday, August 26, 2000 2:02 AM
Subject: Sniffing XHTML sent as text/html


>
> Mozilla has hit upon a problem.
>
> According to the XHTML specification, section 5.1:
>
> # XHTML Documents ... may be labeled with the Internet Media Type
> # "text/html" [...]
>
> This means that if a modern UA receives text/html, it has to work out
> whether it should treat it as legacy HTML or as well-formed XML.  (It is
> wrong to parse XHTML as legacy HTML, since then you do not report
> well-formedness errors, which violates both the XML spec and the XHTML
> spec. [1])
>
> How should one decide which codepath to take?  This issue was raised on
> www-html last month [2], but no satisfactory solution has yet been
> given.  Here are the existing options, with the reasons for each (PRO)
> and the problems with each (CON):
>
> 1. Branch based on the presence or absence of the XML declaration
>
>   PRO:
>
>     1. Very easy, since the XML declaration must be the first thing
>        in the document.  It is something that could be done without
>        instantiating a parser.
>
>   CON:
>
>     1. The XML declaration is not required in XML or XHTML.
>
>     2. Requiring an XML declaration for XHTML sent as text/html would
>        mess up some existing user agents. [3]
>
> 2. Branch based on the DOCTYPE.
>
>    PRO:
>
>      1. DOCTYPE is mandatory in strictly conforming XHTML documents.
>
>    CON:
>
>      1. (To negate 2-PRO-1) The conformance requirements for user
>         agents apply to all XHTML documents, not just those that are
>         strictly conforming, so DOCTYPE is not mandatory for all
>         documents to which the UA conformance requirements apply.
>
>      2. What counts as a DOCTYPE could be influenced by the presence
>         of comments before the DOCTYPE.  Since comments differ
>         between SGML and XML, the comment parsing mode would have to
>         be a default, and that default could determine the result.
>
>      3. Determining the DOCTYPE requires skipping comments and
>         processing instructions, so it is not easy to do without
>         significant parsing.
>
>      4. It limits the ability of standards bodies, including the W3C,
>         to choose arbitrary FPIs for future versions of XHTML, since
>         a set of FPIs would have to be designated as XHTML.  Any
>         such set would also limit the choices of others.
>
> 3. Branch based on xmlns attached to the root element
>
>    PRO:
>
>      1. xmlns is required in strictly conforming XHTML documents.
>
>    CON:
>
>      1. Same as 2-CON-1, particularly since xmlns is declared #FIXED
>         in the DTD, so a validating parser does not require it to be
>         present.
>
>      2. Same as 2-CON-2.
>
>      3. Same as 2-CON-3.
>
> So, how should Mozilla detect whether a text/html document is XHTML
> or not?
>
> David Baron (dbaron@fas.harvard.edu)
> Ian Hickson (ianh@netscape.com)
>
> [1] http://www.w3.org/TR/xhtml1/#uaconf
> [2] http://lists.w3.org/Archives/Public/www-html/2000Jul/0085.html
> [3] http://lists.w3.org/Archives/Public/www-html/1999May/0012.html
>
>
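
For comparison, the three heuristics in the quoted message can be sketched as follows. This is a rough, regex-based illustration under stated assumptions, not a conforming parser: all helper names are hypothetical, comments are skipped using XML syntax (exactly the kind of default that 2-CON-2 warns about), and quoted attribute values or internal DTD subsets containing ">" or "]" are not handled. Per Pemberton's reply above, the HTML WG's position is that none of these checks should be applied to text/html at all.

```python
import re
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Prolog items skipped before the DOCTYPE or root element. Comments use XML
# syntax here (assumption); an SGML comment parser could see a different DOCTYPE.
_PROLOG_ITEM = re.compile(r"\s+|<!--.*?-->|<\?.*?\?>", re.DOTALL)
# DOCTYPE with a public identifier (FPI); case-sensitive, as in XML.
_DOCTYPE_FPI = re.compile(
    r"<!DOCTYPE\s+[^\s>]+\s+PUBLIC\s+(?P<q>[\"'])(?P<fpi>.*?)(?P=q)", re.DOTALL
)
# Whole DOCTYPE declaration, including a simple internal subset in [...].
_DOCTYPE_DECL = re.compile(r"<!DOCTYPE[^\[>]*(?:\[.*?\])?\s*>", re.DOTALL)
_START_TAG = re.compile(r"<(?P<name>[A-Za-z][^\s/>]*)(?P<attrs>[^>]*)>")
_XMLNS_ATTR = re.compile(r"\sxmlns\s*=\s*([\"'])(.*?)\1", re.DOTALL)


def _strip_bom(doc: str) -> str:
    return doc[1:] if doc.startswith("\ufeff") else doc


def _skip_prolog(doc: str, pos: int = 0) -> int:
    """Skip whitespace, comments and processing instructions from pos."""
    while True:
        m = _PROLOG_ITEM.match(doc, pos)
        if m is None:
            return pos
        pos = m.end()


def has_xml_declaration(doc: str) -> bool:
    """Option 1: an XML declaration can only appear at the very start."""
    # "<?xml-stylesheet ...?>" is a PI, not a declaration, hence the
    # required whitespace after "<?xml".
    return re.match(r"<\?xml[ \t\r\n]", _strip_bom(doc)) is not None


def doctype_fpi(doc: str) -> Optional[str]:
    """Option 2: return the DOCTYPE's public identifier, or None."""
    doc = _strip_bom(doc)
    m = _DOCTYPE_FPI.match(doc, _skip_prolog(doc))
    return m.group("fpi") if m else None


def root_xmlns(doc: str) -> Optional[str]:
    """Option 3: return the default namespace on the root element, or None."""
    doc = _strip_bom(doc)
    pos = _skip_prolog(doc)
    decl = _DOCTYPE_DECL.match(doc, pos)
    if decl:
        pos = _skip_prolog(doc, decl.end())
    tag = _START_TAG.match(doc, pos)
    if tag is None:
        return None
    ns = _XMLNS_ATTR.search(tag.group("attrs"))
    return ns.group(2) if ns else None
```

Option 1 only inspects the first few characters, while options 2 and 3 must first walk the whole prolog (comments, processing instructions and, for option 3, the DOCTYPE itself), which is the parsing cost described in 2-CON-3. A document with no DOCTYPE but an XHTML xmlns on its root returns None from doctype_fpi, which illustrates 2-CON-1.
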
Received on Wednesday, 13 September 2000 17:18:09 UTC