Re: text/html for xml extensions of XHTML

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 30 Apr 2001 17:09:53 -0700 (Pacific Daylight Time)
To: "William F. Hammond" <hammond@csc.albany.edu>
cc: <www-talk@w3.org>, <mozilla-mathml@mozilla.org>
Message-ID: <Pine.WNT.4.31.0104301639490.800-100000@HIXIE.netscape.com>
On Mon, 30 Apr 2001, William F. Hammond wrote:
>
> XHTML may be served through http as "text/html" according to the XHTML
> specification
>
>          http://www.w3.org/TR/2000/REC-xhtml1-20000126
>
> if it conforms to Appendix C on compatibility with older user agents,
> as provided in section 5.1.

The reasoning behind this, as I understand it, is that markup sent as
text/html should be renderable by older user agents. This, in my opinion
of course, implies that anything that is _not_ compatible with older user
agents should _not_ be sent as text/html. This would, I contend, include
markup from other namespaces.


> For a user agent, such as Mozilla (http://www.mozilla.org/), that
> houses an xml parser [...]

Other examples are IE and Opera, of course.


> Some assert that any XHTML document with namespace extensions must
> be served as "text/xml" and must not be served as "text/html".

That viewpoint is consistent with the statement in section 5.1 of XHTML
1.0, which only says that XHTML documents that follow the backward
compatibility guidelines of Appendix C may be sent as text/html. Since
the use of XML namespaces is not compatible with older user agents, a
logical assumption is that documents using such extensions should not be
sent as text/html, although the spec says that it "makes no
recommendation about MIME labeling of other XHTML documents".

Furthermore, if the content is not compatible with older user agents,
there is no reason to _want_ to send it as text/html. User agents whose
conforming XML parsers handle XHTML with content from other namespaces
will handle it just as well when it is sent as text/xml.


> A user agent with an xml parser need look no further than the first
> instance tag.

Which of course might come after any number of strings that appear
to be instance tags hidden in comments, processing instructions,
and internal subsets, all of which, in browsers wishing to support
existing text/html tag soup, should be ignored.
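To make that concrete, here is a rough Python sketch (the function names are made up for illustration) contrasting a naive "first thing that looks like a tag" search with one that skips processing instructions, comments and a DOCTYPE with an internal subset using XML's rules. Even the skipping version is a simplification rather than an XML parser: a "]" inside a literal or comment in the internal subset would still throw it off.

```python
import re
from typing import Optional

# Anything that looks like a start tag: "<" followed by a name-start character.
TAG_RE = re.compile(r"<[A-Za-z_][^>]*>")


def naive_first_tag(doc: str) -> Optional[str]:
    """Return the first substring that merely looks like a start tag."""
    m = TAG_RE.search(doc)
    return m.group(0) if m else None


def first_instance_tag(doc: str) -> Optional[str]:
    """Skip PIs, comments and a DOCTYPE (with a simple internal subset)
    using XML rules, then return the first start tag. Hypothetical helper;
    a ']' inside a literal or comment in the internal subset would fool it."""
    i = 0
    while True:
        i = doc.find("<", i)
        if i == -1:
            return None
        if doc.startswith("<!--", i):       # comment: skip to the first "-->"
            end = doc.find("-->", i + 4)
            if end == -1:
                return None
            i = end + 3
        elif doc.startswith("<?", i):       # processing instruction
            end = doc.find("?>", i + 2)
            if end == -1:
                return None
            i = end + 2
        elif doc.startswith("<!", i):       # DOCTYPE, possibly with [subset]
            gt = doc.find(">", i)
            bracket = doc.find("[", i)
            if bracket != -1 and (gt == -1 or bracket < gt):
                close = doc.find("]", bracket)
                gt = doc.find(">", close) if close != -1 else -1
            if gt == -1:
                return None
            i = gt + 1
        else:
            m = TAG_RE.match(doc, i)
            if m:
                return m.group(0)
            i += 1                           # end tag or stray "<": keep going


DOC = (
    '<?xml version="1.0"?>\n'
    "<!DOCTYPE html [ <!ENTITY e \"<html xmlns='decoy'>\"> ]>\n"
    '<!-- <html xmlns="http://www.w3.org/1999/xhtml"> -->\n'
    "<html>\n</html>\n"
)

print(naive_first_tag(DOC))     # <html xmlns='decoy'>
print(first_instance_tag(DOC))  # <html>
```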


> Thus, a user agent with an xml parser should call that parser if any
> of the following is true:
>
> 1.  The instance is served through http as "text/xml".

Agreed.
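Rule 1 is the easy case, since the decision comes from the HTTP label alone. A minimal Python sketch (the helper name `choose_parser` is hypothetical) that extracts the media type with the standard library's `email.message.Message`, so parameters such as `charset` and differences in case don't matter. It also encodes the position argued in this message: text/html goes to the HTML parser, with no sniffing.

```python
from email.message import Message


def choose_parser(content_type: str) -> str:
    """Pick a parser from the Content-Type header alone (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() lower-cases "type/subtype" and drops parameters.
    media_type = msg.get_content_type()
    if media_type == "text/xml":
        return "xml"
    if media_type == "text/html":
        return "html"   # no content sniffing for text/html in this sketch
    return "other"


print(choose_parser("text/xml; charset=utf-8"))  # xml
print(choose_parser("TEXT/HTML"))                # html
```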


> 2.  The instance is served through http as "text/html" and any of
>     the following is true:
>
>     a.  The instance begins with the string "<?xml" .

Nope. Here is a document that is valid text/html, but non-well-formed
text/xml, and which should therefore be sent through the HTML parser:

   <?xml this is not?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
   <!-- -- -->
     This is a comment. This document is not XHTML.
     <html xmlns="http://www.w3.org/1999/xhtml"/>
     Ok, I'm done now. -->
   <html>
    <title> Need a title in HTML! </title>
    <p> This is a valid HTML document.
   </html>

See:
   http://www.damowmow.com/mozilla/html-not-xml.html (the document)
   http://validator.w3.org/check?uri=http%3A%2F%2Fwww.damowmow.com%2Fmozilla%2Fhtml-not-xml.html&doctype=Inline

Note that Mozilla renders this document correctly.
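A small Python check of this claim, as a sketch: the quoted rule 2a (written here as a hypothetical `rule_2a`) fires on the sample document, yet Python's expat-based `xml.etree.ElementTree` refuses it, because `<?xml this is not?>` is not a well-formed XML declaration. Only the first two lines of the sample are used; they are already enough.

```python
import xml.etree.ElementTree as ET

# The first two lines of the sample document above.
HEAD = ('<?xml this is not?>\n'
        '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">\n')


def rule_2a(doc: str) -> bool:
    """Quoted heuristic 2a: use the XML parser if the instance
    begins with the string "<?xml"."""
    return doc.startswith("<?xml")


def is_well_formed_xml(doc: str) -> bool:
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


print(rule_2a(HEAD), is_well_formed_xml(HEAD))  # True False
```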


>     b.  The instance has a string matching the case-sensitive pattern
>         "<!DOCTYPE html PUBLIC .*XHTML" before the first document
>         instance tag.

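Hmm, the valid HTML document above also matches that pattern, as long
as the ".*" may run past the end of the DOCTYPE line into the comment
that says "This document is not XHTML."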
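A quick Python check (the helper `rule_2b` is hypothetical). The quoted rule doesn't say whether `.*` may cross line breaks. When it may, as with `re.DOTALL`, the pattern runs from the HTML 4.0 DOCTYPE into the comment and stops on the word "XHTML"; only when `.` is confined to one line does this document escape.

```python
import re

# Rule 2b's pattern, as quoted (case-sensitive).
RULE_2B = r"<!DOCTYPE html PUBLIC .*XHTML"

# The sample document's prolog, up to (not including) the first "<html".
PROLOG = ('<?xml this is not?>\n'
          '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">\n'
          '<!-- -- -->\n'
          '  This is a comment. This document is not XHTML.\n')


def rule_2b(prefix: str, dot_matches_newline: bool) -> bool:
    flags = re.DOTALL if dot_matches_newline else 0
    return re.search(RULE_2B, prefix, flags) is not None


print(rule_2b(PROLOG, dot_matches_newline=True))   # True
print(rule_2b(PROLOG, dot_matches_newline=False))  # False
```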


>     c.  The first document instance tag is an open tag for the element
>         "html" (all lower case) with a value specified for the attribute
>         "xmlns".

How do you know it is the first instance tag without having a full XML
parser to skip past PIs, comments, internal subsets, and the like?
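A Python sketch of rule 2c applied to the sample document (all helper names are hypothetical, and regular expressions stand in for real parsers). A naive scan finds the `<html xmlns=...>` tag inside the comment, and so does a scan that skips comments by XML's rule, up to the first `-->`. Only by reading the comment declaration with SGML's rules, where `<!-- -- -->` holds one comment and opens a second that ends only at the `-->` after "Ok, I'm done now.", does the real first tag, a plain `<html>`, turn up, so that rule 2c correctly declines.

```python
import re
from typing import Optional

# The sample document above, up to its real <html> start tag.
SAMPLE = """<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
<!-- -- -->
  This is a comment. This document is not XHTML.
  <html xmlns="http://www.w3.org/1999/xhtml"/>
  Ok, I'm done now. -->
<html>
"""

TAG_RE = re.compile(r"<[A-Za-z_][^>]*>")
RULE_2C = re.compile(r"<html\b[^>]*\bxmlns\s*=")

# XML comments: "<!--" up to the first "-->".
XML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# SGML comment declarations: "<!", zero or more "--...--" comments, then ">".
SGML_COMMENT = re.compile(r"<!(?:\s*--.*?--)*\s*>", re.DOTALL)


def first_tag(doc: str) -> Optional[str]:
    m = TAG_RE.search(doc)
    return m.group(0) if m else None


def rule_2c(doc: str, comment_re: Optional[re.Pattern] = None) -> bool:
    """First tag is <html ...> with an xmlns attribute, after optionally
    stripping comments with the given pattern."""
    if comment_re is not None:
        doc = comment_re.sub("", doc)
    tag = first_tag(doc)
    return tag is not None and RULE_2C.match(tag) is not None


print(rule_2c(SAMPLE))                # True  (naive scan)
print(rule_2c(SAMPLE, XML_COMMENT))   # True  (XML comment rules)
print(rule_2c(SAMPLE, SGML_COMMENT))  # False (SGML comment rules)
```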

-- 
Ian Hickson                                            )\     _. - ._.)   fL
Invited Expert, CSS Working Group                     /. `- '  (  `--'
The views expressed in this message are strictly      `- , ) -  > ) \
personal and not those of Netscape or Mozilla. ________ (.' \) (.' -' ______
Received on Monday, 30 April 2001 20:07:17 GMT
