XHTML, content type, and content negotiation from Tim Taylor on 2000-06-17 (www-html@w3.org from June 2000)

From: Tim Taylor <tim.taylor@iname.com>
Date: Sat, 17 Jun 2000 01:01:27 -0400 (EDT)
To: www-html@w3.org
Message-ID: <394B0600.9F2DB2F7@iname.com>
Is there any stance (official or unofficial) on how User Agents are
supposed to process an XHTML document returned with a Content-Type of
text/html?  What if the Content-Type is text/xml?  The XHTML 1.0 Spec is
silent on this topic.

I found an earlier www-html discussion on the topic of XHTML and content
type, but it doesn't appear that consensus was reached:

<http://lists.w3.org/Archives/Public/www-html-editor/1999JanMar/0086.html>

I'm specifically concerned about the following open Mozilla bug:

<http://bugzilla.mozilla.org/show_bug.cgi?id=26022>

The bug summary and description read:

"XHTML 1.0 document with text/html media-type is treated as HTML 4.0
document.

Non-html tags in XHTML 1.0 document are ignored when the document
lebeled with the Internet Mediatype "text/html". To be browsed old web
browser, some XTHML documents are labeled with  "text/html", not labeled
whith "text/xml". In new XHTML comformant browser  renders such
documents as XHTML documents."

Additional comments in the bug report indicate that Mozilla doesn't
officially support XML, so technically it's behaving correctly as an
"old web browser" [1].  However, Mozilla /will/ one day support XML. 
For future reference it would be helpful to know the appropriate
behavior.  Ideally, this would be in the XHTML spec, as it's an ambiguity
that may interfere with content authors making a smooth transition from
HTML to XML.  Specifically, people who are advised to start authoring
their content as XHTML "right now", so that the full transition to XML
down the road will be easier, will be in for a surprise when their XHTML
documents appear broken in newer browsers.

Currently, I see two interpretations for the behavior of User Agents
that support both HTML and XML:

User Agent A: ignores the Content-Type header, instead relies on the
document content.  In this interpretation, the User Agent would treat
the document as XML.

User Agent B: obeys the Content-Type header.  An XHTML document returned
with the Content-Type text/html is treated as HTML.  An XHTML document
returned with Content-Type text/xml is treated as XML.

I prefer interpretation B.  I picture B's behavior used in conjunction
with HTTP Content Negotiation (RFC 2295).  This is what I assumed XHTML
was intended for all along.  I assumed that content authors could rely
on default styling of HTML elements so long as the document was served
as text/html.  Only if the document was served as text/xml would styling
for all elements be necessary for proper rendering in UAs.  This
distinction is useful because it allows content authors to migrate from
HTML to XML in more manageable steps, concentrating on well-formedness
and validity first, styling for XML UAs second.
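
As a sketch of the server side of this idea, the Python below picks the
label for one and the same XHTML document from the request's Accept
header.  Note that it shows the simpler server-driven style of
negotiation based on Accept, not the transparent negotiation mechanism
of RFC 2295, and that the function names and the tie-breaking policy
(fall back to text/html unless the client clearly prefers text/xml) are
my own assumptions.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((parts[0].lower(), q))
    return prefs


def quality(media_type, prefs):
    """Return the q-value the client assigns to media_type.
    The most specific matching range wins: exact, then type/*, then */*."""
    major = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in prefs:
        if media_range == media_type:
            specificity = 2
        elif media_range == major + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept_header, default="text/html"):
    """Choose the Content-Type to serve the XHTML document with.
    Assumed policy: text/xml only when the client rates it strictly higher."""
    if not accept_header.strip():
        return default
    prefs = parse_accept(accept_header)
    if quality("text/xml", prefs) > quality("text/html", prefs):
        return "text/xml"
    return default


def response_headers(accept_header):
    """Headers for the negotiated response; Vary tells caches that the
    response depends on the request's Accept header."""
    return {"Content-Type": negotiate(accept_header), "Vary": "Accept"}


if __name__ == "__main__":
    for accept in ("text/html, */*;q=0.1", "text/xml, text/html;q=0.5", ""):
        print(repr(accept), "->", response_headers(accept))
```

With this policy an older browser that sends no Accept header, or only
"*/*", keeps getting text/html and its default HTML styling, while a UA
that explicitly prefers text/xml gets the XML treatment.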

I intend to post a comment to the above mentioned Mozilla bug with any
definitive answers, or if none are to be had, a link to this discussion,
assuming there is one :)

Tim

[1] Actually, due to some bugs, Mozilla doesn't get XHTML right on
either count at the moment.  It neither processes XHTML as HTML nor as
XML, though at one point it did process XHTML as HTML.

-- 
Tim Taylor             <uri:mailto:tim.taylor@iname.com>

Cool URIs Don't Change <uri:http://www.w3.org/Provider/Style/URI>
URL as UI              <uri:http://www.useit.com/alertbox/990321.html>
Received on Tuesday, 20 June 2000 13:28:45 UTC