Re: XHTML, content type, and content negotiation

On Sat, 17 Jun 2000, Tim Taylor wrote:

> Is there any stance (official or unofficial) on how User Agents are
> supposed to process an XHTML document returned with a Content-Type of
> text/html?  What if the Content-Type is text/xml?  The XHTML 1.0 Spec is
> silent on this topic.
> 
> I found an earlier www-html discussion on the topic of XHTML and content
> type, but it doesn't appear that consensus was reached:
> 
> <http://lists.w3.org/Archives/Public/www-html-editor/1999JanMar/0086.html>
> 
> I'm specifically concerned about the following open Mozilla bug:
> 
> <http://bugzilla.mozilla.org/show_bug.cgi?id=26022>
> 
> The bug summary and description read:
> 
> "XHTML 1.0 document with text/html media-type is treated as HTML 4.0
> document.
> 
> Non-html tags in XHTML 1.0 document are ignored when the document
> lebeled with the Internet Mediatype "text/html". To be browsed old web
> browser, some XTHML documents are labeled with  "text/html", not labeled
> whith "text/xml". In new XHTML comformant browser  renders such
> documents as XHTML documents."

I think this is just muddled documentation.  For 'labeled' I would 
put 'served with content type text/html ...'.

There is, AFAICT, no such thing as an 'XHTML conformant browser ...'.
 
> Additional comments in the bug report indicate that Mozilla doesn't
> officially support XML, so technically it's behaving correctly as an
> "old web browser" [1].  However, Mozilla /will/ one day support XML. 
> For future reference it would be helpful to know the appropriate
> behavior.  Ideally, this would be in the XHTML spec as it's an ambiguity
> that may interfere with content authors making a smooth transition from
> HTML to XML.  Specifically people advised to start authoring their
> content as XHTML "right now" so that the full transition to XML down the
> road will be easier will be in for a surprise when their XHTML documents
> appear broken in newer browsers.

Mozilla does support XML very well - in fact, much of the browser
is written using XML as an integration language. Can you quote the 
bug report?

> Currently, I see two interpretations for the behavior of User Agents
> that support both HTML and XML:
> 
> User Agent A: ignores the Content-Type header, instead relies on the
> document content.  In this interpretation, the User Agent would treat
> the document as XML.

User agents shouldn't do this (although IE 4/5 does). Instead, they should
treat any content-type data in an HTTP header as definitive. I've never
seen anyone in 'authority' (whatever that means) definitively state
this, however.
 
> User Agent B: obeys the Content-Type header.  An XHTML document returned
> with the Content-Type text/html is treated as HTML.  An XHTML document
> returned with Content-Type text/xml is treated as XML.

This is the correct behavior, IMO.

> I prefer interpretation B.  I picture B's behavior used in conjunction
> with HTTP Content Negotiation (RFC 2295).  This is what I assumed XHTML
> was intended for all along.  I assumed that content authors could rely
> on default styling of HTML elements so long as the document was served
> as text/html.  Only if the document was served as text/xml would styling
> for all elements be necessary for proper rendering in UAs.  This
> distinction is useful because it allows content authors to migrate from
> HTML to XML in more manageable steps, concentrating on well-formedness
> and validity first, styling for XML UAs second.
> 
> I intend to post a comment to the above mentioned Mozilla bug with any
> definitive answers, or if none are to be had, a link to this discussion,
> assuming there is one :)
> 
> Tim
> 
> [1] actually, due to some bugs, mozilla doesn't get XHTML right on
> either count at the moment.  It neither processes XHTML as HTML nor as
> XML, though at one point it did process XHTML as HTML.
>
> Tim Taylor             <uri:mailto:tim.taylor@iname.com>

My understanding is as follows:

1. A user-agent receiving data typed as being text/html should assume this
   content type information is correct, and should process the data 
   as HTML (e.g., by passing the data to its HTML processor).
   Whatever happens after that is pretty well up to the user-agent. 
   Many will toggle different parsing/processing behaviors depending 
   on the value of the DOCTYPE declaration (if any).  

   Mozilla and earlier versions of Netscape Navigator, Opera (and
   probably other browsers) behave in this way. IE 5, however, doesn't
   do this, and instead infers the type from the file content, 
   overriding any content-type sent in the HTTP response header.

   Note that the HTML processors on current browsers ignore any
   XML declaration, but fail if the DOCTYPE contains internal 
   ENTITY or ATTLIST declarations. Thus the HTML processor
   can support 'simplified' XHTML, but not XHTML that uses 
   ENTITY/ATTLIST declarations, CDATA sections, etc.

2. A user agent receiving data typed as text/xml should treat the
   data as XML, and should pass it to its XML processor. This, indeed,
   is what Mozilla and IE 5 do, although neither can display the 
   data (even if it is XHTML) unless you provide a style sheet to
   define rendering properties for the elements.   This is because
   the XML processor has no idea of default rendering properties for
   any element content. Mozilla assumes that all elements are 
   (in the CSS vocabulary) display: inline, whereas IE 5 simply 
   displays a collapsible tree showing the structure and content
   of the document.
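
To make points 1 and 2 concrete, here is a rough sketch of the dispatch
I have in mind. The function name and the return values are made up for
illustration and are not taken from any browser's code. The point is only
that the decision uses the Content-Type header and never looks at the body:

```python
def choose_processor(content_type_header):
    """Pick a processor from the Content-Type header alone, ignoring the body."""
    # Drop parameters such as "; charset=ISO-8859-1" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"   # hand the data to the HTML processor
    if media_type == "text/xml":
        return "xml"    # hand the data to the XML processor
    return "unknown"    # anything else is outside this sketch
```

Under this scheme an XHTML document served as text/html goes to the HTML
processor, and the same bytes served as text/xml go to the XML processor.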


If the way a file is retrieved (from disk, via ftp, etc.) doesn't
explicitly state the content-type, then the user-agent must use an
alternative mechanism, such as a database mapping filename suffixes to
types, or file inspection.
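
A minimal sketch of that fallback, assuming a tiny hypothetical suffix
table and a very crude inspection step (real user agents use much larger
tables and more careful sniffing):

```python
import os

# Hypothetical suffix-to-type table, for illustration only.
SUFFIX_TYPES = {".html": "text/html", ".htm": "text/html", ".xml": "text/xml"}

def guess_type(filename, head=b""):
    """Guess a content type when the retrieval mechanism supplies none."""
    # First try the filename suffix.
    suffix = os.path.splitext(filename)[1].lower()
    if suffix in SUFFIX_TYPES:
        return SUFFIX_TYPES[suffix]
    # Otherwise inspect the first bytes of the file.
    sample = head.lstrip().lower()
    if sample.startswith(b"<?xml"):
        return "text/xml"
    if sample.startswith(b"<!doctype html") or sample.startswith(b"<html"):
        return "text/html"
    return None  # no basis for a guess
```

Note that with this inspection order, an XHTML file that has no recognised
suffix and begins with an XML declaration would be guessed as text/xml.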

Ian
--
Ian Graham ......................... Centre for Academic Technology
i a n   d o t   g r a h a m    a t    u t o r o n t o   d o t   c a
Information Commons                               Tel: 416-978-4548
University of Toronto                             Fax: 416-978-7705
..................... http://www.utoronto.ca/ian/ .................

Received on Tuesday, 20 June 2000 15:00:18 UTC