- From: Karl Ove Hufthammer <huftis@bigfoot.com>
- Date: Tue, 20 Jun 2000 21:46:43 +0200
- To: "Tim Taylor" <tim.taylor@iname.com>, <www-html@w3.org>
----- Original Message -----
From: "Tim Taylor" <tim.taylor@iname.com>
To: <www-html@w3.org>
Sent: Saturday, June 17, 2000 7:01 AM
Subject: XHTML, content type, and content negotiation

| Is there any stance (official or unofficial) on how User Agents are
| supposed to process an XHTML document returned with a Content-Type of
| text/html? What if the Content-Type is text/xml? The XHTML 1.0 Spec is
| silent on this topic.

Here's my strictly unofficial opinion. Speaking on behalf of nobody but
myself:

The browser should treat XHTML content served as 'text/html' as XML.

Some points:

* Everybody who writes XHTML does it "on purpose".
* They *want* strict parsing.
* The only reason they use 'text/html' is for their web pages to be
  backwards-compatible (and because the XHTML recommendation tells
  them to!).

The HTML (4.01, not XHTML 1.0) Recommendation doesn't say what user
agents should do when they encounter "bad" HTML. The closest thing I
could find was Appendix B, which only talks about unknown elements and
attributes. These should be ignored (but the content rendered).

A browser should never reject:

<p>foo
<p>blaa

(Not "well-formed" but legal.)

But it could, in theory, refuse a document with HTML like this ("tag
soup"):

<p><b>foo bar <i>baz</b> xyzzy</p>

IMO, the world (wide web) would be a much better place if all browsers
acted this way (from the start, that is -- it's too late now). Doing
this today would of course be a very stupid thing to do; the browser
wouldn't render most pages out there.

*But*, when it comes to XHTML, it's *not* a stupid thing to do. XHTML
*needs* strict parsing.

The XHTML specification tells us to use 'text/html'. This is a good
thing, since it lets us use XHTML while the pages remain
backwards-compatible with older user agents. Newer user agents, which
"know" about XHTML, should still treat the content as XML.
There's no reason not to, since all XHTML web pages will be valid --
there's no reason to be "backwards-compatible" with malformed documents.

Browsers *must* refuse to render XML that is not well-formed. The latest
HTML standard is XHTML 1.0 -- HTML implemented as XML. Browsers should
refuse to render XHTML that is not well-formed, even when it's marked as
'text/html'.

| I'm specifically concerned about the following open Mozilla bug:
|
| <http://bugzilla.mozilla.org/show_bug.cgi?id=26022>
|
| The bug summary and description read:
|
| "XHTML 1.0 document with text/html media-type is treated as HTML 4.0
| document.
|
| Non-html tags in XHTML 1.0 document are ignored when the document
| lebeled with the Internet Mediatype "text/html". To be browsed old web
| browser, some XTHML documents are labeled with "text/html", not labeled
| whith "text/xml". In new XHTML comformant browser renders such
| documents as XHTML documents."
|
| Additional comments in the bug report indicate that Mozilla doesn't
| officially support XML,

That's not right. It fully supports XML. Actually, Mozilla's user
interface is written in XUL, which is an XML application.

| so technically it's behaving correctly as an
| "old web browser" [1]. However, Mozilla /will/ one day support XML.

And that's now! :)

| For future reference it would be helpful to know the appropriate
| behavior. Ideally, this would be in the XHTML spec as it's an ambiguity
| that may interfere with content authors making a smooth transition from
| HTML to XML. Specifically people advised to start authoring their
| content as XHTML "right now" so that the full transition to XML down the
| road will be easier will be in for a surprise when their XHTML documents
| appear broken in newer browsers.

Yup. That's why we need strict parsing on all XHTML documents.
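
As a rough illustration of what "strict parsing" means for the two
snippets quoted earlier, here is a minimal Python sketch. It assumes the
standard library's xml.etree.ElementTree as a stand-in for a browser's
XML parser, and the helper name is_well_formed is hypothetical. It only
checks well-formedness; it says nothing about validity against the
XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if markup parses as one well-formed XML element.

    Hypothetical helper: a strict user agent would refuse to render
    anything for which this returns False.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Legal HTML 4.01 (the </p> end tags are optional), but not well-formed XML.
legal_html = "<p>foo <p>blaa"

# "Tag soup": the <b> and <i> elements overlap instead of nesting.
tag_soup = "<p><b>foo bar <i>baz</b> xyzzy</p>"

# The same content with properly nested elements.
nested = "<p><b>foo bar <i>baz</i></b> xyzzy</p>"

for sample in (legal_html, tag_soup, nested):
    print(repr(sample), is_well_formed(sample))
```

Only the last sample passes. Under the position argued here, an
XHTML-aware browser would render that one and reject the other two.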
| Currently, I see two interpretations for the behavior of User Agents
| that support both HTML and XML:
|
| User Agent A: ignores the Content-Type header, instead relies on the
| document content. In this interpretation, the User Agent would treat
| the document as XML.

I prefer this. It ensures that a document served as 'text/html' and
properly rendered in a browser which supports XHTML will also be
rendered in the same browser when it's served as 'text/xml'. It also
makes life much easier for web authors (since they can easily check
whether their documents are valid).

| User Agent B: obeys the Content-Type header. An XHTML document returned
| with the Content-Type text/html is treated as HTML. An XHTML document
| returned with Content-Type text/xml is treated as XML.
|
| I prefer interpretation B. I picture B's behavior used in conjunction
| with HTTP Content Negotiation (RFC 2295). This is what I assumed XHTML
| was intended for all along. I assumed that content authors could rely
| on default styling of HTML elements so long as the document was served
| as text/html. Only if the document was served as text/xml would styling
| for all elements be necessary for proper rendering in UAs.

Why? The reason we have HTML and XHTML is to provide a fixed set of
elements, which the browser can render in a meaningful way. A speech
browser can use a different voice for headings (e.g. 'h1' elements); a
graphical browser will often render them with a bigger font size (and
perhaps in a different colour). Some browsers will be able to
automatically generate a table of contents based on heading elements.
Users can use user style sheets to make sure all documents are rendered
in a way they like (e.g. all headings should be dark blue on a white
background). This is the power of a fixed set of elements.

If browsers didn't use a default "ua.css", there would really not be
much reason to use XHTML instead of just your own, private XML element
set.
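
The difference between the two interpretations quoted above can be
sketched in a few lines of Python. This is only an illustration: the
function names, the "xml"/"html" mode labels, and in particular the
content-sniffing rule used for User Agent A (look for an XML declaration
or the XHTML namespace near the top of the document) are assumptions.
Neither the XHTML 1.0 spec nor this thread defines such a rule.

```python
from typing import Literal

Mode = Literal["xml", "html"]

# Namespace that XHTML 1.0 documents declare on the root element.
XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def media_type(content_type: str) -> str:
    """Strip parameters such as '; charset=utf-8' and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def user_agent_a(content_type: str, body: str) -> Mode:
    """Interpretation A: ignore Content-Type and look at the content.

    The sniffing rule below is a hypothetical heuristic, not a standard.
    """
    head = body[:1024]
    if head.lstrip().startswith("<?xml") or XHTML_NAMESPACE in head:
        return "xml"
    return "html"


def user_agent_b(content_type: str, body: str) -> Mode:
    """Interpretation B: obey Content-Type and never inspect the body.

    Assumption: anything other than text/xml falls back to HTML.
    """
    return "xml" if media_type(content_type) == "text/xml" else "html"


XHTML_DOC = (
    '<?xml version="1.0"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body><p>x</p></body></html>"
)

for agent in (user_agent_a, user_agent_b):
    for header in ("text/html", "text/xml"):
        print(agent.__name__, header, agent(header, XHTML_DOC))
```

For the same XHTML document, User Agent A picks the XML mode under both
headers, while User Agent B switches mode with the header. That is
exactly the point on which the two positions in this thread disagree.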
XML only says something about how a document should be written. The
various standards say how documents should be rendered. For example,
the following MathML:

<apply>
  <fn>
    <ci>f</ci>
  </fn>
  <ci>x</ci>
</apply>

should be rendered as f(x) (or perhaps spoken!). SVG is an XML
application which defines how SVG should be rendered, i.e. as images.
In the same way, XHTML defines how HTML should be rendered (the way
it's defined in HTML 4.01).

--
Karl Ove Hufthammer
Received on Tuesday, 20 June 2000 15:53:27 UTC