Re: XHTML 1.0, section C14 from Benjamin Hawkes-Lewis on 2006-11-23 (www-html@w3.org from November 2006)

From: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
Date: Thu, 23 Nov 2006 11:43:25 +0000
To: www-html <www-html@w3.org>
Cc: Shane McCarron <shane@aptest.com>, Tina Holmboe <tina@greytower.net>
Message-Id: <1164282206.6917.82.camel@galahad>
On Tue, 2006-11-21 at 14:02 -0600, Shane McCarron asserted:

> If a user agent claims to support application/xhtml+xml then you
> SHOULD send your XHTML 1 document using that media type, and all will
> be well. 

What do you mean "and all will be well"? Internet Explorer accepts
application/xhtml+xml but (as Tina Holmboe hints) usually offers to
download such documents instead of rendering them. Lynx, ELinks,
Konqueror, and even Emacs/W3 (with Raman's patch) accept
application/xhtml+xml, but their XHTML handling is only a broken
variation of their HTML handling. Safari has always accepted
application/xhtml+xml, but its support was considered dangerously buggy
until more recent WebKit builds. Mozilla accepts application/xhtml+xml
but "incremental loading of XML documents has not been implemented" so
there is no "incremental display":

http://www.mozilla.org/docs/web-developer/faq.html#accept

Your choice of language oversimplifies the complexities of the Accept
header. The acceptance of a type via the Accept header need not indicate
"support" (in the sense of ability to render), only "acceptance" for
download:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1

HTTP 1.1 seems to offer no obvious way to make a distinction between
"accepting" a media type for:

1) Direct use (e.g. text/html)

2) Indirect use via a plugin (e.g. video/quicktime)

3) Opening by the user agent in an external program

4) Opening by the user themselves in an external program

5) Download for eventual use on another system altogether

Purposes 1 to 3 (at least) are given implicit sanction by the
specification:

> A user agent might be provided with a default set of quality values
> for certain media ranges. However, unless the user agent is a closed
> system which cannot interact with other rendering agents, this default
> set ought to be configurable by the user.

Purpose 1 involves only a few types, purpose 2 significantly expands the
range, purposes 3 and 4 include a wide variety of types, and purpose 5
involves a potentially infinite number of types. Lynx and Links ignored
purpose 5 and attempted to actually list the types supported by the
user's system. Whenever this list approached accuracy, it became
extremely long, prompting complaints from users about wasted
bandwidth, privacy intrusions, and servers at Dogpile and Google
rejecting GET requests for fear of buffer overrun attacks:

http://lists.gnu.org/archive/html/lynx-dev/1997-09/msg00668.html

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=41594

http://linuxfromscratch.org/pipermail/links-list/2001-December/001589.html

http://lists.gnu.org/archive/html/lynx-dev/2004-05/msg00019.html

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=254515
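
To give a rough sense of how quickly an exhaustive list grows, here is a small Python sketch. It assumes Python's built-in mimetypes table is a fair stand-in for "the types supported by the user's system"; the function name exhaustive_accept_header is hypothetical, and this is not how Lynx or Links actually built their headers.

```python
import mimetypes


def exhaustive_accept_header() -> str:
    """Build an Accept header naming every media type in Python's built-in table."""
    # mimetypes.types_map maps file extensions to media types. It is only a
    # stand-in (an assumption) for the types a real system could open.
    media_types = sorted(set(mimetypes.types_map.values()))
    return "Accept: " + ", ".join(media_types)


header = exhaustive_accept_header()
wildcard = "Accept: */*"
print(len(wildcard), "bytes for the wildcard header")
print(len(header), "bytes for the exhaustive header")
```

The exact size depends on the Python version, but the exhaustive header is sent with every request, which is where the bandwidth and privacy complaints come from.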

Indeed these problems were foreseen by the original specification:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html#sec12 

In practice, therefore, purposes 3 and 4 also require most browsers to
"accept" all types. Certainly, for purposes 3 to 5, it is appropriate
for most browsers to accept application/xhtml+xml.

But the specification also encourages user agents to specify a q
(quality) parameter for different media types in order to express their
preferences. By (probably correctly) prioritizing purposes 1 and 2 over
3, and 1, 2, and 3 over 4 and 5, sensible browsers use such expressions
of preference to send a /hint/ about support to servers. I realize you
probably had this hint in mind when you mentioned "support", but I think
it's worth clarifying anyway.
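
As a minimal sketch of how a server might read those preferences, the Python below splits an Accept header into media ranges and q values. The function name parse_accept and the example header are illustrative assumptions, and the handling of a malformed q value is a choice made here, not something RFC 2616 prescribes.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media_range, q) pairs, highest preference first."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0]
        if not media_range:
            continue
        q = 1.0  # RFC 2616 section 14.1: the default quality value is 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        entries.append((media_range, q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


# Illustrative header, not taken from any particular browser.
example = "text/html;q=0.9, application/xhtml+xml, */*;q=0.1"
for media_range, q in parse_accept(example):
    print(f"{media_range:25} q={q}")
```

The ordering is only the hint discussed above: a high q value says what the user agent would rather receive, not that it can render it well.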

Internet Explorer isn't wrong to accept application/xhtml+xml with its
requests. What /is/ astonishingly stupid is that it expresses no
preference for text/html or against application/xhtml+xml using the q
parameter. Instead we get:

> Accept: */*

The IE Team are aware that this complicates content negotiation:

http://blogs.msdn.com/ie/archive/2005/04/27/412813.aspx#412893

http://blogs.msdn.com/ie/archive/2005/09/15/467901.aspx#468070

http://www.microsoft.com/windowsxp/expertzone/chats/transcripts/06_1012_ez_ie.mspx 

But the best they can say is that it "might" be addressed in Internet
Explorer 8:

http://blogs.msdn.com/ie/archive/2006/10/17/accept-language-header-for-internet-explorer-7.aspx#841795
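
To see why a bare wildcard complicates negotiation, the following Python sketch computes the q value a header assigns to one concrete media type, using the "most specific range wins" rule from RFC 2616 section 14.1. The function name quality is hypothetical and the parsing is deliberately simplified (it ignores media type parameters other than q).

```python
def quality(media_type: str, accept: str) -> float:
    """q value a header assigns to one concrete media type (0.0 if none matches)."""
    wanted_type, wanted_subtype = media_type.split("/")
    best_specificity, best_q = -1, 0.0
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        if "/" not in parts[0]:
            continue
        range_type, range_subtype = parts[0].split("/", 1)
        q = 1.0  # default quality value
        for param in parts[1:]:
            if param.lower().startswith("q="):
                q = float(param[2:])
        if range_type == wanted_type and range_subtype == wanted_subtype:
            specificity = 2  # exact match, e.g. text/html
        elif range_type == wanted_type and range_subtype == "*":
            specificity = 1  # type wildcard, e.g. text/*
        elif range_type == "*" and range_subtype == "*":
            specificity = 0  # full wildcard
        else:
            continue
        # The most specific matching range takes precedence.
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


ie_accept = "*/*"
print(quality("text/html", ie_accept))              # 1.0
print(quality("application/xhtml+xml", ie_accept))  # 1.0
```

Both types come out at 1.0, so the header alone gives a server nothing to choose between them.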

Firefox's Accept header appears saner:

> text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

But the message is more subtle than it first appears, since the
preference for application/xhtml+xml was only added "in order to enable
the serving of MathML to both Mozilla and IE with Apache without
scripting back when the MathPlayer plug-in for IE did not handle
application/xhtml+xml":

http://www.mozilla.org/docs/web-developer/faq.html#accept

For a time, WebKit's developers copied Internet Explorer's Accept
header; now they copy Mozilla.

You talk in terms of browsers that "lie", and Tina Holmboe of whether we
can "believe" the Accept header. While we should treat deceptive
headers as browser bugs and pressurize developers to improve them, it
might be more helpful in the present circumstances to think of reading
the Accept header as an art of informed interpretation rather than in
terms of a binary opposition of faith or scepticism. Recall that
server-driven content negotiation is only a "best guess": "an origin
server is not limited to these dimensions [information provided by the Accept,
Accept-Charset, Accept-Encoding, Accept-Language, and User-Agent
headers] and MAY vary the response based on any aspect of the request,
including information outside the request-header fields or within
extension header fields not defined by this specification."

http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html#sec12
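
One possible "informed interpretation" can be sketched in Python as follows. The policy is an assumption made for illustration, not something any specification mandates: serve application/xhtml+xml only when the client names it explicitly and ranks it at least as high as text/html, and otherwise fall back to text/html. The names explicit_q and choose_media_type are hypothetical; the two test headers are the Firefox and Internet Explorer values quoted in this message.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def explicit_q(accept: str) -> dict[str, float]:
    """Map each media range named in the header to its q value (default 1.0)."""
    result = {}
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                q = float(param[2:])
        result[parts[0].lower()] = q
    return result


def choose_media_type(accept: str) -> str:
    """Hypothetical policy: XHTML only on an explicit, at-least-equal preference."""
    named = explicit_q(accept)
    xhtml_q = named.get(XHTML, 0.0)  # wildcards deliberately do not count here
    html_q = named.get(HTML, named.get("text/*", named.get("*/*", 0.0)))
    if xhtml_q > 0.0 and xhtml_q >= html_q:
        return XHTML
    return HTML


firefox = ("text/xml,application/xml,application/xhtml+xml,"
           "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5")
print(choose_media_type(firefox))  # application/xhtml+xml
print(choose_media_type("*/*"))    # text/html
```

A server that varies its response this way should also send "Vary: Accept" so that caches do not hand one client's variant to another.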

Shane McCarron continued:

> If a user agent only claims to support text/html, then you SHOULD send
> your document using that media type. If your document is written in
> XHTML 1.0 and follows the guidelines in Appendix C, you can do this
> with the same document and you will be largely successful. In fact,
> and without any evidence to back this up, I would bet that such a
> document is almost exactly as likely to render correctly as if you
> sent it with the HTML 4.01 DOCTYPE.

Do you include or exclude styling added by CSS and behaviours added by
scripts when you talk of success and rendering? What sort of breakage is
covered by the words "largely" and "almost"? 

Also, how much would you like to bet? ;)

More seriously, /which/ guidelines need to be followed for such success?
If we follow C.14 and use an xml-stylesheet processing instruction, then
serve that document as text/html, Internet Explorer 7 renders it in
broken (quirks) mode, not standards mode. Here's the actual example
document from C.14:

http://www.benjaminhawkeslewis.com/www/web-design/c14-test.html

You can test which rendering mode Internet Explorer 7 is using by
entering:

> javascript:alert(document.compatMode);

into the address bar, as described at:

http://css-discuss.incutio.com/?page=RenderingMode

> I surely hope not, but if there are.... they deserve what they get.

This may be a comfortable attitude for a W3C specification writer. It
is not a luxury that the majority of would-be XHTML authors can afford,
not only on account of commercial incentives, democratic accountability,
and general politeness, but also (arguably) on the grounds of
accessibility.

WCAG 1.0 urges us to be considerate towards users "who may have an early
version of a browser, a different browser entirely, a voice browser, or
a different operating system" and repeatedly allows exceptions to its
own guidance "until most user agents readily available to their audience
include the necessary accessibility features":

http://www.w3.org/TR/WAI-WEBCONTENT/#Introduction

W3C can improve this situation, if it chooses, by:

1) Changing the note on media types to a recommendation, publishing a
standard for how web user agents should use, and web servers should
interpret, the content negotiation headers, making compliance with that
standard a checkpoint in the User Agent Accessibility Guidelines (UAAG),
and publicising the issue among public and private IT procurers.

2) Campaigning for users to adjust their own request headers to match
that standard. Microsoft and Apple could (perhaps) be pressurized into
distributing fixes via automatic updates. Or, if they proved
recalcitrant, W3C could code free utilities to fix their request headers
themselves. In the case of Internet Explorer, altering the Accept header
is a minor registry tweak.

Pace Tina Holmboe, these growing pains are not in themselves a reason to
abandon XML-based markup forever, especially as XHTML-only user agents
are already emerging. On the contrary, the increasing diversity of the
web and its users necessitates perfection of content negotiation
methodologies and offers yet another incentive to support FOSS assistive
technology projects like Fire Vox, NVDA, and OSK-ng:

http://www.firevox.clcworld.net/

http://www.kulgan.net/nvda/

http://elgg.net/stevelee/weblog/139997.html

The net effect of such projects is that the technical and monetary
obstacles to even users with disabilities adopting more capable browsers
are increasingly being reduced to:

1) the existence of intranet web applications that rely on proprietary
features in HTML-only clients;

2) the ambitious hardware and software requirements of XHTML-capable
graphical browsers (Xubuntu and similar free *nix projects might help
solve this problem, however).

If W3C got serious about encouraging a transition to XML-based markup,
they could also explicitly include compliance with the XHTML, SVG,
XForms, and MathML specifications (currently not even Mozilla manages to
support all of these out of the box, let alone fully comply) as a
checkpoint in UAAG, and publicise that requirement with IT procurers
too.

Still, while continuing to hope for more radical action, I welcome the
decision to consider making Appendix C clearer.

--
Benjamin Hawkes-Lewis
Received on Thursday, 23 November 2006 11:50:08 UTC