- From: Gavin Carothers <gavin@carothers.name>
- Date: Sat, 21 Nov 2009 22:29:33 -0800
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Adam Barth <w3c@adambarth.com>, Boris Zbarsky <bzbarsky@mit.edu>, Maciej Stachowiak <mjs@apple.com>, HTMLwg <public-html@w3.org>
On Sat, Nov 21, 2009 at 9:34 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Adam Barth wrote:
>> On Sat, Nov 21, 2009 at 9:38 AM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
>>> On 11/20/09 8:06 PM, Gavin Carothers wrote:
>>>> I agree, it's totally unlikely that anyone meant for the body tag not
>>>> to be in the XHTML namespace. I think it's equally unlikely that
>>>> http://www.microsoft.com/learning/en/us/Book.aspx?ID=13697&locale=en-us
>>>> is meant to be served with no content-type resulting in well...
>>>> disaster.
>>>
>>> Interesting. The only reason that page breaks, looks like, is that the byte
>>> stream starts with the UTF-8 BOM. If it started with "<!" browsers would
>>> treat it as HTML (or at least Gecko certainly would).
>>>
>>> If we had more cases like this I would actually propose changing the
>>> sniffing algorithm to deal, but as it is it might not be worth it.
>>
>> Interesting case. I'm not sure if changing the sniffing algorithm
>> would cause more harm than good in this case.
>
> An argument *could* be made that the scope of sniffing should be different
> for cases where the server does not supply a media type itself.

I think I may have failed to make my point. The HTML standard could just as easily say "An HTTP server MUST serve HTML documents as text/html." Accepting malformed documents is great and all, but how far is too far?

Let's consider this Microsoft page as a whole. It's served with no media type. The only browser I've found that can (inconsistently) render it is IE7... but the page demands that it be rendered as IE8 does (white page, no content). Only it doesn't really say IE8; rather, it uses an undocumented setting that, uh, doesn't seem to do anything at all. The top of the document claims it's an XHTML 1.0 document, and as such the html element declares its namespace to be http://www.w3.org/1999/xhtml. About half of the script tags are clearly designed for XML, with CDATA sections wrapping their content; the other half have no CDATA sections. The default namespace is redeclared in the middle of the document a number of times, luckily to the same thing each time. And of course there is the main bug, which causes the page not to render correctly in just about anything: a BOM in UTF-8... an encoding which has no need for an endianness marker. Halfway down the document it has a new XML declaration, this time in UTF-16. Validating the page fails with all XHTML validators, XML validators, and HTML4 validators, and it does not render correctly (is there such a thing?) in any user agent I'm aware of. Attempting to fix pages like these by making browsers behave "better" is not helpful or meaningful. To answer my own question, THIS is too far. A document with this content, served this way, should NOT render (and doesn't).

As for the other document, Google's with the oddly namespaced body tag: if, as Sam's link points out, a developer, tester, user, manager, whatever were to look at the page when served as XHTML, it's very clear something is wrong. If however the browser fixes it, ignores it, etc., the error (which it almost certainly is) will go unnoticed until some standards committee looking for an example finds it.

--Gavin
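A note on the sniffing detail Boris describes above: prefix-based sniffing looks at the first bytes of the stream, so the three-byte UTF-8 BOM (EF BB BF) sits in front of the "<!" that would otherwise mark the document as HTML. The Python sketch below illustrates that failure mode and the kind of change Boris floats (skipping the BOM before matching); it is a hypothetical illustration, not the algorithm any browser or the sniffing draft actually specifies.

    # Sketch of the sniffing detail under discussion; hypothetical, not the
    # algorithm from any browser or the draft sniffing specification.

    UTF8_BOM = b"\xef\xbb\xbf"

    def sniffs_as_html(stream: bytes, skip_bom: bool = False) -> bool:
        """Return True if the byte stream would be treated as HTML by a
        naive prefix check like the one described above."""
        if skip_bom and stream.startswith(UTF8_BOM):
            # The floated change: ignore a leading UTF-8 BOM before matching.
            stream = stream[len(UTF8_BOM):]
        return stream.startswith(b"<!")  # e.g. "<!DOCTYPE html ..."

    # The Microsoft page's situation: a UTF-8 BOM followed by a doctype.
    page = UTF8_BOM + b"<!DOCTYPE html PUBLIC ..."

    print(sniffs_as_html(page))                 # False: the BOM hides the "<!"
    print(sniffs_as_html(page, skip_bom=True))  # True: sniffed as HTML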
Received on Sunday, 22 November 2009 06:30:07 UTC