
Re: XMLNS in Inkscape and other editors

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sun, 22 Nov 2009 14:27:14 +0100
Message-ID: <4B093C32.7000404@gmx.de>
To: Adam Barth <w3c@adambarth.com>
CC: Boris Zbarsky <bzbarsky@mit.edu>, Gavin Carothers <gavin@carothers.name>, Maciej Stachowiak <mjs@apple.com>, HTMLwg <public-html@w3.org>
Adam Barth wrote:
> ...
> That is, in fact, how the sniffing algorithm works:
> 
> http://tools.ietf.org/html/draft-abarth-mime-sniff

Thanks for the reminder.

> The algorithm tolerates leading white space, but not leading BOMs.

Is there a particular reason why the BOM is not tolerated, given 
<http://www.w3.org/TR/REC-xml/#sec-guessing>?

Gavin Carothers wrote:
 > I think I may have failed to make my point. The HTML standard can just
 > as easily say "An HTTP server MUST serve HTML documents as text/html."
 > Accepting malformed documents is great and all, but how far is too
 > far?

That's a pointless requirement. In general, there are cases where a 
server doesn't know the content type, and forcing it to label content of 
unknown type is actually harmful: it frequently leads to mislabeled 
content, which in turn historically is the *cause* of UAs doing content 
sniffing that overrides the MIME type.

 > Let's consider this Microsoft page as a whole. It's served with no media
 > type. The only browser I've found that can (inconsistently) render it
 > is IE7... but the page demands that it should be rendered as IE8 does
 > (white page, no content). Only it doesn't really say IE8; rather it
 > uses an undocumented setting that, uh, doesn't seem to do anything at
 > all. The top of the document claims it's an XHTML 1.0 document, and as
 > such the html element declares its namespace to be
 > http://www.w3.org/1999/xhtml. About half of the script tags are
 > clearly designed for XML, with CDATA sections wrapping their content;
 > the other half have no CDATA sections. The default namespace is redeclared
 > in the middle of the document a number of times, luckily to the same
 > thing each time. And of course there's the main bug that causes the page not
 > to render correctly in just about anything: a BOM in UTF-8...
 > an encoding which has no need for an endianness marker. Halfway down
 > the document there is a new XML declaration, this time in UTF-16.

There's no question that that particular page is broken.

On the other hand, the UTF-8 BOM serves a very useful purpose 
(auto-detection of the character encoding when no out-of-band encoding 
information is available), and therefore I would expect a content 
sniffing algorithm to take it into account.
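To make the point concrete, here is a minimal sketch (in Python; the names and structure are my own illustration, not taken from any spec text) of the kind of BOM-based autodetection the XML spec's appendix describes -- check the first bytes of the stream against the known BOM sequences, longest first:

```python
# Hypothetical sketch of BOM-based encoding detection, in the spirit of
# the XML spec's "Autodetection of Character Encodings" appendix.
# Longer BOMs must be checked first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf",     "utf-8"),
    (b"\xfe\xff",         "utf-16-be"),
    (b"\xff\xfe",         "utf-16-le"),
]

def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if the stream starts with a known
    BOM, otherwise (None, 0)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0
```

So a sniffer that skips (or at least tolerates) a recognized BOM before looking for "<html", rather than treating the BOM bytes as opaque content, would handle UTF-8-with-BOM documents gracefully.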

 > Validating the page fails with all XHTML validators, XML validators,
 > HTML4 validators, and does not render correctly (is there such a
 > thing?) in any user agent I'm aware of.

Let's ignore "correctly" for a second -- it *does* render in IE8, Opera, 
and Safari on Win7. So the latter two UAs seem to differ from Firefox in 
the way they do content sniffing.

 > ...

Best regards, Julian
Received on Sunday, 22 November 2009 13:28:00 UTC
