Re: XMLNS in Inkscape and other editors from Gavin Carothers on 2009-11-22 (public-html@w3.org from November 2009)

From: Gavin Carothers <gavin@carothers.name>
Date: Sat, 21 Nov 2009 22:29:33 -0800
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Adam Barth <w3c@adambarth.com>, Boris Zbarsky <bzbarsky@mit.edu>, Maciej Stachowiak <mjs@apple.com>, HTMLwg <public-html@w3.org>
Message-ID: <273883010911212229q6d2ed885v38a575f4630a6c55@mail.gmail.com>

On Sat, Nov 21, 2009 at 9:34 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Adam Barth wrote:
>>
>> On Sat, Nov 21, 2009 at 9:38 AM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
>>>
>>> On 11/20/09 8:06 PM, Gavin Carothers wrote:
>>>>
>>>> I agree, it's totally unlikely that anyone meant for the body tag not
>>>> to be in the XHTML namespace. I think it's equally unlikely that
>>>> http://www.microsoft.com/learning/en/us/Book.aspx?ID=13697&locale=en-us
>>>> is meant to be served with no content-type resulting in well...
>>>> disaster.
>>>
>>> Interesting.  The only reason that page breaks, looks like, is that the
>>> byte
>>> stream starts with the UTF-8 BOM.  If it started with "<!" browsers would
>>> treat it as HTML (or at least Gecko certainly would).
>>>
>>> If we had more cases like this I would actually propose changing the
>>> sniffing algorithm to deal, but as it is it might not be worth it.
>>
>> Interesting case.  I'm not sure if changing the sniffing algorithm
>> would cause more harm than good in this case.
>
> An argument *could* be made that the scope of sniffing should be different
> for cases where the server does not supply a media type itself.

I think I may have failed to make my point. The HTML standard can just
as easily say "An HTTP server MUST serve HTML documents as text/html."
Accepting malformed documents is great and all, but how far is too
far?

Let's consider this Microsoft page as a whole. It's served with no media
type. The only browser I've found that can (inconsistently) render it
is IE7... but the page demands that it be rendered as IE8 does
(white page, no content). Only it doesn't really say IE8; rather, it
uses an undocumented setting that, uh, doesn't seem to do anything at
all. The top of the document claims it's an XHTML 1.0 document; as
such, the html element declares its namespace to be
http://www.w3.org/1999/xhtml. About half of the script tags are
clearly designed for XML, with CDATA sections wrapping their content;
the other half have no CDATA sections. The default namespace is redeclared
in the middle of the document a number of times, luckily to the same
thing each time. And of course there is the main bug, which causes the
page not to render correctly in just about anything: a BOM in UTF-8,
an encoding which has no need for an endianness marker. Halfway down
the document there is a new XML declaration, this time in UTF-16.
The page fails validation with all XHTML validators, XML validators,
and HTML4 validators, and does not render correctly (is there such a
thing?) in any user agent I'm aware of.
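
To make the BOM problem Boris describes concrete, here is a minimal
Python sketch. It assumes a hypothetical sniffer, looks_like_html, that
treats an untyped response as HTML only when the bytes begin with "<!".
The function name, the skip_bom parameter, and the sample bytes are
illustrative assumptions, not any browser's actual sniffing algorithm
and not the real page's content.

```python
import codecs

# The three bytes EF BB BF that a UTF-8 BOM puts at the start of a stream.
UTF8_BOM = codecs.BOM_UTF8


def looks_like_html(body: bytes, skip_bom: bool = False) -> bool:
    """Hypothetical sniffer: call the body HTML only if it starts with '<!'.

    With skip_bom=True a leading UTF-8 BOM is stripped before the check.
    """
    if skip_bom and body.startswith(UTF8_BOM):
        body = body[len(UTF8_BOM):]
    return body.startswith(b"<!")


# Stand-in for a response served with no media type: BOM first, then a doctype.
page = UTF8_BOM + b"<!DOCTYPE html>"

print(looks_like_html(page))                 # False: the BOM hides the "<!"
print(looks_like_html(page, skip_bom=True))  # True: the "<!" is now first
```

Skipping the BOM would rescue this one page from the sniffer, but it
would do nothing about the rest of the problems listed above.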

Attempting to fix pages like these by making browsers behave "better"
is not helpful or meaningful. To answer my own question, THIS is too
far. A document with this content, served this way should NOT render
(and doesn't).

As for the other document, Google's with the oddly namespaced body
tag: as Sam's link points out, if a developer, tester, user, manager,
or whoever were to look at the page when served as XHTML, it would be
very clear something is wrong. If, however, the browser fixes it,
ignores it, etc., the error (which it almost certainly is) will go
unnoticed until some standards committee looking for an example finds it.

--Gavin

Received on Sunday, 22 November 2009 06:30:07 UTC