Re: [XHR2] Avoiding charset dependencies on user settings from Henri Sivonen on 2011-09-26 (public-webapps@w3.org from July to September 2011)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 26 Sep 2011 17:50:27 +0300
To: public-webapps@w3.org
Message-ID: <CAJQvAudOXje1dnKE57exQeSbt6b+yRCnwM3Gr2HtJ0dpU6EUww@mail.gmail.com>

On Mon, Sep 26, 2011 at 12:46 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Fri, Sep 23, 2011 at 1:26 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
>> On Thu, Sep 22, 2011 at 9:54 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>> I agree that there are no legacy requirements on XHR here, however I
>>> don't think that that is the only thing that we should look at. We
>>> should also look at what makes the feature the most useful. An extreme
>>> counter-example would be that we could let XHR refuse to parse any
>>> HTML page that didn't pass a validator. While this wouldn't break any
>>> existing content, it would make HTML-in-XHR significantly less useful.
>>
>> Applying all the legacy text/html craziness to XHR could break current
>> use of XHR to retrieve responseText of text/html resources (assuming
>> that we want responseText for text/html to work like responseText for XML
>> in the sense that the same character encoding is used for responseText
>> and responseXML).
>
> This doesn't seem to only be a problem when using "crazy" parts of
> text/html charset detection. Simply looking for <meta charset> in the
> first 1024 characters will change behavior and could cause page
> breakage.
>
> Or am I missing something?

Yes: WebKit already performs the <meta> prescan for text/html when
retrieving responseText via XHR even though it doesn't support full
HTML parsing in XHR (so responseXML is still null).
http://hsivonen.iki.fi/test/moz/xhr/charset-xhr.html

Thus, apps broken by the meta prescan would already be broken in
WebKit (unless, of course, they browser sniff in a very strange way).

And apps that wouldn't be OK with using UTF-8 as the fallback encoding
when there's no HTTP-level charset, no BOM and no <meta> in the first
1024 bytes would already be broken in Gecko.
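
For illustration, the decision order described above (HTTP-level charset,
then BOM, then a <meta> prescan of the first 1024 bytes, then UTF-8) could
be sketched roughly as follows. This is a simplified sketch, not the HTML
specification's prescan algorithm and not what WebKit or Gecko actually
run: the function name pick_encoding, the regular expression, and the BOM
table are assumptions made for this example.

```python
import re
from typing import Optional

# Simplified stand-in for the <meta> prescan. The real algorithm parses
# attributes properly and skips comments; this regex is an assumption.
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)

# Byte order marks checked in this sketch.
_BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
)


def pick_encoding(body: bytes, http_charset: Optional[str] = None) -> str:
    """Pick an encoding label without consulting any user setting."""
    # 1. A charset declared at the HTTP level wins.
    if http_charset:
        return http_charset.lower()
    # 2. Otherwise, a BOM at the start of the body.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 3. Otherwise, a <meta> declaration within the first 1024 bytes.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Otherwise, a fixed UTF-8 fallback instead of a locale default.
    return "utf-8"
```

The point of the last step is that the result depends only on the response
itself, so the same bytes decode the same way for every user.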

>> Applying all the legacy text/html craziness to XHR would make data
>> loading in programs fail in subtle and hard-to-debug ways depending on
>> the browser localization and user settings. At least when loading into
>> a browsing context, there's visual feedback of character misdecoding
>> and the feedback can be attributed back to a given file. If
>> setting-dependent misdecoding happens in the XHR data loading
>> machinery of an app, it's much harder to figure out what part of the
>> system the problem should be attributed to.
>
> Could you provide more detail here? How are you imagining this data
> being used such that it's not being displayed to the user?
>
> I.e. can you describe an application that would break in a non-visual
> way and where it would be harder to detect where the data originated
> from compared to for example <iframe> usage?

If a piece of text came from XHR and got injected into a visible DOM,
it's not immediately obvious which HTTP response it came from.
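
As a small illustration of the setting-dependent misdecoding discussed in
this thread, the sketch below decodes the same undeclared bytes with three
different fallback encodings. The sample bytes and the helper name
decode_with_fallback are assumptions chosen for the example; the three
fallbacks stand in for what differently localized browsers might pick.

```python
# "päivä" encoded as windows-1252, served without any charset declaration.
raw = b"p\xe4iv\xe4"


def decode_with_fallback(data: bytes, fallback: str) -> str:
    """Decode bytes with a fallback encoding, replacing invalid sequences."""
    return data.decode(fallback, errors="replace")


# The same response body under three hypothetical fallback settings.
western = decode_with_fallback(raw, "cp1252")   # Western European default
cyrillic = decode_with_fallback(raw, "cp1251")  # Cyrillic default
utf8 = decode_with_fallback(raw, "utf-8")       # fixed UTF-8 fallback

for label, text in (("cp1252", western), ("cp1251", cyrillic), ("utf-8", utf8)):
    print(label, repr(text))
```

None of the three decodes raises an error, so nothing in the data-loading
code signals a problem; the wrong string only shows up wherever it is
eventually used.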

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 26 September 2011 14:50:58 UTC