
Re: [XHR2] Avoiding charset dependencies on user settings

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 26 Sep 2011 17:50:27 +0300
Message-ID: <CAJQvAudOXje1dnKE57exQeSbt6b+yRCnwM3Gr2HtJ0dpU6EUww@mail.gmail.com>
To: public-webapps@w3.org
On Mon, Sep 26, 2011 at 12:46 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Fri, Sep 23, 2011 at 1:26 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
>> On Thu, Sep 22, 2011 at 9:54 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>> I agree that there are no legacy requirements on XHR here, however I
>>> don't think that that is the only thing that we should look at. We
>>> should also look at what makes the feature the most useful. An extreme
>>> counter-example would be that we could let XHR refuse to parse any
>>> HTML page that didn't pass a validator. While this wouldn't break any
>>> existing content, it would make HTML-in-XHR significantly less useful.
>> Applying all the legacy text/html craziness to XHR could break current
>> use of XHR to retrieve responseText of text/html resources (assuming
>> that we want responseText for text/html work like responseText for XML
>> in the sense that the same character encoding is used for responseText
>> and responseXML).
> This doesn't seem to only be a problem when using "crazy" parts of
> text/html charset detection. Simply looking for <meta charset> in the
> first 1024 characters will change behavior and could cause page
> breakage.
> Or am I missing something?

Yes: WebKit already performs the <meta> prescan for text/html when
retrieving responseText via XHR even though it doesn't support full
HTML parsing in XHR (so responseXML is still null).
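For readers unfamiliar with the prescan: before any real parsing, the browser looks only at the first 1024 bytes of the body for a <meta> that declares a charset. The Python below is a heavily simplified sketch of that idea, not WebKit's code. The name `prescan_meta_charset` is hypothetical, and the real HTML prescan also skips comments, tokenizes attributes properly and validates the label, none of which this regex does.

```python
import re
from typing import Optional

PRESCAN_LIMIT = 1024  # only this many leading bytes are examined

# Matches <meta charset=...> and <meta ... content="...; charset=...">.
# Heavily simplified: no comment skipping, no real attribute parsing.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def prescan_meta_charset(body: bytes) -> Optional[str]:
    """Return the charset label from an early <meta>, or None if there is none."""
    # Anything after the first 1024 bytes is ignored, even a valid <meta>.
    match = _META_CHARSET.search(body[:PRESCAN_LIMIT])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

For example, `prescan_meta_charset(b'<meta charset="windows-1252">')` returns `'windows-1252'`, while the same <meta> placed after 2000 bytes of content is not found at all.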

Thus, apps broken by the meta prescan would already be broken in
WebKit (unless, of course, they browser sniff in a very strange way).

And apps that wouldn't be OK with using UTF-8 as the fallback encoding
when there's no HTTP-level charset, no BOM and no <meta> in the first
1024 bytes would already be broken in Gecko.
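The fallback chain described above can be sketched as a single decision function. This is a minimal illustration, not Gecko's or WebKit's implementation: `pick_encoding` is a hypothetical name, the <meta> prescan result is passed in as `meta_charset` rather than computed here, only UTF-8 and UTF-16 BOMs are recognized, and checking the BOM before the HTTP charset is an assumption (it matches the current HTML encoding-sniffing order). The step that matters for this thread is the last one: the answer is a fixed UTF-8, not a value read from the user's locale or settings.

```python
import codecs
from typing import Optional

# Byte order marks this sketch recognizes (UTF-32 is left out for brevity).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

FALLBACK = "utf-8"  # fixed; never derived from the locale or user settings


def pick_encoding(
    body: bytes,
    http_charset: Optional[str] = None,
    meta_charset: Optional[str] = None,
) -> str:
    """Choose the encoding for responseText of a text/html XHR response.

    meta_charset stands for the result of a <meta> prescan over the first
    1024 bytes; computing it is out of scope for this sketch.
    """
    # 1. A BOM wins outright (assumed ordering, as in current HTML sniffing).
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 2. The charset parameter of the HTTP Content-Type header.
    if http_charset:
        return http_charset.strip().lower()
    # 3. A charset declared by <meta> near the start of the document.
    if meta_charset:
        return meta_charset.strip().lower()
    # 4. No signal at all: UTF-8, the same answer for every user.
    return FALLBACK
```

Because step 4 is a constant, two users with differently localized browsers get the same responseText for the same bytes.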

>> Applying all the legacy text/html craziness to XHR would make data
>> loading in programs fail in subtle and hard-to-debug ways depending on
>> the browser localization and user settings. At least when loading into
>> a browsing context, there's visual feedback of character misdecoding
>> and the feedback can be attributed back to a given file. If
>> setting-dependent misdecoding happens in the XHR data loading
>> machinery of an app, it's much harder to figure out what part of the
>> system the problem should be attributed to.
> Could you provide more detail here. How are you imagining this data
> being used such that it's not being displayed to the user?
> I.e. can you describe an application that would break in a non-visual
> way and where it would be harder to detect where the data originated
> from compared to for example <iframe> usage.

If a piece of text came from XHR and got injected into a visible DOM,
it's not immediately obvious which HTTP response it came from.
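To make the debugging problem concrete: with no HTTP charset, BOM or <meta>, a locale-dependent fallback turns the same response bytes into different strings, and nothing in the resulting string records which response or which setting produced it. The sketch below assumes hypothetical per-build fallbacks; `FALLBACK_BY_BUILD` and `response_text` are illustrative names, not any browser's actual table or API.

```python
# Illustrative per-build fallbacks; hypothetical, not a real browser table.
FALLBACK_BY_BUILD = {
    "Western build": "windows-1252",
    "Cyrillic build": "windows-1251",
    "fixed UTF-8": "utf-8",
}


def response_text(body: bytes, fallback: str) -> str:
    """Decode a body that has no HTTP charset, BOM or <meta>: only the fallback decides."""
    return body.decode(fallback, errors="replace")


# "café" as UTF-8 bytes (b"caf\xc3\xa9"), served as text/html with no charset info.
body = "café".encode("utf-8")
for build, encoding in FALLBACK_BY_BUILD.items():
    print(f"{build:15} -> {response_text(body, encoding)!r}")
```

The Western build yields `'cafÃ©'` while the fixed UTF-8 fallback yields `'café'`; once either string is inserted into the DOM, it looks like any other text on the page.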

Henri Sivonen
Received on Monday, 26 September 2011 14:50:58 UTC
