Re: [XHR2] Avoiding charset dependencies on user settings from Jonas Sicking on 2011-09-28 (public-webapps@w3.org from July to September 2011)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 28 Sep 2011 15:04:36 -0700
To: Anne van Kesteren <annevk@opera.com>
Cc: Henri Sivonen <hsivonen@iki.fi>, public-webapps@w3.org
Message-ID: <CA+c2ei-WcfFo6aufVVqKOOnif0JkWiMCeuzCzW_WbWiKb5x1QQ@mail.gmail.com>

On Tue, Sep 27, 2011 at 11:10 PM, Anne van Kesteren <annevk@opera.com> wrote:
> On Wed, 28 Sep 2011 03:16:46 +0200, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> So it sounds like your argument is that we should do <meta> prescan
>> because we can do it without breaking any new ground. Not because it's
>> better or was inherently safer before webkit tried it out.
>
> It does seem better to decode resources in the manner they are encoded.

I'm not sure I understand what you're saying here. If you're simply
saying that ideally we should always decode using the correct decoder,
then I agree.

>> I'd much rather first debate what behavior we want and if we can try
>> if that is safe.
>>
>> And we always have the option of only doing HTML parsing when
>> .responseType is set to "document". That is unlikely to break a lot of
>> content. And it saves users resources as it uses less memory.
>
> I think it should have the same behavior as XML. No reason to make it harder
> for HTML.

"same as XML" is a matter of definition though. We're doing all of the
following for XML:

* Using the same charset selection for XHR loading as for <iframe> loading.
* If we don't find any explicit charset in the HTTP headers or in the
document body, we use UTF-8
* If we don't find any explicit charset in the HTTP header, we look
for an XML PI which contains a charset

It so happens that in XML all three of these are equivalent. For HTML
that is not the case. So which are you suggesting we do (I'm assuming
not the last one :) )?
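
To make the XML case concrete, here is a rough sketch of the fallback
order described in the list: HTTP header first, then the encoding named
in the XML declaration, then UTF-8. The function names are made up for
illustration, the regexes are deliberately loose, and BOM sniffing is
left out, so treat it as an approximation under those assumptions and
not as what any browser actually ships.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration only; simplified, no BOM handling.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    if not content_type:
        return None
    match = _HEADER_CHARSET.search(content_type)
    return match.group(1).lower() if match else None


def charset_from_xml_declaration(body: bytes) -> Optional[str]:
    """Return the encoding named in a leading XML declaration, if present."""
    match = _XML_DECL_ENCODING.match(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def pick_xml_charset(content_type: Optional[str], body: bytes) -> str:
    """Header charset wins, then the in-document declaration, then UTF-8."""
    return (
        charset_from_header(content_type)
        or charset_from_xml_declaration(body)
        or "utf-8"
    )
```

For XML this one function covers all three descriptions at once. For
HTML the in-document step would be a <meta> prescan, and the final
fallback for <iframe> loading can depend on user settings, which is
where the three descriptions stop agreeing.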

/ Jonas

Received on Wednesday, 28 September 2011 22:05:33 UTC