
Re: [XHR] Some comments on "charset" in the Content-Type header

From: Anne van Kesteren <annevk@opera.com>
Date: Fri, 09 Oct 2009 13:26:47 +0200
To: "Boris Zbarsky" <bzbarsky@mit.edu>
Cc: "WebApps WG" <public-webapps@w3.org>
Message-ID: <op.u1i52xsg64w2qv@annevk-t60>
On Thu, 08 Oct 2009 19:03:03 +0200, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> On 10/8/09 11:21 AM, Anne van Kesteren wrote:
>> I realize this discussion was well over a year ago. I imagine Gecko has
>> meanwhile dealt with the compatibility issues so we can probably keep it
>> in the specification if you can confirm that.
>
> I think the practice has been that some sites have changed what they're  
> doing and some intranets are not upgrading Firefox because they use  
> closed-source black-box server apps with broken header parsing...  Not  
> sure that counts as having dealt with the compatibility issues for  
> purposes of other implementers, from my past experiences.

Sigh.

Are you willing to modify Gecko to a model where, if Content-Type has been  
set by setRequestHeader(), the charset parameter is changed to the encoding  
actually used if present, and is not added when not present? And where, if  
Content-Type has not been set by setRequestHeader(), it is set by the user  
agent, including the charset parameter?

Specifically, if the application does:

   setRequestHeader("content-type", "foo/bar")

or some such, you'll leave it alone.
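A rough sketch of that model in JavaScript (normalizeContentType and the parameter names are illustrative, not anything from the spec draft):

```javascript
// Sketch of the proposed charset-handling model. authorValue is the raw
// Content-Type value set via setRequestHeader(), or null if it was never
// set; encoding is the encoding the UA actually used; defaultMime is
// whatever media type the spec would supply. Illustrative names only.
function normalizeContentType(authorValue, encoding, defaultMime) {
  if (authorValue === null) {
    // Header not set by the author: the UA supplies it, charset included.
    return defaultMime + ";charset=" + encoding;
  }
  if (/charset=/i.test(authorValue)) {
    // charset parameter present: replace its value with the actual encoding.
    return authorValue.replace(/charset=[^;]*/i, "charset=" + encoding);
  }
  // No charset parameter: leave the author's value untouched.
  return authorValue;
}
```

Under this sketch, `normalizeContentType("foo/bar", "UTF-8", "text/plain")` returns `"foo/bar"` unchanged, matching the setRequestHeader example above.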


>> Could you please also comment on the text in
>>
>> http://dev.w3.org/2006/webapi/XMLHttpRequest/#the-send-method
>>
>> to see if what it says now regarding the Content-Type header and charset
>> parameter is correct?
>
> It is not.  Specifically, if the charset parameter is already  
> present and its value matches, case-insensitively, the value we would  
> like to set, the existing value should be kept.  Specifically, replacing  
> "utf-8" with "UTF-8" breaks some sites (e.g. anything using Google Web  
> Toolkit).

Ok, will change.
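The refinement Boris describes, keeping the author's existing charset spelling when it already matches case-insensitively, could look roughly like this (updateCharset is an illustrative name, not spec text):

```javascript
// Sketch of the case-preserving rule: only rewrite the charset parameter
// when its existing value differs case-insensitively from the encoding
// actually used. Illustrative code, not the spec algorithm.
function updateCharset(headerValue, encoding) {
  var match = /charset=([^;]*)/i.exec(headerValue);
  if (match === null) {
    return headerValue; // no charset parameter: leave the header alone
  }
  if (match[1].toLowerCase() === encoding.toLowerCase()) {
    // e.g. "utf-8" vs "UTF-8": keep the author's spelling, since
    // rewriting the case breaks some sites (such as GWT-based ones).
    return headerValue;
  }
  return headerValue.replace(/charset=[^;]*/i, "charset=" + encoding);
}
```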


>> Something I would like to change is to remove the dependency on
>> document.inputEncoding and simply always encode documents as UTF-8 too.
>> Can we do that? This would be better for off-the-shelf XML parsers on
>> the server which might only be able to deal with UTF-8 and UTF-16.
>> Especially in the cross-origin scenario.
>
> Doesn't that change the handling of URIs in the document, specifically  
> the situations where URI escapes are to be interpreted in the document  
> encoding?

Actually, that would be the case for characters that are not escaped using  
percent-encoding. Not entirely sure how you'd end up with such a document,  
but I suppose you could be sending some <iframe> or even the current  
document over the wire. I guess I'll leave this in, but define it in terms  
of the "document's character encoding" from HTML5 and default to UTF-8  
rather than UTF-16.


-- 
Anne van Kesteren
http://annevankesteren.nl/
Received on Friday, 9 October 2009 11:27:45 GMT
