Re: [XHR] Request charset is limited to UTF-8 for x-www-form-urlencoded data

On Mon, Oct 12, 2009 at 1:29 PM, Anne van Kesteren <annevk@opera.com> wrote:
> On Mon, 23 Jun 2008 01:22:20 +0200, Yaroslav <yarosla@gmail.com> wrote:
>>
>> In the current spec
>> (http://www.w3.org/TR/2008/WD-XMLHttpRequest-20080415/) I do not see
>> the possibility to POST application/x-www-form-urlencoded data with
>> charset other than UTF-8. I think this is limiting factor, which
>> should be avoided. UTF-8 is good versatile encoding but it is not
>> always practical to use it. When developing sites in Russian, for
>> example, we mainly use windows-1251 encoding, UTF-8 is rarely used as
>> it doubles network traffic.
>>
>> The spec says:
>>
>>> data is a DOMString
>>> Encode data using UTF-8 for transmission.
>>> If a Content-Type header is set using setRequestHeader() set the charset
>>> parameter of that header to UTF-8.
>>
>> In my practice application/x-www-form-urlencoded data usually comes
>> from custom javascript encoding function (as DOMString). When sending
>> it to server over XHR I use setRequestHeader('Content-type',
>> 'application/x-www-form-urlencoded; charset=windows-1251'). This
>> informs the server of the correct encoding. This all worked well until
>> FF3 followed the spec.
>>
>> I think when the user explicitly sets charset with setRequestHeader()
>> the browser should not override that. UTF-8 should be used only as
>> default.
>
> I don't quite understand how this could actually work given that the
> DOMString will have to be converted to UTF-8. It seems problems would
> already arise there.
>
> Apologies for the late reply by the way; work on XMLHttpRequest has been
> semi-dormant for a while awaiting some more feedback before attempting
> another Last Call.
>
>
> --
> Anne van Kesteren
> http://annevankesteren.nl/
>

My understanding is that when an HTTP request says:

Content-type: application/x-www-form-urlencoded; charset=windows-1251

the charset named in this header applies not to the POST data
DOMString itself but to the data that was URL-encoded. The
URL-encoded string itself could be sent as UTF-8, but the user must be
able to specify a different charset for the decoded data. At least,
that is how the charset is used on the receiving side.
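
To illustrate, here is a minimal sketch of the kind of custom encoding
function I mean. The name encodeWin1251 is made up for this example (it
is not a browser API), and it only covers ASCII and the basic Russian
alphabet; a real function would need the full windows-1251 table.

```javascript
// Hypothetical helper (not a browser API): percent-encode a string as
// windows-1251 bytes. Handles only ASCII and the basic Russian alphabet.
function encodeWin1251(str) {
  let out = "";
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    let byte;
    if (cp < 0x80) byte = cp;                                  // ASCII maps to itself
    else if (cp >= 0x0410 && cp <= 0x044f) byte = cp - 0x0350; // А..я -> 0xC0..0xFF
    else if (cp === 0x0401) byte = 0xa8;                       // Ё
    else if (cp === 0x0451) byte = 0xb8;                       // ё
    else throw new RangeError("not covered by this sketch: " + ch);

    if (/^[A-Za-z0-9_.-]$/.test(ch)) out += ch;                // safe characters pass through
    else if (byte === 0x20) out += "+";                        // form encoding writes space as "+"
    else out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

// Same field value, percent-encoded from two different byte encodings.
const body = "q=" + encodeWin1251("привет");
const utf8Body = "q=" + encodeURIComponent("привет");

// The windows-1251 body contains only ASCII characters, so encoding the
// DOMString as UTF-8 for transmission leaves every byte unchanged.
const isAscii = /^[\x00-\x7f]*$/.test(body);

console.log(body);     // q=%EF%F0%E8%E2%E5%F2
console.log(utf8Body); // q=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82
console.log(isAscii);  // true
```

The string passed to send() is the same on the wire whether it is
treated as UTF-8 or not; only the server's percent-decoding step
depends on the charset parameter, which is why I set it with
setRequestHeader().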

Yaroslav

Received on Monday, 12 October 2009 09:54:38 UTC