Re: character encoding in header fields, was: SPDY Header Frames

On 2012-07-17 17:57, Robert Brewer wrote:
> Julian Reschke wrote:
>> On 2012-07-17 16:48, James M Snell wrote:
>>> Tunneling 1.1 traffic via 2.0 would likely be the easy part; it's the
>>
>> Not even that. Given an HTTP/1.1 message containing non-ASCII octets in
>> a header field value, you simply don't know what Unicode characters to
>> map them to.
>>
>> This is not theoretical; some UAs process UTF-8 in Content-Disposition,
>> some use the installation's locale character set.
>>
>> Yes, this is a mess, but it's not clear to me how to break out of it
>> without breaking *some* setups that currently "work".
>>
>>> ...
>>> The one thing we need to determine is: how critical is the ability to
>>> support seamless down-level conversion from 2.0 to 1.1 within a
>>> request? Is it acceptable for us to say that while 2.0 can be used to
>>> transport 1.1 messages, the reverse is not possible?
>>> ...
>>
>> So how do you transport a 1.1 message inside 2.0 if it contains
>> non-ASCII? Treat the header field value as binary?
>
> Just to share a field note: The Python web community dealt with this exact problem recently with the advent of Python 3, which elevated Unicode quite a bit and exposed this problem more clearly to many. The chosen solution was to take the bytes-of-unknown-encoding and decode them as ISO-8859-1 (which at least won't error on any byte sequence), and leave that mess for a higher layer (which presumably would have more context) to re-encode/decode if they liked. Not a perfect solution but better than nothing.

That's also what APIs like XMLHttpRequest and the servlet API are doing.
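
For illustration, here is a minimal Python sketch of that approach. The function names (decode_header_value, reinterpret_header_value) and the sample Content-Disposition value are made up for this example and not taken from any particular framework. The point is only that ISO-8859-1 maps every octet to exactly one code point, so the decode cannot fail and the original octets stay recoverable for a higher layer.

```python
def decode_header_value(raw: bytes) -> str:
    """Decode octets of unknown encoding without ever raising.

    ISO-8859-1 assigns a code point to each of the 256 byte values,
    so the result may be mojibake, but no information is lost.
    """
    return raw.decode("iso-8859-1")


def reinterpret_header_value(value: str, encoding: str = "utf-8") -> str:
    """Hypothetical higher-layer helper: recover the original octets
    and decode them with the encoding the caller believes was used.
    Raises UnicodeDecodeError if that belief is wrong.
    """
    return value.encode("iso-8859-1").decode(encoding)


# Assumed sample: a sender that put raw UTF-8 into Content-Disposition.
raw = 'attachment; filename="résumé.txt"'.encode("utf-8")

text = decode_header_value(raw)

# The round trip back to octets is lossless.
assert text.encode("iso-8859-1") == raw

# A layer that knows (or guesses) the sender used UTF-8 can fix it up.
print(reinterpret_header_value(text, "utf-8"))
```

Note that this only defers the problem: the higher layer still has to guess the encoding, and a wrong guess either raises or silently yields the wrong characters.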

Best regards, Julian

Received on Tuesday, 17 July 2012 16:39:05 UTC