RE: character encoding in header fields, was: SPDY Header Frames

Julian Reschke wrote:
> On 2012-07-17 16:48, James M Snell wrote:
> > Tunneling 1.1 traffic via 2.0 would likely be the easy part; it's the
> Not even that. Given an HTTP/1.1 message containing non-ASCII octets in
> header field value, you simply don't know what unicode characters to map
> them to.
> This is not theoretical; some UAs process UTF-8 in Content-Disposition,
> some use the installation's locale character set.
> Yes, this is a mess, but it's not clear to me how to break out of it
> without breaking *some* setups that currently "work".
> > ...
> > The one thing we need to determine is: how critical is the ability to
> > support seamless down-level conversion from 2.0 to 1.1 within a request?
> > Is it acceptable for us to say that while 2.0 can be used to transport
> > 1.1 messages, the reverse is not possible.
> > ...
> So how do you transport a 1.1 message inside 2.0 if it contains
> non-ASCII? Treat the header field value as binary?

Just to share a field note: the Python web community dealt with this exact problem recently. Python 3 made Unicode the default string type, which exposed the issue much more clearly to many people. The chosen solution was to decode the bytes of unknown encoding as ISO-8859-1, which at least cannot fail on any byte sequence, and to leave it to a higher layer (which presumably has more context) to re-encode and decode the value correctly if it wants to. Not a perfect solution, but better than nothing.
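
For illustration, here is a minimal sketch of that approach in Python 3. The function names and the sample Content-Disposition value are made up for this example and are not taken from any particular framework. It relies only on the fact that ISO-8859-1 maps every byte to exactly one code point, so the decode step is lossless and reversible.

```python
def decode_header_value(raw: bytes) -> str:
    """Decode bytes of unknown encoding as ISO-8859-1; this never raises."""
    return raw.decode("iso-8859-1")


def reinterpret(value: str, encoding: str = "utf-8") -> str:
    """Higher-layer step: recover the original bytes, then decode them
    with the encoding the caller believes is actually in use."""
    return value.encode("iso-8859-1").decode(encoding)


# Hypothetical header value that a client sent as UTF-8.
raw = 'attachment; filename="résumé.txt"'.encode("utf-8")

text = decode_header_value(raw)

# The lower layer loses nothing: the original octets are recoverable.
assert text.encode("iso-8859-1") == raw

# A layer with more context can reinterpret the value.
print(reinterpret(text))
```

If the higher layer guesses the wrong encoding, reinterpret() can still raise UnicodeDecodeError or produce mojibake, so this only postpones the decision to a place that has more context; it does not solve the underlying ambiguity.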

Robert Brewer

Received on Tuesday, 17 July 2012 15:58:41 UTC