Re: character encoding in header fields, was: SPDY Header Frames

On Tue, Jul 17, 2012 at 8:57 AM, Robert Brewer <fumanchu@aminus.org> wrote:

> Julian Reschke wrote:
> >[snip]
> > So how do you transport a 1.1 message inside 2.0 if it contains
> > non-ASCII? Treat the header field value as binary?
>
> Just to share a field note: The Python web community dealt with this exact
> problem recently with the advent of Python 3, which elevated Unicode quite
> a bit and exposed this problem more clearly to many. The chosen solution
> was to take the bytes-of-unknown-encoding and decode them as ISO-8859-1
> (which at least won't error on any byte sequence), and leave that mess for
> a higher layer (which presumably would have more context) to
> re-encode/decode if they liked. Not a perfect solution but better than
> nothing.
>
I suspect that we'll simply have to take a similar approach here, although
I would lean more towards not doing any decoding at all: simply treat the
header value as a bag of octets and let the application figure it out from
there.
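
To make the two options concrete, here is a rough Python sketch. The
function names are made up for illustration, and the sample value is just
an assumption about what a non-ASCII header might look like. It shows why
decoding as ISO-8859-1 never fails and loses nothing, which is also why
skipping the decode entirely is equivalent as far as the octets go.

```python
def decode_unknown(raw: bytes) -> str:
    # ISO-8859-1 maps each of the 256 byte values to one code point,
    # so this cannot raise, whatever the real encoding was.
    return raw.decode("iso-8859-1")


def recover_octets(text: str) -> bytes:
    # Reverses decode_unknown() exactly; the original octets come back.
    return text.encode("iso-8859-1")


# Hypothetical header value whose real encoding (UTF-8) the transport
# layer does not know.
raw_value = "caf\u00e9".encode("utf-8")

as_text = decode_unknown(raw_value)        # mojibake, but no exception
same_octets = recover_octets(as_text)      # identical to raw_value
fixed = same_octets.decode("utf-8")        # higher layer knows better
```

Either way the application ends up holding the same octets; the only
difference is whether the lower layer wraps them in a str first.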

The one mechanism we should key off, of course, is the Version header.
If we specify version 1.1, then the assumption is that the largely
undefined semantics of 1.1 apply to header values. Binary header
values would be disallowed for anything other than the core headers (host,
version, method, etc.) and it would largely be up to the application to
figure things out. If we specify version 2.0, then the assumption is that
binary header values are allowed and all character-based header values are
UTF-8. This at least leaves us no worse off than we are currently
with 1.1, while still allowing improvement in 2.0.
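
A minimal sketch of that dispatch, in Python. Everything here is
hypothetical: the function name, the version strings as they would appear
in the Version header, and the is_binary flag (nothing above says how a
binary value would actually be marked on the wire).

```python
def interpret_header_value(version: str, raw: bytes, is_binary: bool = False):
    """Interpret one header value according to the message version."""
    if version == "2.0":
        if is_binary:
            # 2.0 allows binary values; hand the octets through untouched.
            return raw
        # 2.0 character-based values are assumed to be UTF-8, so a
        # malformed value raises UnicodeDecodeError here.
        return raw.decode("utf-8")
    if version == "1.1":
        # 1.1 semantics are undefined: keep the octets opaque and let
        # the application decide what they mean.
        return raw
    raise ValueError("unsupported version: " + version)
```

With this shape, a 1.1 value such as b"caf\xe9" comes back as bytes,
while the same word sent as UTF-8 under 2.0 comes back as a str.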

It would just need to be understood that a down-level conversion from 2.0
to 1.1 (e.g. a 2.0-enabled client making a request against a 1.1-enabled
server) would necessarily be lossy.
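
One way the loss could show up, sketched in Python. This assumes the
1.1 side only accepts ISO-8859-1 text and that the intermediary replaces
anything it cannot represent; both are assumptions for illustration, not
something settled above, and downgrade_value is a made-up name.

```python
def downgrade_value(text: str) -> bytes:
    # Hypothetical 2.0 -> 1.1 step: characters outside ISO-8859-1
    # cannot be represented and are replaced with "?".
    return text.encode("iso-8859-1", errors="replace")


original = "price: 5\u20ac"            # UTF-8 text in the 2.0 message
wire = downgrade_value(original)       # what the 1.1 server receives
round_tripped = wire.decode("iso-8859-1")
lost = round_tripped != original       # the euro sign did not survive
```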

- James


> Robert Brewer
> fumanchu@aminus.org

Received on Tuesday, 17 July 2012 16:18:57 UTC