- From: Robert Brewer <fumanchu@aminus.org>
- Date: Tue, 17 Jul 2012 08:57:55 -0700
- To: "Julian Reschke" <julian.reschke@gmx.de>, "James M Snell" <jasnell@gmail.com>
- Cc: "Amos Jeffries" <squid3@treenet.co.nz>, <ietf-http-wg@w3.org>
Julian Reschke wrote:
> On 2012-07-17 16:48, James M Snell wrote:
> > Tunneling 1.1 traffic via 2.0 would likely be the easy part; it's the
>
> Not even that. Given an HTTP/1.1 message containing non-ASCII octets in
> header field value, you simply don't know what unicode characters to map
> them to.
>
> This is not theoretical; some UAs process UTF-8 in Content-Disposition,
> some use the installation's locale character set.
>
> Yes, this is a mess, but it's not clear to me how to break out of it
> without breaking *some* setups that currently "work".
>
> > ...
> > The one thing we need to determine is: how critical is the ability to
> > support seamless down-level conversion from 2.0 to 1.1 within a request?
> > Is it acceptable for us to say that while 2.0 can be used to transport
> > 1.1 messages, the reverse is not possible.
> > ...
>
> So how do you transport a 1.1 message inside 2.0 if it contains
> non-ASCII? Treat the header field value as binary?

Just to share a field note: The Python web community dealt with this exact
problem recently with the advent of Python 3, which elevated Unicode quite a
bit and exposed this problem more clearly to many. The chosen solution was to
take the bytes-of-unknown-encoding and decode them as ISO-8859-1 (which at
least won't error on any byte sequence), and leave that mess for a higher
layer (which presumably would have more context) to re-encode/decode if they
liked. Not a perfect solution but better than nothing.

Robert Brewer
fumanchu@aminus.org
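
A minimal sketch of the decode-as-ISO-8859-1 approach described above, in Python 3. The function names `decode_header_value` and `reinterpret` are hypothetical and chosen only for illustration; they are not taken from any particular framework. The approach relies on ISO-8859-1 mapping every byte value to the code point of the same number, so the original octets can always be recovered.

```python
def decode_header_value(raw: bytes) -> str:
    """Decode header octets of unknown encoding without ever raising.

    ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same
    value, so any byte sequence decodes and the mapping is reversible.
    """
    return raw.decode("iso-8859-1")


def reinterpret(value: str, encoding: str = "utf-8") -> str:
    """Higher-layer fix-up: recover the original octets, then decode them
    with the encoding this layer believes is correct (an assumption that
    only the caller can justify)."""
    return value.encode("iso-8859-1").decode(encoding)


# Example: a Content-Disposition value whose sender actually used UTF-8.
original = 'attachment; filename="résumé.txt"'
raw = original.encode("utf-8")

text = decode_header_value(raw)            # never fails, but looks garbled
assert text.encode("iso-8859-1") == raw    # the original octets are preserved
assert reinterpret(text, "utf-8") == original
```

If the higher layer guesses the wrong encoding, `reinterpret` can still raise `UnicodeDecodeError` or produce the wrong characters, which is the "not a perfect solution" part.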
Received on Tuesday, 17 July 2012 15:58:41 UTC