- From: James M Snell <jasnell@gmail.com>
- Date: Fri, 8 Feb 2013 17:10:02 -0800
- To: Mark Nottingham <mnot@mnot.net>
- Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
On Fri, Feb 8, 2013 at 3:53 PM, Mark Nottingham <mnot@mnot.net> wrote:
> My .02 -
>
> RFC2616 implies that the range of characters available in headers is ISO-8859-1 (while tilting the table heavily towards ASCII), and we've clarified that in bis to recommend ASCII, while telling implementations to handle anything else as opaque bytes.
>
> However, on the wire in HTTP/1, some bits are sent as UTF-8 (in particular, the request-URI, from one or two browsers).
>
> I think our choices are roughly:
>
> 1) everything is opaque bytes
> 2) default to ASCII, flag headers using non-ASCII bytes to preserve them
> 3) everything is ASCII, require implementations that receive non-ASCII HTTP/1.1 to translate to ASCII (e.g., convert IRIs to URIs)
>
> #1 is safest, but you don't get the benefit of re-encoding. The plan for the first implementation draft is to not try to take advantage of encoding, so it's the way we're likely to go -- for now.
>
> #2 starts to walk down the encoding path. There are many variants; we could default to blobs, default to UTF-8, etc. We could just flag "ASCII or blob" or we could define many, many possible encodings, as discussed.
>
> #3 seems risky to me.
>

I have the distinct feeling we're going to end up somewhere between #1 and #2, which means bad things for the static Huffman coding. If we end up with #2, we'll be able to Huffman-code anything that is flagged as ASCII, and won't be able to touch the rest.

- James

> Cheers,
>
>
> On 09/02/2013, at 6:28 AM, James M Snell <jasnell@gmail.com> wrote:
>
>> Just going through more implementation details of the proposed delta
>> encoding... one of the items that had come up previously in early
>> http/2 discussions was the possibility of allowing for UTF-8 header
>> values. Doing so would allow us to move away from things like
>> punycode, pct-encoding, Q and B-Codecs, RFC 5987 mechanisms, etc., but it
>> would bring along a range of other issues we would need to deal with.
>>
>> One key challenge with allowing UTF-8 values, however, is that it
>> conflicts with the use of the static Huffman encoding in the proposed
>> Delta Encoding for header compression. If we allow for non-ASCII
>> characters, the static Huffman coding simply becomes too inefficient
>> and unmanageable to be useful. There are a few ways around it but none
>> of the strategies are all that attractive.
>>
>> So the question is: do we want to allow UTF-8 header values? Is it
>> worth the trade-off in less-efficient header compression? Or put
>> another way, is increased compression efficiency worth ruling out
>> UTF-8 header values?
>>
>> (Obviously there are other issues with UTF-8 values we'd need to
>> consider, such as http/1 interop)
>>
>> - James
>>
>
> --
> Mark Nottingham   http://www.mnot.net/
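
For illustration only, here is a rough Python sketch of what the simplest "ASCII or blob" variant of option #2 might look like. Nothing here comes from a draft: the flag values and the function name flag_header_value are hypothetical, and the sketch only shows the classification step that decides whether a value could be handed to a static Huffman coder at all.

```python
from typing import Tuple

# Hypothetical flag values; no draft defines these.
ASCII_FLAG = 0  # value is pure ASCII, so it is eligible for static Huffman coding
BLOB_FLAG = 1   # value contains non-ASCII bytes, so it is passed through as opaque bytes


def flag_header_value(raw: bytes) -> Tuple[int, bytes]:
    """Classify a header value under an assumed "ASCII or blob" scheme.

    The bytes are returned unchanged in both cases; only the flag differs.
    """
    if all(b < 0x80 for b in raw):
        return ASCII_FLAG, raw
    return BLOB_FLAG, raw


if __name__ == "__main__":
    samples = (b"text/html", "caf\u00e9.example".encode("utf-8"))
    for value in samples:
        flag, _ = flag_header_value(value)
        print(value, "ASCII" if flag == ASCII_FLAG else "blob")
```

Under this assumption, a single non-ASCII byte is enough to put the whole value in the blob bucket, which is the "won't be able to touch the rest" case described above.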
Received on Saturday, 9 February 2013 01:10:56 UTC