Re: Delta Compression and UTF-8 Header Values

My .02 - 

RFC2616 implies that the range of characters available in headers is ISO-8859-1 (while tilting the table heavily towards ASCII), and we've clarified that in bis to recommend ASCII, while telling implementations to handle anything else as opaque bytes.

However, on the wire in HTTP/1, some bits are sent as UTF-8 (in particular, the request-URI, from one or two browsers).

I think our choices are roughly:

1) everything is opaque bytes
2) default to ASCII, flag headers using non-ASCII bytes to preserve them
3) everything is ASCII, require implementations that receive non-ASCII HTTP/1.1 to translate to ASCII (e.g., convert IRIs to URIs)

#1 is safest, but you don't get the benefit of re-encoding. The plan for the first implementation draft is not to try to take advantage of encoding, so it's the way we're likely to go -- for now.
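To make "the benefit of re-encoding" concrete, here is a minimal sketch. It assumes a hypothetical static Huffman table built from a tiny ASCII sample, plus a hypothetical escape code followed by the raw byte for anything outside printable ASCII. Neither the table nor the escape scheme comes from the delta-encoding proposal. The names `ESCAPE`, `SAMPLE`, `huffman_code_lengths` and `encoded_bits` are illustrative only. The sketch just shows why non-ASCII bytes erode the gain James mentions below.

```python
import heapq
from collections import Counter

ESCAPE = 256  # hypothetical escape symbol; not a real byte value

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    # Heap entries: (weight, unique tiebreak, {symbol: depth in subtree})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Hypothetical training sample of typical ASCII header values.
SAMPLE = b"text/html,application/xhtml+xml gzip, deflate max-age=0 en-US,en;q=0.8"
freqs = Counter({b: 1 for b in range(0x20, 0x7F)})  # every printable ASCII byte gets a code
freqs.update(SAMPLE)                                # bias the table towards the sample
freqs[ESCAPE] = 1                                   # rare escape for everything else
LENGTHS = huffman_code_lengths(freqs)

def encoded_bits(value: bytes) -> int:
    """Bits needed for value under the static table, escaping unknown bytes."""
    total = 0
    for byte in value:
        if byte in LENGTHS:
            total += LENGTHS[byte]
        else:
            total += LENGTHS[ESCAPE] + 8  # escape code, then the raw 8-bit byte
    return total

for value in (b"text/html", "/caf\u00e9".encode("utf-8"), "\u65e5\u672c".encode("utf-8")):
    print(value, encoded_bits(value), "bits vs", 8 * len(value), "raw")
```

Every non-ASCII byte costs more than 8 bits here, so a mostly non-ASCII value comes out larger than just sending it as opaque bytes.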

#2 starts to walk down the encoding path. There are many variants; we could default to blobs, default to UTF-8, etc. We could just flag "ASCII or blob" or we could define many, many possible encodings, as discussed.
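One reading of option #2, as a minimal sketch: each value carries a hypothetical one-byte flag saying whether it is plain ASCII (eligible for an ASCII-oriented table), valid UTF-8, or an opaque blob. The flag values, the names (`ValueKind`, `classify`, `flag_value`, `unflag_value`) and the framing are assumptions for illustration, not anything in the current drafts.

```python
from enum import IntEnum

class ValueKind(IntEnum):
    """Hypothetical per-value flag; not defined by any draft."""
    ASCII = 0  # candidate for an ASCII-only static table
    UTF8 = 1   # valid UTF-8 containing at least one non-ASCII character
    BLOB = 2   # anything else, carried as opaque bytes

def classify(value: bytes) -> ValueKind:
    """Pick the narrowest kind that describes the raw header value."""
    if value.isascii():
        return ValueKind.ASCII
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return ValueKind.BLOB  # e.g. ISO-8859-1 bytes from an HTTP/1.1 peer
    return ValueKind.UTF8

def flag_value(value: bytes) -> bytes:
    """Prefix the value with its one-byte kind so it survives byte-for-byte."""
    return bytes([classify(value)]) + value

def unflag_value(wire: bytes) -> tuple[ValueKind, bytes]:
    """Split a flagged value back into (kind, original bytes)."""
    return ValueKind(wire[0]), wire[1:]

for raw in (b"gzip", "/caf\u00e9".encode("utf-8"), b"caf\xe9"):
    print(raw, classify(raw).name)
```

Collapsing `UTF8` into `BLOB` gives the simpler "ASCII or blob" variant; adding more members is the "many, many possible encodings" direction.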

#3 seems risky to me.
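For a sense of what the translation in #3 involves, here is a minimal sketch for the request-target only. It assumes the translating implementation simply percent-encodes every non-ASCII byte. For UTF-8 input this matches the core step of RFC 3987's IRI-to-URI mapping (section 3.1). The function name `ascii_request_target` is hypothetical.

```python
def ascii_request_target(raw: bytes) -> bytes:
    """Percent-encode each non-ASCII byte of a raw request-target.

    For UTF-8 input this matches the core IRI-to-URI step of RFC 3987
    (sec. 3.1); other bytes are escaped as-is, without guessing a charset.
    """
    out = bytearray()
    for byte in raw:
        if byte < 0x80:
            out.append(byte)        # ASCII passes through unchanged
        else:
            out += b"%%%02X" % byte  # e.g. 0xC3 -> b"%C3"
    return bytes(out)

print(ascii_request_target("/caf\u00e9".encode("utf-8")))  # sent as UTF-8
print(ascii_request_target(b"/caf\xe9"))                   # sent as ISO-8859-1
```

The same word yields two different ASCII targets depending on what the client sent, and the origin may not treat the escaped form the same as the raw bytes. This also only works because URI syntax already has percent-encoding; header fields in general have no single ASCII escape of this kind.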

Cheers, 


On 09/02/2013, at 6:28 AM, James M Snell <jasnell@gmail.com> wrote:

> Just going through more implementation details of the proposed delta
> encoding... one of the items that had come up previously in early
> http/2 discussions was the possibility of allowing for UTF-8 header
> values. Doing so would allow us to move away from things like
> punycode, pct-encoding, Q and B-Codecs, RFC 5987 mechanisms, etc., but it
> would bring along a range of other issues we would need to deal with.
> 
> One key challenge with allowing UTF-8 values, however, is that it
> conflicts with the use of the static huffman encoding in the proposed
> Delta Encoding for header compression. If we allow for non-ascii
> characters, the static huffman coding simply becomes too inefficient
> and unmanageable to be useful. There are a few ways around it but none
> of the strategies are all that attractive.
> 
> So the question is: do we want to allow UTF-8 header values? Is it
> worth the trade-off in less-efficient header compression? Or put
> another way, is increased compression efficiency worth ruling out
> UTF-8 header values?
> 
> (Obviously there are other issues with UTF-8 values we'd need to
> consider, such as http/1 interop)
> 
> - James
> 

--
Mark Nottingham   http://www.mnot.net/

Received on Friday, 8 February 2013 23:53:39 UTC