Re: Delta Compression and UTF-8 Header Values

On Sat, 09 Feb 2013 00:53:10 +0100, Mark Nottingham <mnot@mnot.net> wrote:

> My .02 -
>
> RFC2616 implies that the range of characters available in headers is  
> ISO-8859-1 (while tilting the table heavily towards ASCII), and we've  
> clarified that in bis to recommend ASCII, while telling implementations  
> to handle anything else as opaque bytes.
>
> However, on the wire in HTTP/1, some bits are sent as UTF-8 (in  
> particular, the request-URI, from one or two browsers).
>

I don't see a reason not to UTF-8 encode all text fields. HTTP/1 forced a  
lot of heuristic code that tried to figure out how things were  
transformed on the way, and heuristics for decoders are bad. Besides, as  
the world has moved to UTF-8, saying "opaque bytes" means UTF-8 in  
practice for everyone anyway. The problem is fields that ideally should  
be binary, say a hash for ETag. UTF-8 encoding would add about 50% to  
the size there, since each byte of 0x80 or above becomes two bytes and  
roughly half the bytes of a hash fall in that range.
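
To make the 50% figure concrete, here is a minimal Python sketch. It  
assumes the binary value is carried by mapping each byte to the code  
point of the same number (as ISO-8859-1 does) and then UTF-8 encoding  
the result; the helper name utf8_overhead and the sample input are made  
up for illustration.

```python
import hashlib


def utf8_overhead(raw: bytes) -> float:
    """Fractional size increase when raw bytes are carried as UTF-8 text.

    Assumption: each byte is mapped to the code point with the same value
    (Latin-1), then that text is UTF-8 encoded.
    """
    encoded = raw.decode("latin-1").encode("utf-8")
    return len(encoded) / len(raw) - 1.0


# Every byte value once: 128 values stay one byte, 128 become two bytes.
uniform = bytes(range(256))
print(utf8_overhead(uniform))

# A real digest is close to uniform, so its overhead lands near the same value.
digest = hashlib.sha256(b"example entity body").digest()
print(len(digest), utf8_overhead(digest))
```

Pure ASCII input has no overhead under this mapping, which is why the  
cost only shows up for fields that are binary by nature.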

Creating a static Huffman code for the ASCII part of Unicode shouldn't be  
a problem, as long as there is an escape prefix for non-ASCII bytes.
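
A rough Python sketch of that idea, under stated assumptions: the code  
table covers the 128 ASCII values plus one extra symbol (ESC here, a  
hypothetical name), and a non-ASCII byte is sent as the ESC code followed  
by the 8 raw bits. The weights are invented for illustration and are not  
measured header statistics.

```python
import heapq
from itertools import count

ESC = 128  # hypothetical escape symbol: "one raw byte follows"


def build_code(weights):
    """Return {symbol: bitstring} for a Huffman code over the given weights."""
    tie = count()  # unique tie-breaker so heap entries never compare nodes
    heap = [(w, next(tie), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (a, b)))
    code = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: descend both branches
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:  # leaf: record the finished code word
            code[node] = prefix
    return code


# Invented weights: everything is possible, common header characters are cheap.
weights = {sym: 1 for sym in range(129)}
for ch in "abcdefghijklmnopqrstuvwxyz0123456789-/.=;, ":
    weights[ord(ch)] += 50

CODE = build_code(weights)
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(data: bytes) -> str:
    """Encode bytes as a string of '0'/'1' characters."""
    out = []
    for byte in data:
        if byte < 128:
            out.append(CODE[byte])
        else:
            out.append(CODE[ESC] + format(byte, "08b"))
    return "".join(out)


def decode(bits: str) -> bytes:
    """Invert encode(); relies on the code being prefix-free."""
    out = bytearray()
    i, cur = 0, ""
    while i < len(bits):
        cur += bits[i]
        i += 1
        sym = DECODE.get(cur)
        if sym is None:
            continue
        if sym == ESC:
            out.append(int(bits[i:i + 8], 2))
            i += 8
        else:
            out.append(sym)
        cur = ""
    return bytes(out)


header = "text/html; charset=utf-8".encode("utf-8")
print(len(header) * 8, len(encode(header)))
```

With this layout a non-ASCII byte costs the length of the ESC code plus  
8 bits, so UTF-8 text outside ASCII still round-trips, just without any  
compression gain.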

/Martin Nilsson

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Received on Saturday, 9 February 2013 14:12:54 UTC