Re: UTF-8 or ASCII Header Names? from Martin Thomson on 2013-08-16 (ietf-http-wg@w3.org from July to September 2013)

From: Martin Thomson <martin.thomson@gmail.com>
Date: Fri, 16 Aug 2013 09:19:00 -0700
To: Roberto Peon <grmocg@gmail.com>
Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, Fred Akalin <akalin@google.com>, James Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CABkgnnXzV9uguWVikCY_pQCy9ziSFs+_1Xid2kna+Rahs4HFqg@mail.gmail.com>

On 16 August 2013 08:44, Roberto Peon <grmocg@gmail.com> wrote:
> The keys should be ASCII, and the values bytes.

That's a fairly narrow view.  If the values were (for example) ASCII,
then you'd have an opportunity to compress better.  At the very least,
you could drop the high-order bit from every octet.
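
To illustrate the high-order bit point, here is a minimal sketch in
Python, assuming the value is known to be pure ASCII.  The names pack7
and unpack7 are made up for this example and do not come from any
draft; the receiver is assumed to learn the character count out of
band.

```python
def pack7(data: bytes) -> bytes:
    """Pack ASCII octets into 7 bits each (hypothetical helper)."""
    if any(b > 0x7F for b in data):
        raise ValueError("value is not ASCII")
    acc = 0      # bit accumulator
    nbits = 0    # number of pending bits in acc
    out = bytearray()
    for b in data:
        acc = (acc << 7) | b
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the pending bits
    if nbits:
        # Left-align the leftover bits and pad the final octet with zeros.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack7(packed: bytes, count: int) -> bytes:
    """Recover `count` ASCII octets from pack7() output."""
    acc = 0
    nbits = 0
    out = bytearray()
    for b in packed:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 7 and len(out) < count:
            nbits -= 7
            out.append((acc >> nbits) & 0x7F)
        acc &= (1 << nbits) - 1
    return bytes(out)


value = b"text/html; charset=utf-8"
packed = pack7(value)
print(len(value), "octets ->", len(packed), "octets")
```

That is a fixed saving of one octet in eight, before any real entropy
coding is applied.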

To compress effectively, at some level you are going to need to either
make assumptions about the properties of values or rely on specific
knowledge about them.  Even if it were the case that the bytes were
UTF-8, you could still make some gains over pure bytes (even just by
exploiting the fact that certain byte sequences are not possible in
UTF-8).
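
To put a rough number on the UTF-8 point, here is a small Python
sketch that counts how many of the 65536 possible two-octet strings
decode as valid UTF-8.  It relies on Python's built-in strict decoder;
the function name is made up for this example, and the count only
shows the redundancy that is available, not a compression scheme.

```python
def count_valid_utf8_pairs() -> int:
    """Count two-octet strings that are valid UTF-8 on their own."""
    valid = 0
    for first in range(256):
        for second in range(256):
            try:
                bytes([first, second]).decode("utf-8")  # strict by default
            except UnicodeDecodeError:
                continue
            valid += 1
    return valid


valid = count_valid_utf8_pairs()
print(valid, "of 65536 two-octet strings are valid UTF-8")
```

Any pair outside that valid set never needs a code assigned to it,
which is where a UTF-8-aware coder could gain over one that treats the
value as opaque bytes.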

Received on Friday, 16 August 2013 16:19:27 UTC