Re: UTF-8 or ASCII Header Names? from Martin J. Dürst on 2013-08-17 (ietf-http-wg@w3.org from July to September 2013)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Sat, 17 Aug 2013 22:17:50 +0900
To: Roberto Peon <grmocg@gmail.com>
CC: James M Snell <jasnell@gmail.com>, Martin Thomson <martin.thomson@gmail.com>, Fred Akalin <akalin@google.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <520F77FE.7030102@it.aoyama.ac.jp>

On 2013/08/17 1:57, Roberto Peon wrote:
> In addition to compressing the bytestrings, the compressor will have to
> validate utf-8. Nearly the same complexity as normalization (which was
> proposed earlier) to me-- I now get to scan things yet another time,

Sorry, but that's not true. You can get close to it being true for 
ASCII-only data, but that's about it. Checking for UTF-8 validity is a 
very small state machine (around 10 states) looking at one byte at a time, 
and it can only succeed or fail. Normalization needs lots of data (a few 
tens of kilobytes) for lookup, may need a buffer of indefinite length, may 
lengthen or shorten the data, and so on.
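
To make the difference concrete, here is a minimal sketch of such a 
validator in Python. The state names and the functions step() and 
is_valid_utf8() are made up for illustration; the byte ranges follow the 
well-formed UTF-8 table in the Unicode Standard. The last few lines use 
the standard unicodedata module to show normalization changing the length 
of the data, which validation never does.

```python
import unicodedata

# Illustrative state names (not taken from any specification text).
START, CONT1, CONT2, CONT3, AFTER_E0, AFTER_ED, AFTER_F0, AFTER_F4 = range(8)

# For every non-START state: allowed range of the next byte, and next state.
CONTINUATION = {
    CONT1:    (0x80, 0xBF, START),   # last continuation byte
    CONT2:    (0x80, 0xBF, CONT1),   # two continuation bytes left
    CONT3:    (0x80, 0xBF, CONT2),   # three continuation bytes left
    AFTER_E0: (0xA0, 0xBF, CONT1),   # excludes overlong 3-byte forms
    AFTER_ED: (0x80, 0x9F, CONT1),   # excludes UTF-16 surrogates
    AFTER_F0: (0x90, 0xBF, CONT2),   # excludes overlong 4-byte forms
    AFTER_F4: (0x80, 0x8F, CONT2),   # excludes code points above U+10FFFF
}


def step(state, byte):
    """Return the next state, or None if `byte` is not allowed in `state`."""
    if state == START:
        if byte <= 0x7F:
            return START
        if 0xC2 <= byte <= 0xDF:
            return CONT1
        if byte == 0xE0:
            return AFTER_E0
        if byte == 0xED:
            return AFTER_ED
        if 0xE1 <= byte <= 0xEF:      # E0 and ED were handled above
            return CONT2
        if byte == 0xF0:
            return AFTER_F0
        if 0xF1 <= byte <= 0xF3:
            return CONT3
        if byte == 0xF4:
            return AFTER_F4
        return None                   # 80..C1 and F5..FF never start a sequence
    low, high, next_state = CONTINUATION[state]
    return next_state if low <= byte <= high else None


def is_valid_utf8(data: bytes) -> bool:
    """One pass, one byte at a time, no lookup data, no buffering."""
    state = START
    for byte in data:
        state = step(state, byte)
        if state is None:
            return False
    return state == START             # must not end in the middle of a sequence


# Contrast: normalization can change the length of the data.
name = "D\u00fcrst"
nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")
lengths_differ = len(nfc) != len(nfd)
```

The validator only ever answers yes or no and leaves the bytes untouched, 
whereas the NFC and NFD forms of the same name already differ in length.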

> increasing CPU utilization.. for what? Basically nothing in return if the
> upper-level doesn't care about it.
>
> If the upper-level cares about it, then it should be a prereq of feeding
> something into the compressor. If not, then it shouldn't be. Either way,
> these concerns belong outside the compressor.

I agree that this should be outside the compressor.

Regards,    Martin.

Received on Saturday, 17 August 2013 13:18:40 UTC