Re: UTF-8 or ASCII Header Names?

On 2013/08/17 1:57, Roberto Peon wrote:
> In addition to compressing the bytestrings, the compressor will have to
> validate utf-8. Nearly the same complexity as normalization (which was
> proposed earlier) to me-- I now get to scan things yet another time,

Sorry, but that's not true. You can get close to it being true for
ASCII-only data, but that's about it. Checking for UTF-8 validity is a
very small state machine (around 10 states) looking at one byte at a time,
and it can only succeed or fail. Normalization needs lots of data (a few
tens of kilobytes) for lookup, may need a buffer of indefinite length, may
lengthen or shorten the data, and so on.
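
To give a rough idea of how small such a check is, here is a minimal
sketch in Python. The function name is_valid_utf8 is made up for this
illustration, and instead of spelling out each state it folds them into
a counter of pending continuation bytes plus the range allowed for the
next byte. It makes a single pass, keeps no buffer, and only answers
yes or no.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Single-pass UTF-8 validity check (illustrative sketch, not tuned)."""
    need = 0              # continuation bytes still expected
    lo, hi = 0x80, 0xBF   # range allowed for the next continuation byte
    for b in data:
        if need == 0:
            if b <= 0x7F:                 # ASCII
                continue
            elif 0xC2 <= b <= 0xDF:       # 2-byte lead (C0, C1 are overlong)
                need = 1
            elif 0xE0 <= b <= 0xEF:       # 3-byte lead
                need = 2
                if b == 0xE0:
                    lo = 0xA0             # reject overlong forms
                elif b == 0xED:
                    hi = 0x9F             # reject surrogates U+D800..U+DFFF
            elif 0xF0 <= b <= 0xF4:       # 4-byte lead
                need = 3
                if b == 0xF0:
                    lo = 0x90             # reject overlong forms
                elif b == 0xF4:
                    hi = 0x8F             # reject values above U+10FFFF
            else:
                return False              # stray continuation or invalid lead
        else:
            if not lo <= b <= hi:
                return False
            lo, hi = 0x80, 0xBF           # later continuations: full range
            need -= 1
    return need == 0                      # fail on a truncated sequence


if __name__ == "__main__":
    print(is_valid_utf8("x-héader".encode("utf-8")))
    print(is_valid_utf8(b"\xc0\xaf"))
```

The state carried from byte to byte is three small integers, which is
the contrast with normalization drawn above: no lookup tables, no
buffering, and the data is never rewritten.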

> increasing CPU utilization.. for what? Basically nothing in return if the
> upper-level doesn't care about it.
>
> If the upper-level cares about it, then it should be a prereq of feeding
> something into the compressor. If not, then it shouldn't be. Either way,
> these concerns belong outside the compressor.

I agree that this should be outside the compressor.

Regards,    Martin.

Received on Saturday, 17 August 2013 13:18:40 UTC