Re: FYI... Binary Optimized Header Encoding for SPDY from Martin J. Dürst on 2012-08-03 (ietf-http-wg@w3.org from July to September 2012)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Fri, 03 Aug 2012 19:27:59 +0900
To: Martin Nilsson <nilsson@opera.com>
CC: ietf-http-wg@w3.org
Message-ID: <501BA7AF.7010207@it.aoyama.ac.jp>

On 2012/08/03 8:25, Martin Nilsson wrote:
> On Thu, 02 Aug 2012 10:27:35 +0200, Poul-Henning Kamp
> <phk@phk.freebsd.dk> wrote:
>
>> That being said, I am not a big fan of UTF8 in high-performance
>> protocol context: It is much slower to process than 8bit string
>> formats.
>>
>
> I would like to know more on what operations you need. I imagine that
> most relevant operations (splitting, joining, comparing, strlen) can be
> performed directly on the encoded UTF8 string as efficiently as on ASCII.
> Normalization and upper/lowercasing is trickier, but mostly because of
> all the Unicode rules, not UTF8 itself (though it doesn't help).

I very much agree. I'm also not sure what "8bit string" means. If it 
means binary data, then I agree that's the way to go for things like 
dates, but it doesn't help at all for text. If it's actual text, then it 
doesn't make sense because one needs more than 256 characters to write 
the world's languages.
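To make the point about operating directly on encoded UTF-8 concrete, here is a minimal sketch. It assumes header values arrive as already-validated UTF-8 byte strings, and the function names are hypothetical, for illustration only. It relies on two properties of UTF-8. First, every byte of a multi-byte sequence is >= 0x80, so an ASCII delimiter such as "," can never occur inside another character. Second, byte-wise ordering matches code point ordering. Normalization and case mapping are deliberately left out, as noted above.

```python
# Sketch only: byte-level operations on UTF-8 header values, no decoding needed.
# Assumes input is valid UTF-8; helper names are hypothetical.

def split_header(value: bytes, sep: bytes = b",") -> list[bytes]:
    """Split on an ASCII separator; it cannot collide with multi-byte sequences."""
    return [part.strip(b" ") for part in value.split(sep)]

def join_header(parts: list[bytes], sep: bytes = b", ") -> bytes:
    """Concatenating valid UTF-8 pieces with an ASCII separator yields valid UTF-8."""
    return sep.join(parts)

def byte_length(value: bytes) -> int:
    """strlen equivalent: length in octets, exactly as for ASCII."""
    return len(value)

def code_point_count(value: bytes) -> int:
    """Count characters by skipping continuation bytes (bit pattern 10xxxxxx)."""
    return sum(1 for b in value if (b & 0xC0) != 0x80)

if __name__ == "__main__":
    raw = "Dürst, 日本語, ascii".encode("utf-8")
    parts = split_header(raw)
    print([p.decode("utf-8") for p in parts])  # ['Dürst', '日本語', 'ascii']
    print(byte_length(raw), code_point_count(raw))  # 24 17
    # Byte-wise equality gives the same answer as comparing decoded strings.
    print(parts[0] == "Dürst".encode("utf-8"))  # True
```

Counting characters (as opposed to octets) is the one operation above that touches every byte, but it is still a simple linear scan with no table lookups.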


>> UTF8 also gives rise to a number of interesting security aspects,
>> primarily where humans eyeball for security and don't detect minor
>> differences between glyphs, particularly in FQDNs, but I can't see
>> how we can do anything about that in HTTP/2.0.

Yes. Human eyeballs can also easily be confused by changing between 0/O 
and 1/l/I and such, so that's not a new problem.


> Defining legal character ranges and what character encodings to use are
> two different problems. Similar looking characters are indeed a problem
> already today, and it is known and worked on on the browser side.

Yes indeed!

Regards,   Martin.

Received on Friday, 3 August 2012 14:08:27 UTC