Re: FYI... Binary Optimized Header Encoding for SPDY from Martin J. Dürst on 2012-08-03 (ietf-http-wg@w3.org from July to September 2012)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Fri, 03 Aug 2012 19:27:14 +0900
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
CC: Mike Belshe <mike@belshe.com>, James M Snell <jasnell@gmail.com>, ietf-http-wg@w3.org
Message-ID: <501BA782.9030708@it.aoyama.ac.jp>

On 2012/08/02 17:27, Poul-Henning Kamp wrote:
> In message <CABaLYCv7U7iLBu5+8Nb9Wa1VeQguoMLJw4VOCbDBQK3WoE-sFg@mail.gmail.com>,
> Mike Belshe writes:
>
>>>> * I don't think we need utf-8 encoded headers.  Not sure how you'd pass
>>>> them off to HTTP anyway?
>>
>> I just don't see any problem being solved by adding this?  If there is no
>> benefit, we should not do it, right?
>
> If this would solve any major problems inside a 20 year horizon, we
> should do it.

It would solve quite a few problems, some of them major: perhaps not for
HTTP itself, but for the applications built on top of it. In fact, it
would solve some problems that have been around for at least 15 years.

HTML and HTTP were created at a time when the breakthrough of ISO-8859-1
(Latin-1) in Western Europe could be foreseen (the nascent Web did quite
a bit to unify the 'national' 7-bit and 8-bit encodings of Western
Europe).

At least as early as 1995 (RFC 2070) or 1996 (RFC 2130, RFC 2277), it
was clear to those concerned that Unicode and UTF-8 were the way of the
future. As anyone thinking about US-ASCII should be able to confirm,
using a single character encoding (rather than, e.g., alternating
between ASCII and EBCDIC or the like) brings HUGE benefits. The same is
true when consolidating a zoo of character encodings into UTF-8.

These days, over 60% of the Web is already in UTF-8, and if you add in
the 20% that is pure ASCII (and therefore trivially also UTF-8), it's
80%. All other encodings are in serious decline (see p. 52 of the July
IEEE Spectrum). And efforts such as HTML5 are pushing hard for even more
UTF-8. I think lots of HTTP users would appreciate a clearer commitment
from HTTP with respect to character encoding in headers and the like.
What is there currently is really just a mess, and it should be cleaned
up.

Regards,    Martin.

Received on Friday, 3 August 2012 14:08:05 UTC