Re: FYI... Binary Optimized Header Encoding for SPDY from James M Snell on 2012-08-03 (ietf-http-wg@w3.org from July to September 2012)

From: James M Snell <jasnell@gmail.com>
Date: Fri, 3 Aug 2012 08:52:30 -0700
To: Mike Belshe <mike@belshe.com>
Cc: Jonathan Ballard <dzonatas@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Poul-Henning Kamp <phk@phk.freebsd.dk>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CABP7RbezEoeT2xiUjXMh5Bat-qKX1xB8ELQRHa7ZqX_5MrujGg@mail.gmail.com>

I'll say it again: simply allowing header values to contain UTF-8
characters does not break compatibility with 1.1 because the existing
header definitions for the existing headers would not be changed. The
change would impact new header definitions or applications that are
specifically targeted for 2.0 implementations.

For example, suppose we define a new binary optimized encoding for the Host
header that accepts IDNs. That new encoding does not change the details of
the existing Host header in HTTP/1.1. When an intermediary translates the
2.0 message into a 1.1 message, it would convert the IDN into a proper
Punycode value within the 1.1 Host header. If the intermediary happens
across some other arbitrary new header using UTF-8 that it does not
understand or know how to translate, it can either ignore it or return a
protocol error. Interoperability is not affected at all.
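
As a rough illustration of the translation step described above, here is a
minimal sketch in Python. It is not taken from any draft: the function names
and the `strict` flag are hypothetical, and it leans on Python's built-in
"idna" codec, which implements the older IDNA 2003 rules. It also assumes the
host value is a name with an optional port, not an IPv6 literal.

```python
def host_to_http11(host: str) -> str:
    """Return an ASCII-only Host value for an HTTP/1.1 message.

    Assumption: `host` is "name" or "name:port", never an IPv6 literal.
    """
    name, sep, port = host.partition(":")
    # The built-in "idna" codec converts each non-ASCII label to its
    # Punycode ("xn--...") form and leaves ASCII labels unchanged.
    return name.encode("idna").decode("ascii") + sep + port


def downgrade_headers(headers: dict, strict: bool = False) -> dict:
    """Translate hypothetical 2.0 headers (UTF-8 allowed) to 1.1 headers.

    Host is converted to Punycode. Any other header with a non-ASCII
    value is either dropped (strict=False) or rejected (strict=True),
    mirroring the "ignore it or return a protocol error" choice.
    """
    out = {}
    for name, value in headers.items():
        if name.lower() == "host":
            out[name] = host_to_http11(value)
        elif value.isascii():
            out[name] = value
        elif strict:
            raise ValueError(f"cannot translate non-ASCII header: {name}")
        # Otherwise the unknown non-ASCII header is silently ignored.
    return out


if __name__ == "__main__":
    print(downgrade_headers({"Host": "bücher.example", "X-Note": "héllo"}))
```

Existing 1.1 headers pass through untouched in this sketch; only the Host
value and unknown non-ASCII headers get special handling.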

Limits are a bad thing... aren't they? ;-)

- James

On Fri, Aug 3, 2012 at 8:33 AM, Mike Belshe <mike@belshe.com> wrote:

> One of the charter requirements of HTTP/2, I thought, was interop to
> HTTP/1.1 servers.
>
> If so, how would we pass UTF8 headers to HTTP/1.1 servers?
>
> If we can't then we don't need to support them, right?
>
> Mike
>
>
> On Fri, Aug 3, 2012 at 8:30 AM, Jonathan Ballard <dzonatas@gmail.com> wrote:
>
>> ASCII is not "trivially UTF8." UTF8 lacks the available flow control of
>> ASCII. Any conversion between ASCII and EBCDIC is best done in hardware. We
>> already know the security issue of conversions from unicode to EBCDIC, and
>> I doubt that is something we can scheme here on on-topic.
>>
>>
>> On Friday, August 3, 2012, "Martin J. Dürst" wrote:
>>
>>> On 2012/08/02 17:27, Poul-Henning Kamp wrote:
>>>
>>>> In message
>>>> <CABaLYCv7U7iLBu5+8Nb9Wa1VeQguoMLJw4VOCbDBQK3WoE-sFg@mail.gmail.com>,
>>>> Mike Belshe writes:
>>>>
>>>>>> * I don't think we need utf-8 encoded headers.  Not sure how you'd
>>>>>> pass them off to HTTP anyway?
>>>>>
>>>>> I just don't see any problem being solved by adding this?  If there
>>>>> is no benefit, we should not do it, right?
>>>>>
>>>>
>>>> If this would solve any major problems inside a 20 year horizon, we
>>>> should do it.
>>>>
>>>
>>> It will solve quite a few problems, some of them major, maybe not for
>>> HTTP itself, but for the applications on top. It will actually solve some
>>> problems that have been around for at least the last 15 years.
>>>
>>> HTML and HTTP were created when the breakthrough of iso-8859-1 (Latin-1)
>>> in Western Europe was predictable (the nascent Web helped to unify the
>>> Western Europe 'national' 7-bit and 8-bit encodings quite a bit).
>>>
>>> At least as early as 1995 (RFC 2070) or 1996 (RFC 2130, RFC 2277), it
>>> was clear to those concerned that Unicode and UTF-8 were the way of the
>>> future. As everybody should be able to confirm when thinking about
>>> US-ASCII, using a single character encoding (rather than e.g. ASCII and
>>> EBCDIC or some such alternatively) brings HUGE benefits. The same is true
>>> when streamlining from a zoo of character encodings to UTF-8.
>>>
>>> These days, over 60% of the Web is already in UTF-8, and if you add in
>>> the 20% of pure ASCII which is trivially also UTF-8, it's 80%. All other
>>> encodings are in serious decline (see p. 52 of the July IEEE Spectrum).
>>> And efforts such as HTML5 are strongly pushing to get more UTF-8. I think
>>> lots of HTTP users would appreciate a better commitment from HTTP with
>>> respect to character encoding in headers and the like. What's currently
>>> there is really just a mess, and should be cleaned up.
>>>
>>>
>>> Regards,    Martin.
>>>
>>>
>

Received on Friday, 3 August 2012 15:53:19 UTC