Re: FYI... Binary Optimized Header Encoding for SPDY from Roberto Peon on 2012-08-03 (ietf-http-wg@w3.org from July to September 2012)

From: Roberto Peon <grmocg@gmail.com>
Date: Fri, 3 Aug 2012 10:33:07 -0700
To: James M Snell <jasnell@gmail.com>
Cc: Mike Belshe <mike@belshe.com>, Jonathan Ballard <dzonatas@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Poul-Henning Kamp <phk@phk.freebsd.dk>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNf4Qr4J=ZsGhfTathCDQLKZvujU=pFaFQYYDgk8dm++jQ@mail.gmail.com>

I'm biased against utf-8, because it is trivial at the application layer to
write a function which encodes and/or decodes it.
I see that handling utf-8 adds complexity to the protocol but buys the
protocol nothing. It adds minimal advantage for the entities using the
protocol, and makes intermediaries' lives more difficult since they'll have
to do more verification.
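
A minimal sketch of the point above, assuming Python and hypothetical helper
names (none of this comes from SPDY or any HTTP draft). The first two
functions are all an application needs; the third is the kind of extra check
an intermediary would have to run if the protocol itself mandated UTF-8.

```python
def encode_header_value(text: str) -> bytes:
    """Application side: turn text into the bytes that go on the wire."""
    return text.encode("utf-8")


def decode_header_value(raw: bytes) -> str:
    """Application side: recover the text (raises UnicodeDecodeError on bad input)."""
    return raw.decode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical intermediary check, needed only if the protocol requires UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

In this sketch the endpoints pay the encoding cost once, while every hop that
has to enforce validity repeats the check in is_valid_utf8.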

Saying that the protocol handles sending a length-delimited string or a
string guaranteed not to include '\n' would be fine, however, as at that
point whatever encoding is used in any particular header value is a matter for
the client-app and server, as it should be for things that the protocol
doesn't need to know about.
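
As a rough illustration of a length-delimited value, here is a sketch that
assumes a 4-byte big-endian length prefix and hypothetical function names
(the prefix width is an assumption for the example, not a proposal). The
framing never inspects the payload, so the choice of encoding stays with the
client app and server.

```python
import struct
from typing import Tuple


def pack_value(value: bytes) -> bytes:
    """Prefix an opaque value with its length (assumed: 4 bytes, network order)."""
    return struct.pack("!I", len(value)) + value


def unpack_value(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read one length-prefixed value; return it and the offset just past it."""
    (length,) = struct.unpack_from("!I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("truncated value")
    return buf[start:end], end


# The framing carries UTF-8 text and arbitrary bytes (even '\n') equally well.
wire = pack_value("Dürst".encode("utf-8")) + pack_value(b"\xff\x00\n")
first, next_offset = unpack_value(wire)
second, end_offset = unpack_value(wire, next_offset)
```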

I'd need a bunch of examples of where this is essential or useful before
I'd be happy supporting it.
-=R

On Fri, Aug 3, 2012 at 8:52 AM, James M Snell <jasnell@gmail.com> wrote:

> I'll say it again: simply allowing header values to contain UTF-8
> characters does not break compatibility with 1.1 because the existing
> header definitions for the existing headers would not be changed. The
> change would impact new header definitions or applications that are
> specifically targeted for 2.0 implementations.
>
> For example, suppose we define a new binary optimized encoding for the
> host header that accepts IDNs. That new encoding does not change the
> details of the existing Host header in HTTP/1.1. When an intermediary
> translates the 2.0 message into a 1.1 message, it would convert the IDN
> into a proper punycode value within the 1.1 Host header. If the
> intermediary happens across some other arbitrary new header using UTF-8
> that it does not understand or know how to translate, it can either ignore
> it or return a protocol error. Interoperability is not affected at all.
>
> Limits are a bad thing... aren't they? ;-)
>
> - James
>
> On Fri, Aug 3, 2012 at 8:33 AM, Mike Belshe <mike@belshe.com> wrote:
>
>> One of the charter requirements of HTTP/2, I thought, was interop to
>> HTTP/1.1 servers.
>>
>> If so, how would we pass UTF8 headers to HTTP/1.1 servers?
>>
>> If we can't then we don't need to support them, right?
>>
>> Mike
>>
>>
>> On Fri, Aug 3, 2012 at 8:30 AM, Jonathan Ballard <dzonatas@gmail.com> wrote:
>>
>>> ASCII is not "trivially UTF8." UTF8 lacks the available flow control of
>>> ASCII. Any conversion between ASCII and EBCDIC is best done in hardware. We
>>> already know the security issue of conversions from unicode to EBCDIC, and
>>> I doubt that is something we can scheme here on-topic.
>>>
>>>
>>> On Friday, August 3, 2012, "Martin J. Dürst" wrote:
>>>
>>>> On 2012/08/02 17:27, Poul-Henning Kamp wrote:
>>>>
>>>>> In message <CABaLYCv7U7iLBu5+8Nb9Wa1VeQguoMLJw4VOCbDBQK3WoE-sFg@mail.gmail.com>,
>>>>> Mike Belshe writes:
>>>>>
>>>>>>> * I don't think we need utf-8 encoded headers.  Not sure how you'd pass
>>>>>>> them off to HTTP anyway?
>>>>>>
>>>>>> I just don't see any problem being solved by adding this?  If there is no
>>>>>> benefit, we should not do it, right?
>>>>>>
>>>>>
>>>>> If this would solve any major problems inside a 20 year horizon, we
>>>>> should do it.
>>>>>
>>>>
>>>> It will solve quite a few problems, some of them major, maybe not for
>>>> HTTP itself, but for the applications on top. It will actually solve some
>>>> problems that have been around for at least the last 15 years.
>>>>
>>>> HTML and HTTP were created when the breakthrough of iso-8859-1
>>>> (Latin-1) in Western Europe was predictable (the nascent Web helped to
>>>> unify the Western Europe 'national' 7-bit and 8-bit encodings quite a bit).
>>>>
>>>> At least as early as 1995 (RFC 2070) or 1996 (RFC 2130, RFC 2277), it
>>>> was clear to those concerned that Unicode and UTF-8 were the way of the
>>>> future. As everybody should be able to confirm when thinking about
>>>> US-ASCII, using a single character encoding (rather than e.g. ASCII and
>>>> EBCDIC or some such alternatively) brings HUGE benefits. The same is true
>>>> when streamlining from a zoo of character encodings to UTF-8.
>>>>
>>>> These days, over 60% of the Web is already in UTF-8, and if you add in
>>>> the 20% of pure ASCII which is trivially also UTF-8, it's 80%. All other
>>>> encodings are in serious decline (see p. 52 of the July IEEE Spectrum).
>>>> And efforts such as HTML5 are strongly pushing to get more UTF-8. I think
>>>> lots of HTTP users would appreciate a better commitment from HTTP with
>>>> respect to character encoding in headers and the like. What's currently
>>>> there is really just a mess, and should be cleaned up.
>>>>
>>>>
>>>> Regards,    Martin.
>>>>
>>>>
>>
>

Received on Friday, 3 August 2012 17:33:35 UTC