Re: FYI... Binary Optimized Header Encoding for SPDY from James M Snell on 2012-08-03 (ietf-http-wg@w3.org from July to September 2012)

From: James M Snell <jasnell@gmail.com>
Date: Fri, 3 Aug 2012 10:33:11 -0700
To: Mike Belshe <mike@belshe.com>
Cc: Jonathan Ballard <dzonatas@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Poul-Henning Kamp <phk@phk.freebsd.dk>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CABP7RbeVLVOM5HqZgjf=or-886_orr8ra1kaZHAaChjYTeFNvQ@mail.gmail.com>

On Fri, Aug 3, 2012 at 10:15 AM, Mike Belshe <mike@belshe.com> wrote:

> OK.  More concrete use cases for why we need this would help me understand
> better.
>
> From the User Agent's perspective - how would it know whether a given
> HTTP/2 server would be able to grok its UTF8 headers?  Just try and fail?
>
>
No different than what the user-agent does today: attempt to send the
message and be prepared to deal with failure. Currently, there's no
guarantee that an origin server supports the various Allow-* headers...
some do, some don't. That's ok.
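
To make "attempt and be prepared to deal with failure" concrete, here is a
minimal Python sketch. It is only an illustration under stated assumptions:
`send_with_fallback`, the `send` callable, and the idea of retrying once
without the optional non-ASCII headers are all hypothetical and not taken
from any draft.

```python
def send_with_fallback(send, headers, optional=frozenset()):
    """Send once with every header; on a 4xx, retry without optional UTF-8 ones.

    `send` is a hypothetical callable that takes a list of (name, value)
    pairs and returns an integer status code. `optional` holds lowercase
    names of headers the user agent is willing to drop.
    """
    status = send(headers)
    if 400 <= status < 500:
        # Keep a header if its value is plain ASCII or if it is not optional.
        reduced = [
            (name, value)
            for name, value in headers
            if value.isascii() or name.lower() not in optional
        ]
        if reduced != headers:
            status = send(reduced)
    return status
```

A user agent that cannot drop any header would simply surface the first
failure, which is the same situation it is in today.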


> If you send a UTF8 header, and the origin server can't handle it, does it
> cause a failure of the entire request?  Or is this a partial failure?  I
> see you classified some as ignorable and others not.  Is the error code
> specifically a "couldn't pass UTF8 header" error?  If not, how would the UA
> differentiate an error that the server didn't grok the UTF8 vs other server
> errors?
>
>
The "must-understand" and "must-ignore" aspects of the codepage mechanism
have more to do with translating headers that use numeric identifiers
into HTTP/1.1 named headers than with translating UTF-8 values.
More to the point, though: a 400 Bad Request response would be fine for
the most part, but a more specific error status could be provided. We
already have response codes like 431 Request Header Fields Too Large; it
would not be unreasonable to have a similar 4xx response indicating that
the message could not be translated successfully to a downlevel protocol.
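
As a rough illustration of that downlevel translation step, here is a
minimal Python sketch. Everything named in it is an assumption made for
illustration: `translate_to_http11`, `TranslationError`, the `ignorable`
set standing in for "must-ignore" headers, and the use of a plain 400 as
the failure status. It converts an IDN Host value to its Punycode form,
passes ASCII values through unchanged, and either drops or rejects any
other non-ASCII value.

```python
class TranslationError(Exception):
    """Raised when a header cannot be expressed in HTTP/1.1 (hypothetical)."""

    def __init__(self, name, status=400):
        super().__init__(f"cannot translate header {name!r} to HTTP/1.1")
        self.name = name
        # Assumed default: generic 400. A dedicated 4xx could be used instead.
        self.status = status


def translate_to_http11(headers, ignorable=frozenset()):
    """Translate (name, value) pairs with UTF-8 values into ASCII-only pairs.

    `ignorable` holds lowercase names of headers that may be dropped when
    they cannot be translated; every other header must be understood.
    Host values are assumed to be bare host names (no port, no IP literal).
    """
    out = []
    for name, value in headers:
        if name.lower() == "host":
            try:
                # Python's built-in "idna" codec implements IDNA 2003.
                value = value.encode("idna").decode("ascii")
            except UnicodeError:
                raise TranslationError(name) from None
        if value.isascii():
            out.append((name, value))
        elif name.lower() in ignorable:
            continue  # "must-ignore": silently drop the header
        else:
            raise TranslationError(name)  # "must-understand": fail the request
    return out


print(translate_to_http11([("Host", "bücher.example")]))
# [('Host', 'xn--bcher-kva.example')]
```

An intermediary would map `TranslationError.status` onto the response it
sends back, which is where a more specific 4xx code would slot in.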


> These are the new edge cases I was referring to earlier.
>
> Mike
>
>
> On Fri, Aug 3, 2012 at 8:52 AM, James M Snell <jasnell@gmail.com> wrote:
>
>> I'll say it again: simply allowing header values to contain UTF-8
>> characters does not break compatibility with 1.1 because the existing
>> header definitions for the existing headers would not be changed. The
>> change would impact new header definitions or applications that are
>> specifically targeted for 2.0 implementations.
>>
>> For example, suppose we define a new binary optimized encoding for the
>> Host header that accepts IDNs. That new encoding does not change the
>> details of the existing Host header in HTTP/1.1. When an intermediary
>> translates the 2.0 message into a 1.1 message, it would convert the IDN
>> into a proper Punycode value within the 1.1 Host header. If the
>> intermediary happens across some other arbitrary new header using UTF-8
>> that it does not understand or know how to translate, it can either ignore
>> it or return a protocol error. Interoperability is not affected at all.
>>
>> Limits are a bad thing... aren't they? ;-)
>>
>> - James
>>
>> On Fri, Aug 3, 2012 at 8:33 AM, Mike Belshe <mike@belshe.com> wrote:
>>
>>> One of the charter requirements of HTTP/2, I thought, was interop to
>>> HTTP/1.1 servers.
>>>
>>> If so, how would we pass UTF8 headers to HTTP/1.1 servers?
>>>
>>> If we can't then we don't need to support them, right?
>>>
>>> Mike
>>>
>>>
>>> On Fri, Aug 3, 2012 at 8:30 AM, Jonathan Ballard <dzonatas@gmail.com> wrote:
>>>
>>>> ASCII is not "trivially UTF8." UTF8 lacks the available flow control of
>>>> ASCII. Any conversion between ASCII and EBCDIC is best done in hardware. We
>>>> already know the security issue of conversions from Unicode to EBCDIC, and
>>>> I doubt that is something we can scheme here on-topic.
>>>>
>>>>
>>>> On Friday, August 3, 2012, "Martin J. Dürst" wrote:
>>>>
>>>>> On 2012/08/02 17:27, Poul-Henning Kamp wrote:
>>>>>
>>>>>> In message
>>>>>> <CABaLYCv7U7iLBu5+8Nb9Wa1VeQguoMLJw4VOCbDBQK3WoE-sFg@mail.gmail.com>,
>>>>>> Mike Belshe writes:
>>>>>>
>>>>>>>>> * I don't think we need utf-8 encoded headers.  Not sure how you'd pass
>>>>>>>>> them off to HTTP anyway?
>>>>>>>>
>>>>>>>
>>>>>>> I just don't see any problem being solved by adding this?  If there is no
>>>>>>> benefit, we should not do it, right?
>>>>>>>
>>>>>>
>>>>>> If this would solve any major problems inside a 20 year horizon, we
>>>>>> should do it.
>>>>>>
>>>>>
>>>>> It will solve quite a few problems, some of them major, maybe not for
>>>>> HTTP itself, but for the applications on top. It will actually solve some
>>>>> problems that have been around for at least the last 15 years.
>>>>>
>>>>> HTML and HTTP were created when the breakthrough of iso-8859-1
>>>>> (Latin-1) in Western Europe was predictable (the nascent Web helped to
>>>>> unify the Western Europe 'national' 7-bit and 8-bit encodings quite a bit).
>>>>>
>>>>> At least as early as 1995 (RFC 2070) or 1996 (RFC 2130, RFC 2277), it
>>>>> was clear to those concerned that Unicode and UTF-8 were the way of the
>>>>> future. As everybody should be able to confirm when thinking about
>>>>> US-ASCII, using a single character encoding (rather than e.g. ASCII and
>>>>> EBCDIC or some such alternatively) brings HUGE benefits. The same is true
>>>>> when streamlining from a zoo of character encodings to UTF-8.
>>>>>
>>>>> These days, over 60% of the Web is already in UTF-8, and if you add in
>>>>> the 20% of pure ASCII which is trivially also UTF-8, it's 80%. All other
>>>>> encodings are in serious decline. (see p. 52 of the July IEEE Spectrum).
>>>>> And efforts such as HTML5 are strongly pushing to get more UTF-8. I think
>>>>> lots of HTTP users would appreciate a better commitment from HTTP with
>>>>> respect to character encoding in headers and the like. What's currently
>>>>> there is really just a mess, and should be cleaned up.
>>>>>
>>>>>
>>>>> Regards,    Martin.
>>>>>
>>>>>
>>>
>>
>
Received on Friday, 3 August 2012 17:34:00 UTC