Re: FYI... Binary Optimized Header Encoding for SPDY from Mike Belshe on 2012-08-03 (ietf-http-wg@w3.org from July to September 2012)

From: Mike Belshe <mike@belshe.com>
Date: Fri, 3 Aug 2012 12:43:53 -0700
To: James M Snell <jasnell@gmail.com>
Cc: Jonathan Ballard <dzonatas@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Poul-Henning Kamp <phk@phk.freebsd.dk>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CABaLYCtu1Z5iV=MempARPtWsTGXondTWVrkzwd1DBKTGQbcsYg@mail.gmail.com>
On Fri, Aug 3, 2012 at 10:33 AM, James M Snell <jasnell@gmail.com> wrote:

>
>
> On Fri, Aug 3, 2012 at 10:15 AM, Mike Belshe <mike@belshe.com> wrote:
>
>> OK.  More concrete use cases for why we need this would help me
>> understand better.
>>
>> From the User Agent's perspective - how would it know whether a given
>> HTTP/2 server would be able to grok its UTF8 headers?  Just try and fail?
>>
>>
> No different than what the user-agent does today: attempt to send the
> message and be prepared to deal with failure. Currently, there's no
> guarantee that an origin server supports the various Allow-* headers...
> some do, some don't. That's ok.
>

OK - so this means browsers won't use it :-)

So now we need to know which use cases *would* use it :-)

Overall, it's a fine idea, I just don't think we need it.

Mike




>
>
>> If you send a UTF8 header, and the origin server can't handle it, does it
>> cause a failure of the entire request?  Or is this a partial failure?  I
>> see you classified some as ignorable and others not.  Is the error code
>> specifically a "couldn't pass UTF8 header" error?  If not, how would the UA
>> differentiate an error that the server didn't grok the UTF8 vs other server
>> errors?
>>
>>
> The "must-understand" and "must-ignore" aspects of the codepage mechanism
> have more to do with translation of headers that use numeric identifiers
> into http/1.1 named headers than with the translation of UTF-8 values.
> However, more to the point: a 400 Bad Request response would be fine for
> the most part, but a more specific error status could be provided. We
> already have response codes like 431 Request Header Fields Too Large; it
> would not be unreasonable to have a similar 4xx response that indicates
> that the message could not be translated successfully to a downlevel
> protocol.
>
>
>> These are the new edge cases I was referring to earlier.
>>
>> Mike
>>
>>
>> On Fri, Aug 3, 2012 at 8:52 AM, James M Snell <jasnell@gmail.com> wrote:
>>
>>> I'll say it again: simply allowing header values to contain UTF-8
>>> characters does not break compatibility with 1.1 because the existing
>>> header definitions for the existing headers would not be changed. The
>>> change would impact new header definitions or applications that are
>>> specifically targeted for 2.0 implementations.
>>>
>>> For example, suppose we define a new binary optimized encoding for the
>>> host header that accepts IDNs. That new encoding does not change the
>>> details of the existing Host header in HTTP/1.1. When an intermediary
>>> translates the 2.0 message into a 1.1 message, it would convert the IDN
>>> into a proper punycode value within the 1.1 Host header. If the
>>> intermediary happens across some other arbitrary new header using UTF-8
>>> that it does not understand or know how to translate, it can either ignore
>>> it or return a protocol error. Interoperability is not affected at all.
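
[Editorial sketch, not part of the original message.] The translation step
described above can be illustrated roughly as follows. This assumes Python's
built-in "idna" codec (IDNA 2003) as the punycode converter; the names
to_http11_host, downgrade_headers, and UntranslatableHeader are hypothetical
and do not come from any draft or implementation discussed in this thread.

```python
class UntranslatableHeader(Exception):
    """Hypothetical stand-in for the 'protocol error' option."""


def to_http11_host(idn_host: str) -> str:
    # Convert a Unicode host name to its ASCII (punycode) form.
    # Ports and IP literals are not handled in this sketch.
    return idn_host.encode("idna").decode("ascii")


def downgrade_headers(headers: dict, strict: bool = False) -> dict:
    """Translate a 2.0-style header map into one safe for HTTP/1.1."""
    out = {}
    for name, value in headers.items():
        if name.lower() == "host":
            # Known header: the intermediary knows how to translate it.
            out[name] = to_http11_host(value)
        elif value.isascii():
            # Plain ASCII values pass through unchanged.
            out[name] = value
        elif strict:
            # Unknown non-ASCII header: report a protocol error.
            raise UntranslatableHeader(name)
        # Otherwise: ignore (drop) the header the intermediary cannot translate.
    return out


if __name__ == "__main__":
    print(downgrade_headers({"Host": "bücher.example", "X-Note": "café"}))
```

With strict=False the unknown header is dropped; with strict=True the same
input raises UntranslatableHeader, mirroring the two choices named above.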
>>>
>>> Limits are a bad thing... aren't they? ;-)
>>>
>>> - James
>>>
>>> On Fri, Aug 3, 2012 at 8:33 AM, Mike Belshe <mike@belshe.com> wrote:
>>>
>>>> One of the charter requirements of HTTP/2, I thought, was interop to
>>>> HTTP/1.1 servers.
>>>>
>>>> If so, how would we pass UTF8 headers to HTTP/1.1 servers?
>>>>
>>>> If we can't then we don't need to support them, right?
>>>>
>>>> Mike
>>>>
>>>>
>>>> On Fri, Aug 3, 2012 at 8:30 AM, Jonathan Ballard <dzonatas@gmail.com> wrote:
>>>>
>>>>> ASCII is not "trivially UTF8." UTF8 lacks the available flow control
>>>>> of ASCII. Any conversion between ASCII and EBCDIC is best done in hardware.
>>>>> We already know the security issue of conversions from unicode to EBCDIC,
>>>>> and I doubt that is something we can scheme here on-topic.
>>>>>
>>>>>
>>>>> On Friday, August 3, 2012, "Martin J. Dürst" wrote:
>>>>>
>>>>>> On 2012/08/02 17:27, Poul-Henning Kamp wrote:
>>>>>>
>>>>>>> In message
>>>>>>> <CABaLYCv7U7iLBu5+8Nb9Wa1VeQguoMLJw4VOCbDBQK3WoE-sFg@mail.gmail.com>,
>>>>>>> Mike Belshe writes:
>>>>>>>
>>>>>>>>> * I don't think we need utf-8 encoded headers.  Not sure how you'd
>>>>>>>>> pass them off to HTTP anyway?
>>>>>>>>
>>>>>>>> I just don't see any problem being solved by adding this?  If there
>>>>>>>> is no benefit, we should not do it, right?
>>>>>>>>
>>>>>>>
>>>>>>> If this would solve any major problems inside a 20 year horizon, we
>>>>>>> should do it.
>>>>>>>
>>>>>>
>>>>>> It will solve quite a few problems, some of them major, maybe not for
>>>>>> HTTP itself, but for the applications on top. It will actually solve some
>>>>>> problems that have been around for at least the last 15 years.
>>>>>>
>>>>>> HTML and HTTP were created when the breakthrough of iso-8859-1
>>>>>> (Latin-1) in Western Europe was predictable (the nascent Web helped to
>>>>>> unify the Western Europe 'national' 7-bit and 8-bit encodings quite a bit).
>>>>>>
>>>>>> At least as early as 1995 (RFC 2070) or 1996 (RFC 2130, RFC 2277), it
>>>>>> was clear to those concerned that Unicode and UTF-8 were the way of the
>>>>>> future. As everybody should be able to confirm when thinking about
>>>>>> US-ASCII, using a single character encoding (rather than e.g. ASCII and
>>>>>> EBCDIC or some such alternatively) brings HUGE benefits. The same is true
>>>>>> when streamlining from a zoo of character encodings to UTF-8.
>>>>>>
>>>>>> These days, over 60% of the Web is already in UTF-8, and if you add
>>>>>> in the 20% of pure ASCII which is trivially also UTF-8, it's 80%. All other
>>>>>> encodings are in serious decline. (see p. 52 of the July IEEE Spectrum).
>>>>>> And efforts such as HTML5 are strongly pushing to get more UTF-8. I think
>>>>>> lots of HTTP users would appreciate a better commitment from HTTP with
>>>>>> respect to character encoding in headers and the like. What's currently
>>>>>> there is really just a mess, and should be cleaned up.
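
[Editorial sketch, not part of the original message.] The remark that pure
ASCII "is trivially also UTF-8" refers to the byte-level encoding: any text
made only of ASCII characters produces identical bytes under both encodings.
A minimal check, with the hypothetical helper name same_bytes:

```python
def same_bytes(text: str) -> bool:
    """Return True if text encodes to identical bytes in ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # Text contains a character outside ASCII, so there is no ASCII form.
        return False


if __name__ == "__main__":
    print(same_bytes("Host: example.com"))   # ASCII-only header line
    print(same_bytes("Dürst"))               # contains a non-ASCII character
    print("Dürst".encode("utf-8"))           # the 'ü' takes two bytes in UTF-8
```

This only concerns the byte representation; it does not address the
control-character and EBCDIC concerns raised elsewhere in the thread.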
>>>>>>
>>>>>>
>>>>>> Regards,    Martin.
>>>>>>
>>>>>>
>>>>
>>>
>>
>
Received on Friday, 3 August 2012 19:44:22 UTC