- From: James M Snell <jasnell@gmail.com>
- Date: Fri, 3 Aug 2012 10:33:11 -0700
- To: Mike Belshe <mike@belshe.com>
- Cc: Jonathan Ballard <dzonatas@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Poul-Henning Kamp <phk@phk.freebsd.dk>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
- Message-ID: <CABP7RbeVLVOM5HqZgjf=or-886_orr8ra1kaZHAaChjYTeFNvQ@mail.gmail.com>
On Fri, Aug 3, 2012 at 10:15 AM, Mike Belshe <mike@belshe.com> wrote:

> OK. More concrete use cases for why we need this would help me understand
> better.
>
> From the User Agent's perspective - how would it know whether a given
> HTTP/2 server would be able to grok its UTF8 headers? Just try and fail?

No different than what the user-agent does today: attempt to send the
message and be prepared to deal with failure. Currently, there's no
guarantee that an origin server supports the various Allow-* headers...
some do, some don't. That's ok.

> If you send a UTF8 header, and the origin server can't handle it, does it
> cause a failure of the entire request? Or is this a partial failure? I see
> you classified some as ignorable and others not. Is the error code
> specifically a "couldn't pass UTF8 header" error? If not, how would the UA
> differentiate an error that the server didn't grok the UTF8 vs. other
> server errors?

The "must-understand" and "must-ignore" aspects of the codepage mechanism
have more to do with the translation of headers that use numeric
identifiers into HTTP/1.1 named headers than with the translation of UTF-8
values. More to the point, though: a 400 Bad Request response would be fine
for the most part, but a more specific error status could be provided. We
already have response codes like 431 Request Header Fields Too Large; it
would not be unreasonable to have a similar 4xx response indicating that
the message could not be translated successfully to a downlevel protocol.

> These are the new edge cases I was referring to earlier.
>
> Mike
>
>
> On Fri, Aug 3, 2012 at 8:52 AM, James M Snell <jasnell@gmail.com> wrote:
>
>> I'll say it again: simply allowing header values to contain UTF-8
>> characters does not break compatibility with 1.1, because the existing
>> definitions of existing headers would not be changed. The change would
>> affect only new header definitions or applications specifically targeted
>> at 2.0 implementations.
>>
>> For example, suppose we define a new binary-optimized encoding for the
>> Host header that accepts IDNs. That new encoding does not change the
>> details of the existing Host header in HTTP/1.1. When an intermediary
>> translates the 2.0 message into a 1.1 message, it would convert the IDN
>> into a proper punycode value within the 1.1 Host header. If the
>> intermediary happens across some other arbitrary new header using UTF-8
>> that it does not understand or know how to translate, it can either
>> ignore it or return a protocol error. Interoperability is not affected
>> at all.
>>
>> Limits are a bad thing... aren't they? ;-)
>>
>> - James
>>
>> On Fri, Aug 3, 2012 at 8:33 AM, Mike Belshe <mike@belshe.com> wrote:
>>
>>> One of the charter requirements of HTTP/2, I thought, was interop with
>>> HTTP/1.1 servers.
>>>
>>> If so, how would we pass UTF8 headers to HTTP/1.1 servers?
>>>
>>> If we can't, then we don't need to support them, right?
>>>
>>> Mike
>>>
>>>
>>> On Fri, Aug 3, 2012 at 8:30 AM, Jonathan Ballard <dzonatas@gmail.com> wrote:
>>>
>>>> ASCII is not "trivially UTF8." UTF8 lacks the available flow control
>>>> of ASCII. Any conversion between ASCII and EBCDIC is best done in
>>>> hardware. We already know the security issues of conversions from
>>>> Unicode to EBCDIC, and I doubt that is something we can scheme here
>>>> on-topic.
>>>>
>>>>
>>>> On Friday, August 3, 2012, "Martin J. Dürst" wrote:
>>>>
>>>>> On 2012/08/02 17:27, Poul-Henning Kamp wrote:
>>>>>
>>>>>> In message
>>>>>> <CABaLYCv7U7iLBu5+8Nb9Wa1VeQguoMLJw4VOCbDBQK3WoE-sFg@mail.gmail.com>,
>>>>>> Mike Belshe writes:
>>>>>>
>>>>>>>> * I don't think we need utf-8 encoded headers. Not sure how you'd
>>>>>>>> pass them off to HTTP anyway?
>>>>>>>
>>>>>>> I just don't see any problem being solved by adding this? If there
>>>>>>> is no benefit, we should not do it, right?
>>>>>>
>>>>>> If this would solve any major problems inside a 20 year horizon, we
>>>>>> should do it.
>>>>>
>>>>> It will solve quite a few problems, some of them major, maybe not for
>>>>> HTTP itself, but for the applications on top. It will actually solve
>>>>> some problems that have been around for at least the last 15 years.
>>>>>
>>>>> HTML and HTTP were created when the breakthrough of iso-8859-1
>>>>> (Latin-1) in Western Europe was predictable (the nascent Web helped
>>>>> to unify the Western European 'national' 7-bit and 8-bit encodings
>>>>> quite a bit).
>>>>>
>>>>> At least as early as 1995 (RFC 2070) or 1996 (RFC 2130, RFC 2277), it
>>>>> was clear to those concerned that Unicode and UTF-8 were the way of
>>>>> the future. As everybody should be able to confirm when thinking
>>>>> about US-ASCII, using a single character encoding (rather than, say,
>>>>> ASCII and EBCDIC as alternatives) brings HUGE benefits. The same is
>>>>> true when streamlining from a zoo of character encodings to UTF-8.
>>>>>
>>>>> These days, over 60% of the Web is already in UTF-8, and if you add
>>>>> in the 20% of pure ASCII, which is trivially also UTF-8, it's 80%.
>>>>> All other encodings are in serious decline (see p. 52 of the July
>>>>> IEEE Spectrum). And efforts such as HTML5 are strongly pushing to get
>>>>> more UTF-8. I think lots of HTTP users would appreciate a better
>>>>> commitment from HTTP with respect to character encoding in headers
>>>>> and the like. What's currently there is really just a mess, and
>>>>> should be cleaned up.
>>>>>
>>>>> Regards, Martin.
Received on Friday, 3 August 2012 17:34:00 UTC
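For concreteness, here is a minimal sketch of the translation step James describes: an intermediary converting a 2.0 message for a 1.1 hop, turning an IDN Host value into punycode and either dropping or rejecting unknown headers whose UTF-8 values cannot be carried downlevel. It is written in Python since the thread contains no code; the header names, the `IGNORABLE` set, and the `DownlevelTranslationError` type are illustrative assumptions, not anything from a draft.

```python
# Sketch, not from any spec: translate HTTP/2-style headers (which may
# carry non-ASCII values) into something representable in HTTP/1.1.

IGNORABLE = {"x-example-annotation"}  # hypothetical "must-ignore" headers


class DownlevelTranslationError(Exception):
    """The 2.0 message could not be translated to 1.1; an intermediary
    might surface this as a 4xx response, analogous in spirit to
    431 Request Header Fields Too Large."""


def translate_headers_to_http11(headers):
    """Translate a dict of HTTP/2 headers (str -> str) for a 1.1 hop."""
    out = {}
    for name, value in headers.items():
        if name.lower() == "host":
            # IDN -> punycode via the stdlib idna codec, e.g.
            # "www.münchen.de" -> "www.xn--mnchen-3ya.de"
            out[name] = value.encode("idna").decode("ascii")
            continue
        try:
            value.encode("ascii")  # already representable in HTTP/1.1
            out[name] = value
        except UnicodeEncodeError:
            if name.lower() in IGNORABLE:
                continue  # "must-ignore": drop silently on translation
            # "must-understand": refuse to forward a message we cannot
            # translate faithfully to the downlevel protocol
            raise DownlevelTranslationError(
                f"cannot translate header {name!r} to HTTP/1.1")
    return out


if __name__ == "__main__":
    print(translate_headers_to_http11(
        {"Host": "www.münchen.de", "Accept": "text/html"}))
    # -> {'Host': 'www.xn--mnchen-3ya.de', 'Accept': 'text/html'}
```

The per-header classification rule here is the crux: deciding which headers an intermediary may silently drop and which must abort the translation is exactly the "must-understand"/"must-ignore" question debated in the thread.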