Re: character encoding in header fields, was: SPDY Header Frames from James M Snell on 2012-07-17 (ietf-http-wg@w3.org from July to September 2012)

From: James M Snell <jasnell@gmail.com>
Date: Tue, 17 Jul 2012 07:48:13 -0700
To: Amos Jeffries <squid3@treenet.co.nz>
Cc: ietf-http-wg@w3.org
Message-ID: <CABP7Rbdv706-jUunX4k+m87Zznc6TPMFo+m040g4ExTmTrpwMw@mail.gmail.com>

Tunneling 1.1 traffic via 2.0 would likely be the easy part; it's the
possibility of having to downgrade from 2.0 to 1.1 midstream that would
make this difficult. Declaring that all header values must be UTF-8 would
essentially make it impossible to reliably bridge 2.0 traffic to 1.1,
because whichever component is doing that bridging would have to know
exactly how to translate every header into its appropriate US-ASCII
equivalent. For some of the well-known "registered" headers, that's not a
major problem, but it obviously falls down completely when we start talking
about extension headers.
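
To make the bridging problem concrete, here is a minimal sketch of the check
a 2.0-to-1.1 bridge would need before forwarding a message. It assumes the
strict reading used in this thread, namely that 1.1 header values must be
US-ASCII. The function name and the X-Example-Title header are hypothetical,
purely for illustration.

```python
def non_ascii_headers(headers):
    """Return the names of headers whose values are not pure US-ASCII.

    Assumption: the bridge treats any non-ASCII value as something it
    cannot forward to HTTP/1.1 without a header-specific translation.
    """
    blocked = []
    for name, value in headers.items():
        try:
            value.encode("ascii")
        except UnicodeEncodeError:
            # No generic translation exists for an unknown header.
            blocked.append(name)
    return blocked


headers = {
    "Content-Type": "text/html",
    "X-Example-Title": "D\u00fcrst",  # hypothetical extension header
}
print(non_ascii_headers(headers))
```

Detecting the problem is the easy half. For a registered header the bridge
might know an escaping convention, but for an extension header like the one
above it has nothing to go on, which is exactly where this falls down.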

Of course, we face the exact same problem if we allow for purely binary
header values (as I have also suggested).

The one thing we need to determine is: how critical is the ability to
support seamless down-level conversion from 2.0 to 1.1 within a request? Is
it acceptable for us to say that while 2.0 can be used to transport 1.1
messages, the reverse is not possible?

- James

On Tue, Jul 17, 2012 at 3:19 AM, Amos Jeffries <squid3@treenet.co.nz> wrote:

> On 17/07/2012 8:08 p.m., Julian Reschke wrote:
>
>> On 2012-07-17 09:59, Poul-Henning Kamp wrote:
>>
>>> In message <50051A91.1010401@gmx.de>, Julian Reschke writes:
>>>
>>>> On 2012-07-17 09:44, "Martin J. Dürst" wrote:
>>>>
>>>>> ...
>>>>> But a much, much better solution in this day and age is to only allow
>>>>> one encoding, UTF-8. That by definition includes US-ASCII, covers all
>>>>> the world's characters, and is what HTML is moving towards (with quite
>>>>> surprising speed these days). And while in HTML (and other content
>>>>> formats), non-ASCII is extremely widespread, in HTTP, it is not, and
>>>>> having more than one encoding is needlessly complicated.
>>>>> ...
>>>>>
>>>>
>>>> *If* we make a breaking change with respect to character encoding
>>>> schemes, this is indeed the change to make.
>>>>
>>>
>>> Indeed, and a change I think HTTP/2.0 should make, in light of a
>>> 20 year design lifetime.
>>>
>>
>> As far as I can tell, the only thing that makes this hard is the desire
>> to be able to tunnel arbitrary HTTP/1.1 through HTTP/2.0.
>>
>
> There is hope though. Clients will be forced to begin new connections with
> an HTTP/1.1-syntax Upgrade request (hopefully with zero cost, as in
> network-friendly) for the next 5-15 years or so until HTTP/1 dies out.
>
> This means there is a server response along with path details known to
> both ends of the connection before any HTTP/2 non-ASCII character encoding
> can come into effect. The Via header is already mandatory, and from this
> datum we can gather the path software versions (good luck to those who
> hide their HTTP compliance level). That tells both server and client
> whether they can assume safe use of non-ASCII characters or if they have
> to mangle things down to the HTTP/1 compatible headers.
>
> This is only relevant to endpoints which want/need to use non-ASCII. We
> have two alternatives:
>  1) mandate that receivers accept UTF-8 in HTTP/2 and that senders generate
> ASCII in the 2.0 spec, for later opening up in 2.1 or so.
>  2) advise senders to always stay within the ASCII subset of UTF-8, but
> not forbid non-ASCII.
>
> Either way mandating receivers to accept UTF-8 is a long overdue MUST in
> my books for HTTP/2.
>
> AYJ
>
>
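
The Via-based check Amos describes in the quoted text could be sketched
roughly as follows. This is only an illustration under stated assumptions:
the function names are hypothetical, the parsing is a naive comma split
that ignores commas inside Via comments, and treating a major version of
2 or higher as "safe for non-ASCII" is a guess at how a 2.0 hop would
advertise itself, since nothing of the sort has been specified.

```python
def via_versions(via_value):
    """Extract the protocol version of each hop from a Via header value.

    Each hop looks like '[protocol-name/]version received-by [comment]'.
    Naive parsing: splits on commas, so commas inside comments break it.
    """
    versions = []
    for hop in via_value.split(","):
        hop = hop.strip()
        if not hop:
            continue
        received_protocol = hop.split()[0]
        # Drop an optional protocol name such as 'HTTP/'.
        versions.append(received_protocol.rsplit("/", 1)[-1])
    return versions


def path_allows_non_ascii(via_value):
    """Assumption: every hop must report major version >= 2."""
    return all(int(v.split(".")[0]) >= 2 for v in via_versions(via_value))


print(via_versions("1.0 fred, 1.1 example.com (Apache/1.1)"))
print(path_allows_non_ascii("2.0 a, HTTP/2.0 b"))
print(path_allows_non_ascii("2.0 a, 1.1 b"))
```

Note that an empty Via value yields True here, which only means no
intermediary announced itself, not that none exists on the path.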

Received on Tuesday, 17 July 2012 14:49:09 UTC