Re: character encoding in header fields, was: SPDY Header Frames

On 17/07/2012 8:08 p.m., Julian Reschke wrote:
> On 2012-07-17 09:59, Poul-Henning Kamp wrote:
>> In message <50051A91.1010401@gmx.de>, Julian Reschke writes:
>>> On 2012-07-17 09:44, "Martin J. Dürst" wrote:
>>>> ...
>>>> But a much, much better solution in this day and age is to only allow
>>>> one encoding, UTF-8. That by definition includes US-ASCII, covers all
>>>> the world's characters, and is what HTML is moving towards (with quite
>>>> surprising speed these days). And while in HTML (and other content
>>>> formats), non-ASCII is extremely widespread, in HTTP, it is not, and
>>>> having more than one encoding is needlessly complicated.
>>>> ...
>>>
>>> *If* we make a breaking change with respect to character encoding
>>> schemes, this is indeed the change to make.
>>
>> Indeed, and a change I think HTTP/2.0 should make, in light of a
>> 20 year design lifetime.
>
> As far as I can tell, the only thing that makes this hard is the 
> desire to be able to tunnel arbitrary HTTP/1.1 through HTTP/2.0.

There is hope though. Clients will be forced to begin new connections
with an HTTP/1.1-syntax Upgrade request (hopefully at zero cost, as in
network-friendly) for the next 5-15 years or so, until HTTP/1 dies out.

This means a server response, along with details of the path, is known
to both ends of the connection before any HTTP/2 non-ASCII character
encoding can come into effect. The Via header is already mandatory for
intermediaries, and from it we can gather the protocol versions of the
software along the path (good luck to those who hide their HTTP
compliance level). That tells both server and client whether they can
safely use non-ASCII characters or whether they have to mangle things
down to HTTP/1-compatible headers.
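
A minimal sketch of that Via check, in Python. The function names and
the rule "every hop reports HTTP/2.0 or later" are assumptions made for
illustration; nothing here comes from a spec. Comments containing
commas are not parsed properly and simply count as an unknown hop, so
the check fails toward ASCII.

```python
def hop_versions(via_value):
    """Return one (major, minor) tuple per Via hop, or None if unparseable."""
    hops = []
    for element in via_value.split(","):
        fields = element.split()
        if not fields:
            continue  # skip empty list elements
        # received-protocol is "1.1" or "HTTP/1.1"; a missing name means HTTP
        name, _, version = fields[0].rpartition("/")
        major, _, minor = version.partition(".")
        try:
            if name.upper() not in ("", "HTTP"):
                raise ValueError("not an HTTP hop")
            hops.append((int(major), int(minor or 0)))
        except ValueError:
            hops.append(None)  # unknown hop, treated as unsafe below
    return hops


def path_allows_non_ascii(via_value):
    """Hypothetical policy: non-ASCII is safe only if every hop is HTTP/2.0+."""
    hops = hop_versions(via_value)
    # No hop information at all is treated conservatively as "not safe".
    return bool(hops) and all(h is not None and h >= (2, 0) for h in hops)
```

With a value such as "2.0 a, 1.1 b" the check fails because of the
HTTP/1.1 hop, and the sender would fall back to HTTP/1-compatible
headers.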

This is only relevant to endpoints which want/need to use non-ASCII. We
have two alternatives:
  1) mandate that receivers accept UTF-8 in HTTP/2, and mandate in the
2.0 spec that senders generate only ASCII, to be opened up later in 2.1
or so.
  2) advise senders to always stay within the ASCII subset of UTF-8,
but not forbid non-ASCII.

Either way, mandating that receivers accept UTF-8 is a long-overdue
MUST in my books for HTTP/2.
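
The two alternatives differ only on the sending side, which a short
Python sketch can show. The function names and the ascii_only flag are
hypothetical, chosen for illustration: the receiver always decodes
UTF-8, while the sender either refuses non-ASCII (alternative 1) or
lets it through as UTF-8 (alternative 2).

```python
def decode_field_value(raw: bytes) -> str:
    """Receiver side: always accept UTF-8, which includes plain ASCII."""
    # Strict decoding: invalid byte sequences raise UnicodeDecodeError.
    return raw.decode("utf-8")


def encode_field_value(text: str, ascii_only: bool = True) -> bytes:
    """Sender side: ascii_only=True models alternative 1, False models 2."""
    if ascii_only:
        # Raises UnicodeEncodeError if the value contains non-ASCII.
        return text.encode("ascii")
    return text.encode("utf-8")
```

An ASCII value produces identical bytes under both policies, which is
why a receiver that accepts UTF-8 needs no extra handling for senders
that stay within ASCII.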

AYJ

Received on Tuesday, 17 July 2012 10:20:29 UTC