Re: UTF-8 in URIs

On 2014-01-17 07:59, Martin Thomson wrote:
> On 16 January 2014 10:48, Gabriel Montenegro wrote:
>> However, iff an HTTP/2.0 client knows for sure the encoding (e.g., 
>> UTF-8), per the proposal it could indicate it so at the receiving side 
>> there are no guessing games: in the presence of such an explicit 
>> indication, either it is valid UTF-8, or it is an error, no further 
>> processing is done.
> 
> What you are proposing, perhaps, is that HTTP/2.0 support the carriage
> of IRIs.  Noting that this is effectively what browsers and HTML
> already do, perhaps that's not a terrible thing.
> 
> Would this apply to just :path, or would you extend this to :authority
> and allow IDNs there?  That does mean that the proxy/server is
> potentially exposed to unicode normalization rules and so forth.

A little nastiness exists for caches, as PHK indicated. But this is no 
more trouble than Vary and can be handled in the same ways if necessary. 
It is optional, and in the end just a tradeoff between CPU and memory.

A slightly bigger issue is what mapping 2.0->1.1 gateways do with 
randomly encoded URIs.


As mentioned above, the URI representation in HTTP/2 is already being 
split into components, so it does not exactly fit the RFC 2616 textual 
format to begin with. It seems simple enough to specify the new :foo 
components as containing RFC 3986-compliant UTF-8 pieces, to be 
reassembled or decoded by implementations as needed.
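The "valid UTF-8 or it is an error" behaviour Gabriel described above is 
a cheap check for a receiver to make on each component. A sketch (my 
own, not from any draft; assumes the component arrives as raw bytes):

```python
def component_is_utf8(raw: bytes) -> bool:
    """Return True if a :foo component is well-formed UTF-8.

    Per the proposal quoted above, a receiver seeing the explicit
    UTF-8 indication either accepts the bytes as valid UTF-8 or
    rejects the message -- no guessing games, no further processing.
    """
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(component_is_utf8("/caf\u00e9".encode("utf-8")))  # True
print(component_is_utf8(b"/\xff\xfe"))                  # False: not UTF-8
```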

I wonder if the whole situation would not be improved by making HTTP/2 
transport UTF-8 under a SHOULD requirement on emission, with a note that 
senders may limit themselves to the ASCII-compatible range by 
%-encoding other bytes, and that 2.0->1.1 gateways MUST %-encode 
characters outside the ASCII range when delivering into HTTP/1?
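The gateway half of that MUST is mechanical: re-serialize the UTF-8 
bytes so only ASCII reaches the HTTP/1 hop. A sketch of the idea (mine, 
not from any draft; existing %XX escapes in ASCII pass through 
untouched):

```python
def ascii_safe(component: str) -> str:
    """%-encode every non-ASCII byte of a UTF-8 URI component.

    What a 2.0->1.1 gateway could do before forwarding: bytes below
    0x80 (including any %XX escapes already present) pass through
    unchanged; everything else becomes a %XX escape.
    """
    out = []
    for byte in component.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))        # ASCII passes through as-is
        else:
            out.append("%%%02X" % byte)  # non-ASCII byte -> %XX
    return "".join(out)

print(ascii_safe("/caf\u00e9"))  # /caf%C3%A9
```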

Amos

Received on Thursday, 16 January 2014 20:52:33 UTC