Re: UTF-8 in URIs

On 2014-01-16 19:48, Gabriel Montenegro wrote:
> To clarify: The proposal is *NOT* to impose a default of UTF-8 in URIs for HTTP/2.0. As some have mentioned, there are too many legacy issues.

Yes.

> The issue is that since http and https are legacy schemes from the point of view of RFC 3986, they don't have a fixed encoding. If somebody were to define a new scheme, say, "http2", it would benefit from the RFC 3986 rules, so the encoding would be known: UTF-8 with percent-encoding.
>
> Unfortunately, for http and https, URI handling at either the proxy (to check for a cache hit) or at the origin server is non-deterministic: several encodings are tried until one works. Such non-determinism is also a potential security issue, as a URI could decode in more than one way depending on which encoding is tried.
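
To spell out the ambiguity being described, a rough Python sketch (the charset choices here are mine, purely for illustration):

    from urllib.parse import unquote_to_bytes

    octets = unquote_to_bytes(b"/caf%E9")       # b'/caf\xe9', as sent on the wire

    try:
        octets.decode("utf-8")                  # fails: 0xE9 alone is not valid UTF-8
    except UnicodeDecodeError:
        pass
    print(octets.decode("iso-8859-1"))          # '/café' -- a second guess "works"

    # And b"/caf%C3%A9" goes the other way: 'café' under UTF-8, 'cafÃ©' under
    # ISO-8859-1 -- two guessing implementations can disagree on what the URI means.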

A proxy does not need to normalize. Full stop. There is no issue here, IMHO.

The URIs are minted by the origin server, and IMHO it's not a problem if 
a client that decodes and re-encodes using a different encoding gets 
punished by a cache miss.
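
All a cache has to key on is the request-target octets as received; a differently-encoded request is simply a different key. A quick sketch (the names are mine):

    cache = {}

    def cache_key(method: str, target: bytes) -> tuple:
        # Raw octets, no percent-decoding, no charset guessing.
        return (method, target)

    key_minted    = cache_key("GET", b"/caf%C3%A9")  # as the origin minted it (UTF-8)
    key_reencoded = cache_key("GET", b"/caf%E9")     # client re-encoded as Latin-1

    assert key_minted != key_reencoded   # worst case a miss, not an ambiguity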

An origin server doesn't need to try different encodings. It's sufficient 
if it uses the encoding it used when generating the URI. And if that 
encoding happens to be UTF-8, it will also benefit from IRIs typed into 
a browser's address bar.
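
A minimal sketch of that, assuming a quote() helper that percent-encodes UTF-8 (as Python's does by default; the paths are made up):

    from urllib.parse import quote

    def mint_path(name: str) -> str:
        # The origin encodes UTF-8 when it mints the link...
        return "/articles/" + quote(name, safe="")

    minted = mint_path("café")                  # '/articles/caf%C3%A9'

    # ...and compares incoming paths byte-for-byte, no decoding step at all.
    # A browser turning the IRI 'http://example.org/articles/café' into a
    # request also percent-encodes the path as UTF-8, so it simply matches:
    incoming = "/articles/caf%C3%A9"
    assert incoming == minted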

> However, iff an HTTP/2.0 client knows for sure the encoding (e.g., UTF-8), per the proposal it could indicate so, such that at the receiving side there are no guessing games: in the presence of such an explicit indication, either it is valid UTF-8 or it is an error; no further processing is done.

Again: nobody *needs* to guess right now. So I'm still confused about 
what problem you're trying to solve.
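
For what it's worth, the check such an indication buys is small enough to sketch (the function name is mine, not from the proposal):

    from urllib.parse import unquote_to_bytes

    def decode_declared_utf8(target: bytes) -> str:
        # With the proposed flag set, there is exactly one attempt:
        octets = unquote_to_bytes(target)
        try:
            return octets.decode("utf-8")
        except UnicodeDecodeError:
            raise ValueError("declared UTF-8, but octets are not valid UTF-8")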

And even *if* this helps, it still leaves the related issue of NFC vs. 
NFD normalization unsolved.
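
Both forms are perfectly valid UTF-8, yet they are different octets, so declaring "this is UTF-8" doesn't pick one. A quick sketch:

    import unicodedata
    from urllib.parse import quote

    nfc = unicodedata.normalize("NFC", "café")   # U+00E9, precomposed
    nfd = unicodedata.normalize("NFD", "café")   # 'e' + combining U+0301

    print(quote(nfc, safe=""))   # caf%C3%A9
    print(quote(nfd, safe=""))   # cafe%CC%81 -- a different URI, a different cache key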

Best regards, Julian
