Re: UTF-8 in URIs

* Gabriel Montenegro wrote:
>Some of us (cc line) have been discussing the unfortunate lack of 
>determinism with respect to URI encoding in HTTP/1.1 and would like 
>HTTP/2.0 to improve upon the situation.

The practice of encoding character data in `http:` addresses using
anything other than UTF-8 is dying out fast, and it is rather unclear
what practical benefit there is in discriminating between addresses
that contain only character data, all of it UTF-8-encoded, and
addresses that include non-character data or use some legacy
encoding.

Note that it is perfectly normal to run a service like

  http://example.org/transcode?from=iso-8859-1&to=utf-8&bytes=%C3%B6

Also note that a client cannot possibly know whether `%C3%B6` is meant
to be interpreted as UTF-8 bytes unless the server tells it as much.
This does not change when the address is instead

  http://example.org/transcode/from/iso-8859-1/to/utf-8/bytes/%C3%B6

Further note that some clients, for display purposes, treat at least
one of the two examples as though the `%C3%B6` were UTF-8.
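
To illustrate the ambiguity, here is a minimal Python sketch (the
variable names are mine, and the transcoding step only assumes what
the hypothetical service above might do with its `from` and `to`
parameters). The same two octets decode without error under both
encodings, so nothing in the address itself says which one is meant:

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding yields octets, not characters.
raw = unquote_to_bytes("%C3%B6")        # two octets: 0xC3 0xB6

# Both interpretations succeed; the octets alone cannot tell them apart.
as_utf8 = raw.decode("utf-8")           # one character, U+00F6
as_latin1 = raw.decode("iso-8859-1")    # two characters, U+00C3 U+00B6

# What a from=iso-8859-1, to=utf-8 transcoder would plausibly return
# (an assumption about the example service, not a description of it).
transcoded = as_latin1.encode("utf-8")  # four octets: C3 83 C2 B6

print(raw, ascii(as_utf8), ascii(as_latin1), transcoded)
```

A client that displays the address as though the octets were UTF-8
would show a single character, while the service in the example would
treat the same octets as two ISO-8859-1 characters.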

>In either case, the value to denote the charset would be a 32-bit 
>integer equivalent to the "MIBenum" value in the IANA registry 
>(http://www.iana.org/assignments/character-sets/character-sets.xhtml). 
>Hence, the value would be 106 for UTF-8. The legacy behavior of 
>non-determinism is indicated via the value 0. Notice that this is a 
>reserved value for MIBenum.

Allowing arbitrary encodings needs an exceedingly good reason.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Received on Thursday, 16 January 2014 11:00:47 UTC