
Re: UTF-8 in URIs

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 16 Jan 2014 12:00:03 +0100
To: Gabriel Montenegro <Gabriel.Montenegro@microsoft.com>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Osama Mazahir <OSAMAM@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Mike Bishop <Michael.Bishop@microsoft.com>, Matthew Cox <macox@microsoft.com>
Message-ID: <88efd99ehet4j0odubrl9jovcepdplgdca@hive.bjoern.hoehrmann.de>
* Gabriel Montenegro wrote:
>Some of us (cc line) have been discussing the unfortunate lack of 
>determinism with respect to URI encoding in HTTP/1.1 and would like 
>HTTP/2.0 to improve upon the situation.

The practice of encoding character data in `http:` addresses using
anything other than UTF-8 is dying out fast, and it is rather unclear
what practical benefit there is in discriminating between addresses
that contain only character data, all of it UTF-8-encoded, and
addresses that include non-character data or use some legacy encoding.

Note that it is perfectly normal to run a service like


Also note that a client cannot possibly know that `%C3%B6` can be
interpreted as UTF-8 bytes without the server telling it as much. This
does not change when it's instead


Further note that some clients, for display purposes, treat at least
one of the two examples as though the `%C3%B6` were UTF-8.
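
To illustrate the point about `%C3%B6`: the percent-escapes only name
two octets, and nothing in the address says which character encoding,
if any, they belong to. The following is a minimal sketch using Python's
standard `urllib.parse`; the variable names are illustrative only, and
ISO-8859-1 is picked merely as one example of a legacy encoding.

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding yields octets, not characters.
raw = unquote_to_bytes("%C3%B6")

# The same two octets are valid under more than one encoding.
as_utf8 = raw.decode("utf-8")         # one character: o with diaeresis
as_latin1 = raw.decode("iso-8859-1")  # two characters: A with tilde, pilcrow

print(raw, as_utf8, as_latin1)
```

Both decodings succeed without error, so a client that guesses UTF-8
for display purposes is only guessing.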

>In either case, the value to denote the charset would be a 32-bit 
>integer equivalent to the "MIBenum" value in the IANA registry 
>Hence, the value would be 106 for UTF-8. The legacy behavior of 
>non-determinism is indicated via the value 0. Notice that this is a 
>reserved value for MIBenum.

Allowing arbitrary encodings needs an exceedingly good reason.
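
For concreteness, here is a rough sketch of what a receiver might have
to do with the charset indicator described in the quoted proposal. Only
the values 106 (UTF-8) and 0 (legacy, undetermined) come from the quoted
text; the entries 3 (US-ASCII) and 4 (ISO-8859-1) are taken from the
IANA character sets registry, and the table and function names are
hypothetical, not part of any proposal.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical lookup from IANA MIBenum values to Python codec names.
# A real receiver would need an entry for every charset it accepts.
MIBENUM_TO_CODEC = {
    3: "ascii",
    4: "iso-8859-1",
    106: "utf-8",
}

def decode_with_mibenum(percent_encoded, mibenum):
    """Decode a percent-encoded component using a MIBenum indicator.

    Returns a str when the charset is known, or the raw bytes when the
    indicator is 0 (the legacy, undetermined case).
    """
    raw = unquote_to_bytes(percent_encoded)
    if mibenum == 0:
        return raw  # no charset claimed; only octets are known
    codec = MIBENUM_TO_CODEC.get(mibenum)
    if codec is None:
        raise ValueError(f"unsupported MIBenum value: {mibenum}")
    return raw.decode(codec)
```

Every additional registry value accepted here is another codec the
receiver has to implement and test, which is the cost weighed above.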
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Thursday, 16 January 2014 11:00:47 UTC
