- From: Zhong Yu <zhong.j.yu@gmail.com>
- Date: Thu, 16 Jan 2014 05:18:45 -0600
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: Gabriel Montenegro <Gabriel.Montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Osama Mazahir <OSAMAM@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Mike Bishop <Michael.Bishop@microsoft.com>, Matthew Cox <macox@microsoft.com>
On Thu, Jan 16, 2014 at 5:00 AM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * Gabriel Montenegro wrote:
>>Some of us (cc line) have been discussing the unfortunate lack of
>>determinism with respect to URI encoding in HTTP/1.1 and would like
>>HTTP/2.0 to improve upon the situation.
>
> The practise of encoding character data in `http:` addresses using
> anything other than UTF-8 is dying out fast and it is rather unclear

UTF-8 is not very good for CJK charsets. That may not be a big deal in
general; however, URLs are often displayed verbatim in user interfaces,
so their length matters.

> what practical benefit there is in discriminating between addresses
> that use only character data and all character data is UTF-8-encoded
> and addresses that include non-character data or use some legacy en-
> coding.
>
> Note that it is perfectly normal to run a service like
>
> http://example.org/transcode?from=iso-8859-1&to=utf-8&bytes=%C3%B6
>
> Also note that a client cannot possibly know `%C3%B6` can be inter-
> preted as UTF-8 bytes without the server telling it as much. This does
> not change when it's instead
>
> http://example.org/transcode/from/iso-8859-1/to/utf-8/bytes/%C3%B6
>
> Further note that some clients, for display purposes, treat at least
> one of the two examples as though the `%C3%B6` were UTF-8.
>
>>In either case, the value to denote the charset would be a 32-bit
>>integer equivalent to the "MIBenum" value in the IANA registry
>>(http://www.iana.org/assignments/character-sets/character-sets.xhtml).
>>Hence, the value would be 106 for UTF-8. The legacy behavior of
>>non-determinism is indicated via the value 0. Notice that this is a
>>reserved value for MIBenum.
>
> Allowing arbitrary encodings needs an exceedingly good reason.
> --
> Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
>
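
A rough sketch of the two points in this exchange, using only the Python
standard library. The sample text "中文" and the choice of GB2312 as the
legacy charset are assumptions made for illustration; neither appears in
the thread. The first part shows that the bytes behind `%C3%B6` decode
differently depending on which charset the reader assumes; the second
compares the percent-encoded length of the same CJK text under UTF-8 and
under GB2312.

```python
from urllib.parse import quote, unquote_to_bytes

# Part 1: the same escaped bytes, read under two different charsets.
raw = unquote_to_bytes("%C3%B6")        # two raw bytes: 0xC3 0xB6
as_utf8 = raw.decode("utf-8")           # one character
as_latin1 = raw.decode("iso-8859-1")    # two characters

# Part 2: percent-encoded length of the same text in two charsets.
def escaped_length(text, charset):
    """Length of `text` after encoding to `charset` and percent-encoding."""
    return len(quote(text.encode(charset), safe=""))

sample = "中文"  # hypothetical sample text, not taken from the thread
utf8_len = escaped_length(sample, "utf-8")     # 3 bytes per character here
gb2312_len = escaped_length(sample, "gb2312")  # 2 bytes per character here

print(repr(as_utf8), repr(as_latin1))
print(utf8_len, gb2312_len)
```

For this sample the UTF-8 form is half again as long as the GB2312 form
once escaped, which is the length concern raised above; the decoding
ambiguity in the first part is the reason a client cannot pick the right
interpretation without being told.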
Received on Thursday, 16 January 2014 11:19:12 UTC