
Re: How browsers display URIs with %-encoding (Opera/Firefox FAIL)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Wed, 27 Jul 2011 04:00:50 +0200
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Cc: "public-iri@w3.org" <public-iri@w3.org>
Message-ID: <i6qu2799bg66marm4hsbarftoh59kh0t10@hive.bjoern.hoehrmann.de>
* Martin J. Dürst wrote:
>The idea is that because %-encoding in URIs has to be interpreted as 
>UTF-8 when converting to IRIs [...]

Converting `data:image/png,...%C3%B6...` to `data:image/png,...ö...`
is semantically wrong: there is no character "ö" in it, only bytes.
Sure, if you use UTF-8 and don't Unicode-normalize, you can round-trip
in this manner, but that doesn't make it any more right. With
`http://.../%C3%B6` the situation is no different: there is no reason
for `%C3%B6` to actually mean `ö` in any sense beyond round-tripping,
so "converting to IRIs" may be wrong in some situations.
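
To make the point about bytes concrete, here is a minimal Python sketch. It only illustrates the argument and is not taken from any specification; the variable names are made up for the example. It shows that `%C3%B6` decodes to two octets, and that whether those octets read as "ö" depends entirely on the character encoding one chooses to assume.

```python
from urllib.parse import unquote_to_bytes, quote_from_bytes

# %C3%B6 names two octets and nothing more.
octets = unquote_to_bytes("%C3%B6")

# Which characters those octets "mean" depends on an assumed encoding.
as_utf8 = octets.decode("utf-8")      # one character: ö
as_latin1 = octets.decode("latin-1")  # two characters: Ã¶

# Either reading round-trips back to the same escapes, so successful
# round-tripping alone does not show that one reading is the right one.
round_trip_utf8 = quote_from_bytes(as_utf8.encode("utf-8"))
round_trip_latin1 = quote_from_bytes(as_latin1.encode("latin-1"))

print(octets, as_utf8, as_latin1, round_trip_utf8, round_trip_latin1)
```

Both readings reproduce `%C3%B6` when re-encoded with the same encoding, which is why round-tripping by itself says nothing about what the escapes were meant to represent.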

I do understand what outcome you desire, but I do not understand how
you would get around this problem short of one or more of the
following: accepting wrong results like in the data: case above,
relying on complicated and probably unreliable heuristics, or
abandoning the idea that some of the time %xx sequences stand for
octets while at other times they stand for characters (turned into
bytes by some character encoding).
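
As an illustration of the heuristics option, one simple rule would be to display characters whenever the unescaped octets happen to be valid UTF-8. The sketch below is a hypothetical example of such a rule, not something any browser is claimed to implement; `display_form` is a name invented here.

```python
from urllib.parse import unquote_to_bytes

def display_form(escaped: str) -> str:
    """Hypothetical heuristic: show characters only if the octets are valid UTF-8."""
    try:
        return unquote_to_bytes(escaped).decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: leave the %xx escapes untouched.
        return escaped

# The rule cannot tell text from binary data that merely looks like UTF-8:
# the same escapes are displayed as "ö" whether they came from a path
# segment or from the payload of a data: URI.
shown = display_form("%C3%B6")

# A lone 0xF6 octet (Latin-1 "ö") is not valid UTF-8, so it stays escaped.
kept = display_form("%F6")
```

The rule gives the wrong result for the data: case, since binary octets that happen to form valid UTF-8 get displayed as characters.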

I argued for the last option eight years ago, unsuccessfully, and I
do not like the first option. Are you thinking about this in terms of
the heuristics option, and saying that the heuristics are not perfect,
or is there some other dimension to it? In your example you discuss
this only in terms of round-tripping, but that is not how I look at
this at all -- I want to get away from talking about bytes here.
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Wednesday, 27 July 2011 02:01:19 UTC
