- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 27 Jul 2011 04:00:50 +0200
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
- Cc: "public-iri@w3.org" <public-iri@w3.org>
* Martin J. Dürst wrote:
>The idea is that because %-encoding in URIs has to be interpreted as
>UTF-8 when converting to IRIs [...]

Converting `data:image/png,...%C3%B6...` to `data:image/png,...ö...` is semantically wrong, there is no character "ö" in this, it's just bytes. Sure, if you use UTF-8 and don't unicode-normalize, you can round-trip in this manner, but that doesn't make it any more right. If you have `http://.../%C3%B6` the situation is no different, there is no reason for `%C3%B6` to actually mean `ö` in any sense beyond round-tripping, "converting to IRIs" may be wrong in some situations.

I do understand what outcome you desire, but I do not understand how you would get around this problem short of one or more of, accepting wrong results like in the data: case above, relying on complicated and probably unreliable heuristics, or abandoning the idea that some of the time %xx sequences stand for octets while at other times they stand for characters (turned into bytes by some character encoding). I argued for the last option eight years ago, unsuccessfully, and I do not like the first option.

Do you think about this in terms of the heuristics option and are saying the heuristics are not perfect, or is there some other dimension to it? In your example you discuss this only in terms of round-tripping, but that is not how I look at this at all -- I want to get away from talking about bytes here.

-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
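
For illustration only, the octets-versus-characters distinction in the message can be sketched in Python. This is a rough sketch, not anything proposed in the thread: `percent_decode` and `decode_if_utf8` are hypothetical helper names, and the `%89PNG` input is an assumed stand-in for the start of binary PNG data. It shows that `%C3%B6` decodes to two octets, that those octets only become "ö" once UTF-8 is assumed, and that a validity check is one possible (imperfect) heuristic.

```python
from urllib.parse import unquote_to_bytes


def percent_decode(segment: str) -> bytes:
    """Return the raw octets that the %xx escapes in `segment` stand for."""
    return unquote_to_bytes(segment)


def decode_if_utf8(segment: str):
    """Hypothetical heuristic: return text if the octets are valid UTF-8, else None.

    Valid UTF-8 does not prove the octets were meant as characters;
    binary data can be valid UTF-8 by accident.
    """
    try:
        return percent_decode(segment).decode("utf-8")
    except UnicodeDecodeError:
        return None


raw = percent_decode("%C3%B6")      # two octets: 0xC3 0xB6
as_utf8 = raw.decode("utf-8")       # one character, if UTF-8 is assumed
as_latin1 = raw.decode("latin-1")   # two characters, if Latin-1 is assumed

print(raw, as_utf8, as_latin1)
print(decode_if_utf8("%89PNG"))     # octets that are not valid UTF-8
```

The same escape sequence yields different text under different assumed encodings, which is why treating `%C3%B6` as "ö" is an interpretation rather than a property of the URI itself.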
Received on Wednesday, 27 July 2011 02:01:19 UTC