- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 30 Apr 2003 20:31:03 +0200
- To: Martin Duerst <duerst@w3.org>
- Cc: public-iri@w3.org
* Martin Duerst wrote:
>> Is there a section in the current IRI draft that specifies how
>>%-escapes in IRIs are to be interpreted?
>
>%-escapes in IRIs are handled mostly the same way as in URIs.
>There is no special text about this. Do you think there should
>be? If yes, where should it go? What should it say?

The %-escaping mechanism in RFC 2396 is an irreversible encoding:
RFC 2396 says you can escape "&" as %26, but it does not say that
%26 can be unescaped to "&". RFC 2396 also does not specify how
characters outside the US-ASCII range have to be %-escaped, and
neither does the IRI draft (except when IRIs are converted to URIs).

IMO, the IRI draft should say that if %-escaping is used in an IRI,
the escape sequence must be generated from UTF-8 octets, and
%-escapes must be interpreted as octets in a UTF-8 sequence.

This approach would be problematic in two cases: if the IRI
originates from a URI that used %-escapes that cannot be interpreted
as a UTF-8 sequence, or if people want to encode arbitrary binary
data in the IRI. The latter is IMO not a valid use case for IRIs. If
a specific scheme wants binary data, it should first convert the
bytes to characters (using e.g. Base64) and then apply %-escaping to
these characters if necessary. The former could be resolved either
by making such URIs unconvertible or by adding an additional escaping
scheme for either non-UTF-8 octets or UTF-8 octets (like
http://www.example.org/%U0000F6 for http://www.example.org/ö). I
prefer to make them unconvertible.

IRIs are a sequence of characters. I think this definition should
not change to a sequence of characters intermixed with arbitrary
octets after unescaping %-escapes.
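
A minimal sketch of the rule proposed above, in Python. The function
names are hypothetical, and the sketch assumes the whole string is
unescaped at once, ignoring the separate question of whether reserved
characters such as "&" may safely be unescaped.

```python
from urllib.parse import quote, unquote_to_bytes


def escape_utf8(text: str) -> str:
    """%-escape every character outside the unreserved set, via UTF-8 octets."""
    return quote(text, safe="", encoding="utf-8")


def unescape_strict(escaped: str) -> str:
    """Interpret %-escapes as octets of a UTF-8 sequence.

    Raises UnicodeDecodeError if the octets are not valid UTF-8.
    """
    raw = unquote_to_bytes(escaped)
    return raw.decode("utf-8")


def is_convertible(uri: str) -> bool:
    """Treat a URI as convertible only if its escapes decode as UTF-8."""
    try:
        unescape_strict(uri)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(escape_utf8("ö"))
    print(is_convertible("http://www.example.org/%C3%B6"))
    print(is_convertible("http://www.example.org/%F6"))
```

Under this sketch, %C3%B6 decodes to "ö", while a lone %F6 (the
ISO-8859-1 octet for "ö") is rejected, which corresponds to the
"unconvertible" option rather than the additional escaping scheme.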
Received on Wednesday, 30 April 2003 14:31:17 UTC