- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Fri, 02 May 2003 05:37:51 +0200
- To: Martin Duerst <duerst@w3.org>
- Cc: public-iri@w3.org
* Martin Duerst wrote: >>IMO, the IRI draft should say, that if %-escaping is used in an IRI, the >>escape sequence must be generated from UTF-8 octets and %-escapes must >>be interpreted as octets in an UTF-8 sequence. > >why should it say so? In that case, you should not really use >%-escaping in an IRI, you should use real characters. What if it is impossible to use "real" characters due limitations of the transport media, the transport encoding, if I need to escape a reserved character to avoid it's special meaning, if the character is disallowed or if I want to encode binary data that does not represent any character? What if my IRI-aware application receives an IRI containing %-escape sequences but needs characters in order to work, like some kind of server for file transfer expecting a file name or a database frontend expecting a search string? Let's say there is an 'uri' URI scheme and an 'iri' IRI scheme (the + in the query part has no special meaning and may thus stay unescaped): uri://www.example.org/search?Bj+APY-rn iri://www.example.org/search?Bj+APY-rn Decoding the query part of the URI I would get the octets <42><6A><2B><41><50><59><2D><72><6E> The database frontend would then search for "Björn", since it decodes the octets represented by characters in the URL as UTF-7 octets. What about the IRI? Is the frontend supposed to search for "Bj+APY-rn" or for "Björn"? Is a data character in an IRI a character or is it a representation of an octet or even something else? If an IRI data character is a "real" character, refer %-escape sequence also to real characters? Are these IRIs equivalent: iri://www.example.org/search?Bj%F6rn iri://www.example.org/search?Björn just like these URIs are: uri://www.example.org/search?a uri://www.example.org/search?%61 Are these equivalent: iri://www.example.org/search?Bj%C3%B6rn iri://www.example.org/search?Björn and are these IRIs: iri://www.example.org/search?a iri://www.example.org/search?%61 equivalent? If the latter two IRIs are equivalent, how would one then encode binary data in an IRI? What octets are represented in the query part of e.g. iri://www.example.org/search?<U+20AC> iri://www.example.org/search?<U+1D7F6> Consider I want to send an IRI in a text/plain e-mail using us-ascii, but the IRI has non-ASCII characters, like iri://www.example.org/björn can I use %-escaping to encode the "ö" and if yes, how would the IRI then look like? Would it be iri://www.example.org/bj%F6rn iri://www.example.org/bj%ECrn iri://www.example.org/bj%C3%B6rn iri://www.example.org/bj%00%F6rn ... Currently neither RFC 2396 nor the IRI draft give an advise here. Is this a scenario not supported by IRIs? If yes, why do you think it is not necessary or not possible to support it, and why does the IRI draft not mention that %-escaping cannot be used for non-ASCII characters, but rather says it SHOULD NOT be used? If it is possible to use %-escaping for non-ASCII characters, the IRI draft must say how the non-ASCII character have to be encoded (actually, how any character is to be encoded) and should say, how one gets the characters back. regards.
Received on Thursday, 1 May 2003 23:38:13 UTC