- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 27 Apr 2004 17:45:40 +0900
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: public-iri@w3.org
Hello Bjoern,

I haven't heard anything on this, and I'm therefore closing this issue.

Regards, Martin.

At 06:29 03/06/27 -0400, Martin Duerst wrote:
>Hello Bjoern,
>
>Many thanks for all your questions.
>
>Most of these questions, if not all of them, are answered
>in the actual draft. Please check it and tell me where you
>think something is missing or not clear enough.
>
>At 05:37 03/05/02 +0200, Bjoern Hoehrmann wrote:
>>* Martin Duerst wrote:
>> >>IMO, the IRI draft should say that if %-escaping is used in an IRI, the
>> >>escape sequence must be generated from UTF-8 octets and %-escapes must
>> >>be interpreted as octets in a UTF-8 sequence.
>> >
>> >why should it say so? In that case, you should not really use
>> >%-escaping in an IRI, you should use real characters.
>>
>>What if it is impossible to use "real" characters due to limitations of the
>>transport media, the transport encoding,
>
>Then preferably use a transport-specific escaping or encoding
>(e.g. the various MIME mechanisms for email, numeric character
>references for HTML and XML,...).
>
>
>>if I need to escape a reserved character to avoid its special meaning,
>
>Then use escaping. That's very clear in the draft.
>
>
>>if the character is disallowed
>
>Then use escaping. Again, the draft says so.
>
>
>>or if I want to encode binary data that does not represent any
>>character?
>
>Then use escaping. Same thing again.
>
>
>>What if my IRI-aware application receives an IRI containing %-escape
>>sequences but needs characters in order to work, like some kind of
>>server for file transfer expecting a file name or a database frontend
>>expecting a search string?
>
>Then the server will do the conversion from %-escapes to octets
>the same way it currently does, and some servers (e.g. Apache and IIS
>on WinNT/2000/XP), or server configurations, will convert further,
>where possible, to whatever character encoding is used internally
>in the server.
>
>
>>Let's say there is a 'uri' URI scheme and an 'iri' IRI scheme
>
>There is really no such difference. All URI schemes can be used
>with IRIs. For some, the benefit of using IRIs is greater than
>for others. I think what you wanted to say is that there are
>two protocol slots, let's say
>iri="http://www.example.org/search?Bj+APY-rn" and
>uri="http://www.example.org/search?Bj+APY-rn". I'll assume
>this for the following examples, but I'll not change your syntax.
>
>
>>(the + in
>>the query part has no special meaning and may thus stay unescaped):
>>
>>   uri://www.example.org/search?Bj+APY-rn
>>   iri://www.example.org/search?Bj+APY-rn
>>
>>Decoding the query part of the URI I would get the octets
>>
>>   <42><6A><2B><41><50><59><2D><72><6E>
>
>Yes.
>
>
>>The database frontend would then search for "Bjo"rn",
>
>Sorry to have to use "Bjo"rn" for your example due to my
>Japanese mailer.
>
>
>>since it decodes
>>the octets represented by characters in the URL as UTF-7 octets.
>
>If the database frontend is programmed that way, then that's correct.
>
>
>>What
>>about the IRI? Is the frontend supposed to search for "Bj+APY-rn" or
>>for "Bjo"rn"?
>
>If the same frontend is used, the same thing will happen.
>The frontend has no way to distinguish whether it receives a URI
>or an IRI.
>
>
>>Is a data character in an IRI a character or is it a
>>representation of an octet or even something else?
>
>It is a character. That does not prevent these characters
>from being (mis)used to represent other characters, as in the case of
>UTF-7.
>
>
>>If an IRI data character is a "real" character, do %-escape sequences
>>also refer to real characters? Are these IRIs equivalent:
>>
>>   iri://www.example.org/search?Bj%F6rn
>>   iri://www.example.org/search?Bjo"rn
>
>These are definitely not equivalent, because the %F6 is based
>on Latin-1, not UTF-8.
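
[Illustration, not part of the original message.] The UTF-7 and Latin-1 cases discussed above can be reproduced with a minimal Python sketch. It assumes the standard library's urllib.parse.unquote_to_bytes for undoing %-escapes and Python's built-in "utf-7" codec as a stand-in for the database frontend described in the thread; the helper name decode_query is invented for this example.

```python
from urllib.parse import unquote_to_bytes


def decode_query(query: str, charset: str) -> str:
    """Undo %-escapes to get octets, then interpret them in `charset`."""
    return unquote_to_bytes(query).decode(charset)


# The octets behind the query part "Bj+APY-rn" (no %-escapes involved).
octets = unquote_to_bytes("Bj+APY-rn")
print(octets.hex(" ").upper())  # 42 6A 2B 41 50 59 2D 72 6E

# A frontend that treats those octets as UTF-7 ends up with U+00F6.
print(decode_query("Bj+APY-rn", "utf-7") == "Bj\u00f6rn")  # True

# %F6 is the Latin-1 octet for U+00F6 ...
print(decode_query("Bj%F6rn", "latin-1") == "Bj\u00f6rn")  # True

# ... but on its own it is not a valid UTF-8 sequence.
try:
    decode_query("Bj%F6rn", "utf-8")
except UnicodeDecodeError:
    print("Bj%F6rn is not valid UTF-8")

# %C3%B6 is the UTF-8 form of the same character.
print(decode_query("Bj%C3%B6rn", "utf-8") == "Bj\u00f6rn")  # True
```

The sketch only shows what each interpretation of the octets yields; which interpretation a given server applies is up to that server, as the reply above notes.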
>
>
>>just like these URIs are:
>>
>>   uri://www.example.org/search?a
>>   uri://www.example.org/search?%61
>
>If you read section 6 of
>http://www.ietf.org/internet-drafts/draft-fielding-uri-rfc2396bis-03.txt
>carefully, you'll see that these are equivalent
>under certain definitions of equivalence, and for
>those protocols/applications that use this definition
>of equivalence.
>
>
>>Are these equivalent:
>>
>>   iri://www.example.org/search?Bj%C3%B6rn
>>   iri://www.example.org/search?Bjo"rn
>
>These are equivalent under certain definitions of equivalence.
>
>
>>and are these IRIs:
>>
>>   iri://www.example.org/search?a
>>   iri://www.example.org/search?%61
>
>They are as equivalent as the same URIs (see above).
>
>
>>equivalent? If the latter two IRIs are equivalent, how would one then
>>encode binary data in an IRI? What octets are represented in the query
>>part of e.g.
>>
>>   iri://www.example.org/search?<U+20AC>
>>   iri://www.example.org/search?<U+1D7F6>
>
>The octets, when octets are needed, are based on UTF-8, i.e.
>E2 82 AC in the first case, and F0 9D 9F B6 in the second case.
>
>
>>Suppose I want to send an IRI in a text/plain e-mail using us-ascii,
>>but the IRI has non-ASCII characters, like
>>
>>   iri://www.example.org/bjo"rn
>
>In the first place, you should not use us-ascii for sending this IRI.
>There are many encodings, starting with iso-8859-1 and utf-8, that
>can easily transfer the IRI.
>
>
>>can I use %-escaping to encode the 'o"' and if yes, what would the IRI
>>then look like? Would it be
>>
>>   iri://www.example.org/bj%F6rn
>>   iri://www.example.org/bj%ECrn
>>   iri://www.example.org/bj%C3%B6rn
>
>If anything, it would be this one, with "bj%C3%B6rn", using UTF-8.
>While this would not work for namespaces (i.e. XML parsers and
>XSLT processors would treat the namespaces
>iri://www.example.org/bjo"rn and iri://www.example.org/bj%C3%B6rn
>differently), it would at least resolve to the same thing, e.g.
>over http (exactly the same applies to http://www.example.org/search?a
>and http://www.example.org/search?%61).
>
>
>>   iri://www.example.org/bj%00%F6rn
>>   ...
>>
>>Currently neither RFC 2396 nor the IRI draft gives any advice here. Is
>>this a scenario not supported by IRIs?
>
>Which scenario? The scenario of sending IRIs over US-ASCII?
>Or another one?
>
>
>>If yes, why do you think it is
>>not necessary or not possible to support it,
>
>If you mean sending IRIs over US-ASCII, then it's not possible in
>the same way it's not really possible to send German or Japanese
>email over US-ASCII.
>
>
>>and why does the IRI draft
>>not mention that %-escaping cannot be used for non-ASCII characters, but
>>rather says it SHOULD NOT be used?
>
>Because it depends on exactly what you are doing.
>
>
>>If it is possible to use %-escaping
>>for non-ASCII characters, the IRI draft must say how the non-ASCII
>>characters have to be encoded (actually, how any character is to be
>>encoded) and should say how one gets the characters back.
>
>There are two very detailed sections in the draft discussing this.
>For escaping, see section 3.1, "Mapping of IRIs to URIs".
>For unescaping, see section 3.2, "Converting URIs to IRIs".
>If you find anything that is unclear, please tell us, so that I can
>fix it.
>
>
>Regards, Martin.
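
[Illustration, not part of the original message.] The UTF-8 based escaping described in the thread can be sketched in Python as follows. This is a deliberately simplified illustration and not the full procedure of section 3.1 of the draft: it only %-escapes non-ASCII characters through their UTF-8 octets and makes no attempt at normalization or special handling of host names. The function name iri_to_uri_sketch is invented for this example.

```python
def iri_to_uri_sketch(iri: str) -> str:
    """%-escape every non-ASCII character using its UTF-8 octets.

    ASCII characters, including existing %-escapes, pass through unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append("".join(f"%{octet:02X}" for octet in ch.encode("utf-8")))
    return "".join(out)


# The two query characters from the thread, U+20AC and U+1D7F6.
print(iri_to_uri_sketch("iri://www.example.org/search?\u20ac"))
# iri://www.example.org/search?%E2%82%AC
print(iri_to_uri_sketch("iri://www.example.org/search?\U0001d7f6"))
# iri://www.example.org/search?%F0%9D%9F%B6

# U+00F6 becomes %C3%B6, not the Latin-1 based %F6.
original = "iri://www.example.org/bj\u00f6rn"
escaped = iri_to_uri_sketch(original)
print(escaped)  # iri://www.example.org/bj%C3%B6rn

# Compared character by character, as a namespace name would be,
# the two forms are different strings.
print(original == escaped)  # False
```

The last comparison mirrors the namespace remark in the message: the escaped and unescaped forms may resolve to the same resource, yet a consumer that compares identifiers as plain strings sees two different values.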
Received on Tuesday, 27 April 2004 04:50:47 UTC