Re: Interpretation if %-escapes in IRIs [escapeInterpret-14]

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 02 May 2003 05:37:51 +0200
To: Martin Duerst <duerst@w3.org>
Cc: public-iri@w3.org
Message-ID: <3f2ce7ed.656144636@smtp.bjoern.hoehrmann.de>

* Martin Duerst wrote:
>>IMO, the IRI draft should say, that if %-escaping is used in an IRI, the
>>escape sequence must be generated from UTF-8 octets and %-escapes must
>>be interpreted as octets in an UTF-8 sequence.
>why should it say so? In that case, you should not really use
>%-escaping in an IRI, you should use real characters.

What if it is impossible to use "real" characters due to limitations of
the transport medium or the transport encoding, if I need to escape a
reserved character to avoid its special meaning, if the character is
disallowed, or if I want to encode binary data that does not represent
any characters?

What if my IRI-aware application receives an IRI containing %-escape
sequences but needs characters in order to work, like some kind of
server for file transfer expecting a file name or a database frontend
expecting a search string?

Let's say there is an 'uri' URI scheme and an 'iri' IRI scheme (the + in
the query part has no special meaning and may thus stay unescaped):


Decoding the query part of the URI I would get the octets


The database frontend would then search for "Björn", since it decodes
the octets represented by characters in the URL as UTF-7 octets. What
about the IRI? Is the frontend supposed to search for "Bj+APY-rn" or
for "Björn"? Is a data character in an IRI a character or is it a
representation of an octet or even something else?
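To make the question concrete, here is a minimal Python sketch. The two query strings are hypothetical stand-ins chosen to match the "Bj+APY-rn" and "Björn" forms mentioned above; they are not the original examples. The sketch only shows that undoing %-escaping yields octets, and that which characters those octets represent depends on the charset the receiver assumes.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical query parts (assumptions for illustration only).
uri_query = "Bj+APY-rn"        # ASCII-only query, '+' left unescaped
escaped_query = "Bj%C3%B6rn"   # the same name, %-escaped from UTF-8 octets


def query_octets(query: str) -> bytes:
    """Undo %-escaping only; the result is octets, not characters."""
    return unquote_to_bytes(query)


octets = query_octets(uri_query)

# A frontend that treats the octets as UTF-7 searches for one string ...
as_utf7 = octets.decode("utf-7")
# ... while one that treats them as plain ASCII searches for another.
as_ascii = octets.decode("ascii")

# A receiver that assumes %-escapes carry UTF-8 octets.
from_utf8_escapes = query_octets(escaped_query).decode("utf-8")

print(octets, as_utf7, as_ascii, from_utf8_escapes)
```

Both readings of `uri_query` are legitimate decodings of the same octets, so the identifier alone does not say which one the frontend should use.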

If an IRI data character is a "real" character, do %-escape sequences
also refer to real characters? Are these IRIs equivalent:


just like these URIs are:


Are these equivalent:


and are these IRIs:


equivalent? If the latter two IRIs are equivalent, how would one then
encode binary data in an IRI? What octets are represented in the query
part of e.g.


Suppose I want to send an IRI in a text/plain e-mail using US-ASCII,
but the IRI has non-ASCII characters, like


can I use %-escaping to encode the "ö" and, if so, what would the IRI
then look like? Would it be


Currently neither RFC 2396 nor the IRI draft gives any advice here. Is
this a scenario not supported by IRIs? If so, why do you think it is
not necessary or not possible to support it, and why does the IRI draft
not mention that %-escaping cannot be used for non-ASCII characters, but
rather says it SHOULD NOT be used? If it is possible to use %-escaping
for non-ASCII characters, the IRI draft must say how the non-ASCII
characters have to be encoded (actually, how any character is to be
encoded) and should say how one gets the characters back.
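As a rough illustration of why the encoding has to be specified, the following Python sketch escapes the same hypothetical name under two charsets and then tries to get the characters back. The name "Björn" and the helper `recover` are assumptions made for this example; nothing here is taken from RFC 2396 or the IRI draft.

```python
from urllib.parse import quote, unquote

name = "Bj\u00f6rn"  # hypothetical data containing a non-ASCII "ö"

# Two ASCII-only forms a sender might produce for the same character.
utf8_form = quote(name, encoding="utf-8")
latin1_form = quote(name, encoding="iso-8859-1")


def recover(escaped: str, charset: str):
    """Decode %-escapes under an assumed charset; None if the octets are invalid."""
    try:
        return unquote(escaped, encoding=charset, errors="strict")
    except UnicodeDecodeError:
        return None


print(utf8_form, latin1_form)
print(recover(utf8_form, "utf-8"))         # round-trips
print(recover(latin1_form, "utf-8"))       # fails: not valid UTF-8
print(recover(utf8_form, "iso-8859-1"))    # succeeds, but yields wrong characters
```

Unless sender and receiver agree on one charset, the receiver either fails to decode or silently recovers different characters.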

Received on Thursday, 1 May 2003 23:38:13 UTC
