Re: draft-duerst-iri-07.txt: 2 week mailing list last call from Graham Klyne on 2004-05-12 (uri@w3.org from May 2004)

From: Graham Klyne <gk@ninebynine.org>
Date: Wed, 12 May 2004 14:06:21 +0100
To: Martin Duerst <duerst@w3.org>, public-iri@w3.org
Cc: uri@w3.org
Message-Id: <5.1.0.14.2.20040512140428.02c15b78@127.0.0.1>

At 17:59 12/05/04 +0900, Martin Duerst wrote:
>Hello Graham,
>
>I have labeled this issue as convertASCII-30.
>
>
>At 12:02 04/05/10 +0100, Graham Klyne wrote:
>
>>Section 3.2:
>>
>>Is this really true (about always mapping back to the same URI)?:
>>[[
>>3.2  Converting URIs to IRIs
>>
>>    In some situations, it may be desirable to try to convert a URI into
>>    an equivalent IRI. This section gives a procedure to do such a
>>    conversion. The conversion described in this section will always
>>    result in an IRI which maps back to the URI that was used as an input
>>    for the conversion (except for potential case differences in
>>    percent-encoding). However, the IRI resulting from this conversion
>>    may not be exactly the same as the original IRI (if there ever was
>>    one).
>>]]
>>
>>In light of:
>>[[
>>    2) Convert all percent-encodings (% followed by two hexadecimal
>>       digits) except those corresponding to '%', characters in
>>       'reserved', and characters in US-ASCII not allowed in URIs, to the
>>       corresponding octets.
>>]]
>>
>>It seems to me that removing percent encodings for non-reserved and other 
>>characters is a non-reversible transformation.  I think that mapping back 
>>to the original URI is only true under escape normalization, per rfc2396bis.
>
>Yes, good catch. I looked at the actual text that needs to be fixed.
>I can either add non-reserved ASCII characters to the 'except'
>clause in parentheses in the original text, or can change the
>procedure. Overall, in terms of edits, both need about the same
>work. Which one would you prefer?

I'm not sure.  I think it's most important to remove the inconsistency.  I 
think that, in practice, this is an area which developers and users would 
be well-advised to avoid.
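
To make the inconsistency concrete, here is a minimal sketch, not the draft's procedure. It assumes the unreserved set from rfc2396bis (letters, digits, and "-", ".", "_", "~"); the helper names decode_unreserved and iri_to_uri and the example URI are hypothetical. It shows that once percent-encodings of unreserved characters are decoded, mapping the result back does not reproduce the input URI, although the two are equal after escape normalization.

```python
import re
import string

# Assumption: unreserved characters as defined in rfc2396bis.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Decode %XX only when it encodes an unreserved ASCII character."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0)
    return _PCT.sub(repl, uri)


def iri_to_uri(iri: str) -> str:
    """Map back to a URI: percent-encode the UTF-8 octets of non-ASCII
    characters and leave all ASCII characters untouched."""
    return "".join(
        c if ord(c) < 128 else "".join(f"%{b:02X}" for b in c.encode("utf-8"))
        for c in iri
    )


original = "http://example.org/%7Euser/%41bc"   # hypothetical input URI
iri = decode_unreserved(original)               # the step in question
round_trip = iri_to_uri(iri)

print(iri)
print(round_trip == original)
# Equal only once both sides are escape-normalized:
print(decode_unreserved(round_trip) == decode_unreserved(original))
```

The round trip fails only because "%7E" and "%41" encode unreserved characters. An encoding such as "%2F" is left alone by decode_unreserved and so survives unchanged.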

#g
--

>It is clear that the conversion can be done with or without
>removing percent-encodings for non-reserved ASCII characters, and
>different usages may choose different variants, according to their needs.
>
>
>>Also, not knowing anything about bidi encodings, it's difficult for me to 
>>tell if there's any possible interaction between this and the section 4 
>>material on bidi sequences.
>
>There is some interaction as some characters and character
>combinations are excluded by the bidi section. I think the
>various cross-references within the text take care of this.
>There is also some interaction in that, with the conversion
>from URI to IRI, the display sequence of the components
>may change. But this will just happen automatically; it
>is not something the algorithm has to worry about.
>
>
>Regards,    Martin.
>
>

------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact

Received on Wednesday, 12 May 2004 09:25:31 UTC