Re: draft-duerst-iri-07.txt: 2 week mailing list last call from Martin Duerst on 2004-05-19 (public-iri@w3.org from May 2004)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 19 May 2004 16:52:34 +0900
To: Graham Klyne <gk@ninebynine.org>, public-iri@w3.org
Cc: uri@w3.org
Message-Id: <4.2.0.58.J.20040519143105.04c0ad50@localhost>
Hello Graham,

At 14:06 04/05/12 +0100, Graham Klyne wrote:

>At 17:59 12/05/04 +0900, Martin Duerst wrote:
>>Hello Graham,
>>
>>I have labeled this issue as convertASCII-30.
>>
>>
>>At 12:02 04/05/10 +0100, Graham Klyne wrote:
>>
>>>Section 3.2:
>>>
>>>Is this really true (about always mapping back to the same URI)?:
>>>[[
>>>3.2  Converting URIs to IRIs
>>>
>>>    In some situations, it may be desirable to try to convert a URI into
>>>    an equivalent IRI. This section gives a procedure to do such a
>>>    conversion. The conversion described in this section will always
>>>    result in an IRI which maps back to the URI that was used as an input
>>>    for the conversion (except for potential case differences in
>>>    percent-encoding). However, the IRI resulting from this conversion
>>>    may not be exactly the same as the original IRI (if there ever was
>>>    one).
>>>]]
>>>
>>>In light of:
>>>[[
>>>    2) Convert all percent-encodings (% followed by two hexadecimal
>>>       digits) except those corresponding to '%', characters in
>>>       'reserved', and characters in US-ASCII not allowed in URIs, to the
>>>       corresponding octets.
>>>]]
>>>
>>>It seems to me that removing percent encodings for non-reserved and 
>>>other characters is a non-reversible transformation.  I think that 
>>>mapping back to the original URI is only true under escape 
>>>normalization, per rfc2396bis.
>>
>>Yes, good catch. I looked at the actual text that needs to be fixed.
>>I can either add non-reserved ASCII characters to the 'except'
>>clause in parentheses in the original text, or can change the
>>procedure. Overall, in terms of edits, both need about the same
>>work. Which one would you prefer?
>
>I'm not sure.  I think it's most important to remove the inconsistency.

I have decided that it is better to also remove spurious percent-encodings
of non-reserved US-ASCII characters, because probably the main use of
the conversion from URIs to IRIs is for presentation purposes.

I have changed

    (except for potential case differences in percent-encoding)

to

    (except for potential case differences in percent-encoding
     and for potential percent-encoded unreserved characters)

I have also changed

    This procedure will convert as many percent-encoded non-ASCII
    characters as possible to characters in an IRI.
to
    This procedure will convert as many percent-encoded characters
    as possible to characters in an IRI.

I hope this addresses your concern.


Regards,   Martin.

>I think that, in practice, this is an area which developers and users 
>would be well-advised to avoid.
>
>#g
>--
>
>>It is clear that with or without removing percent-encodings for
>>non-reserved ASCII characters, this can be done, and different
>>usages may choose different variants, according to their needs.
>>
>>
>>>Also, not knowing anything about bidi encodings, it's difficult for me 
>>>to tell if there's any possible interaction between this and the section 
>>>4 material on bidi sequences.
>>
>>There is some interaction as some characters and character
>>combinations are excluded by the bidi section. I think the
>>various cross-references within the text take care of this.
>>There is also some interaction that with the conversion
>>from URI to IRI, the display sequence of the components
>>may change. But this will just happen automatically, this
>>is not something the algorithm has to worry about.
>>
>>
>>Regards,    Martin.
>>
>
>------------
>Graham Klyne
>For email:
>http://www.ninebynine.org/#Contact
Received on Wednesday, 19 May 2004 03:56:44 UTC