- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 19 May 2004 16:52:34 +0900
- To: Graham Klyne <gk@ninebynine.org>, public-iri@w3.org
- Cc: uri@w3.org
Hello Graham,
At 14:06 04/05/12 +0100, Graham Klyne wrote:
>At 17:59 12/05/04 +0900, Martin Duerst wrote:
>>Hello Graham,
>>
>>I have labeled this issue as convertASCII-30.
>>
>>
>>At 12:02 04/05/10 +0100, Graham Klyne wrote:
>>
>>>Section 3.2:
>>>
>>>Is this really true (about always mapping back to the same URI)?:
>>>[[
>>>3.2 Converting URIs to IRIs
>>>
>>> In some situations, it may be desirable to try to convert a URI into
>>> an equivalent IRI. This section gives a procedure to do such a
>>> conversion. The conversion described in this section will always
>>> result in an IRI which maps back to the URI that was used as an input
>>> for the conversion (except for potential case differences in
>>> percent-encoding). However, the IRI resulting from this conversion
>>> may not be exactly the same as the original IRI (if there ever was
>>> one).
>>>]]
>>>
>>>In light of:
>>>[[
>>> 2) Convert all percent-encodings (% followed by two hexadecimal
>>> digits) except those corresponding to '%', characters in
>>> 'reserved', and characters in US-ASCII not allowed in URIs, to the
>>> corresponding octets.
>>>]]
>>>
>>>It seems to me that removing percent encodings for non-reserved and
>>>other characters is a non-reversible transformation. I think that
>>>mapping back to the original URI is only true under escape
>>>normalization, per rfc2396bis.
>>
>>Yes, good catch. I looked at the actual text that needs to be fixed.
>>I can either add non-reserved ASCII characters to the 'except'
>>clause in parentheses in the original text, or can change the
>>procedure. Overall, in terms of edits, both need about the same
>>work. Which one would you prefer?
>
>I'm not sure. I think it's most important to remove the inconsistency.
I have decided that it is better to also remove spurious percent-encodings
of unreserved US-ASCII characters, because the main use of the conversion
from URIs to IRIs is probably for presentation purposes.

I have changed

   (except for potential case differences in percent-encoding)

to

   (except for potential case differences in percent-encoding
   and for potential percent-encoded unreserved characters)

I have also changed

   This procedure will convert as many percent-encoded non-ASCII
   characters as possible to characters in an IRI.

to

   This procedure will convert as many percent-encoded characters
   as possible to characters in an IRI.
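
To make the effect of the changed wording concrete, here is a rough Python
sketch of the conversion, under some simplifying assumptions. The names
uri_to_iri and iri_to_uri are made up for this illustration, the set of
unreserved characters is the one from the current URI syntax (letters,
digits, "-", ".", "_", "~"), and the sketch does not check for characters
that the IRI draft excludes for other reasons (for example in the bidi
section). It is not the normative procedure, only an approximation of
steps 2 to 4.

```python
import re

# Unreserved ASCII characters (ALPHA / DIGIT / "-" / "." / "_" / "~").
# Assumption: this is the set whose percent-encodings are "spurious".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# One or more consecutive percent-encodings, e.g. "%C3%A9".
PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    """Decode one run of percent-encodings as far as the rules allow."""
    text = match.group(0)
    raw = bytes.fromhex(text.replace("%", ""))
    out, i = [], 0
    while i < len(raw):
        if raw[i] < 0x80:
            ch = chr(raw[i])
            # Keep escapes for '%', reserved characters, and ASCII
            # characters not allowed in URIs exactly as written.
            out.append(ch if ch in UNRESERVED else text[3 * i:3 * i + 3])
            i += 1
            continue
        # Non-ASCII octet: accept it only as part of a strictly legal
        # UTF-8 sequence of 2, 3, or 4 octets.
        for n in (2, 3, 4):
            try:
                out.append(raw[i:i + n].decode("utf-8"))
            except UnicodeDecodeError:
                continue
            i += n
            break
        else:
            # Not legal UTF-8: leave this octet percent-encoded.
            out.append(text[3 * i:3 * i + 3])
            i += 1
    return "".join(out)


def uri_to_iri(uri):
    """Convert as many percent-encoded characters as possible."""
    return PCT_RUN.sub(_decode_run, uri)


def iri_to_uri(iri):
    """Map back: percent-encode the UTF-8 octets of non-ASCII characters."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join("%%%02X" % b for b in ch.encode("utf-8"))
        for ch in iri
    )
```

With this sketch, a URI containing "%7E" converts to an IRI containing "~",
and mapping that IRI back gives "~" rather than "%7E". That is exactly the
kind of difference the extended parenthesis now allows for.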
I hope this addresses your concern.
Regards, Martin.
>I think that, in practice, this is an area which developers and users
>would be well-advised to avoid.
>
>#g
>--
>
>>It is clear that with or without removing percent-encodings for
>>non-reserved ASCII characters, this can be done, and different
>>usages may choose different variants, according to their needs.
>>
>>
>>>Also, not knowing anything about bidi encodings, it's difficult for me
>>>to tell if there's any possible interaction between this and the section
>>>4 material on bidi sequences.
>>
>>There is some interaction as some characters and character
>>combinations are excluded by the bidi section. I think the
>>various cross-references within the text take care of this.
>>There is also some interaction that with the conversion
>>from URI to IRI, the display sequence of the components
>>may change. But this will just happen automatically, this
>>is not something the algorithm has to worry about.
>>
>>
>>Regards, Martin.
>>
>
>------------
>Graham Klyne
>For email:
>http://www.ninebynine.org/#Contact
Received on Wednesday, 19 May 2004 03:56:44 UTC