- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 19 May 2004 16:52:34 +0900
- To: Graham Klyne <gk@ninebynine.org>, public-iri@w3.org
- Cc: uri@w3.org
Hello Graham,

At 14:06 04/05/12 +0100, Graham Klyne wrote:
>At 17:59 12/05/04 +0900, Martin Duerst wrote:
>>Hello Graham,
>>
>>I have labeled this issue as convertASCII-30.
>>
>>
>>At 12:02 04/05/10 +0100, Graham Klyne wrote:
>>
>>>Section 3.2:
>>>
>>>Is this really true (about always mapping back to the same URI)?:
>>>[[
>>>3.2 Converting URIs to IRIs
>>>
>>>   In some situations, it may be desirable to try to convert a URI into
>>>   an equivalent IRI. This section gives a procedure to do such a
>>>   conversion. The conversion described in this section will always
>>>   result in an IRI which maps back to the URI that was used as an input
>>>   for the conversion (except for potential case differences in
>>>   percent-encoding). However, the IRI resulting from this conversion
>>>   may not be exactly the same as the original IRI (if there ever was
>>>   one).
>>>]]
>>>
>>>In light of:
>>>[[
>>>   2) Convert all percent-encodings (% followed by two hexadecimal
>>>      digits) except those corresponding to '%', characters in
>>>      'reserved', and characters in US-ASCII not allowed in URIs, to the
>>>      corresponding octets.
>>>]]
>>>
>>>It seems to me that removing percent encodings for non-reserved and
>>>other characters is a non-reversible transformation. I think that
>>>mapping back to the original URI is only true under escape
>>>normalization, per rfc2396bis.
>>
>>Yes, good catch. I looked at the actual text that needs to be fixed.
>>I can either add non-reserved ASCII characters to the 'except'
>>clause in parentheses in the original text, or can change the
>>procedure. Overall, in terms of edits, both need about the same
>>work. Which one would you prefer?
>
>I'm not sure. I think it's most important to remove the inconsistency.

I have decided that it is better to also remove spurious
percent-encodings of non-reserved US-ASCII characters, because
probably the main use of the conversion from URIs to IRIs is
for presentation purposes.
I have changed
    (except for potential case differences in percent-encoding)
to
    (except for potential case differences in percent-encoding
    and for potential percent-encoded unreserved characters)

I have also changed
    This procedure will convert as many percent-encoded non-ASCII
    characters as possible to characters in an IRI.
to
    This procedure will convert as many percent-encoded
    characters as possible to characters in an IRI.

I hope this addresses your concern.

Regards, Martin.

>I think that, in practice, this is an area which developers and users
>would be well-advised to avoid.
>
>#g
>--
>
>>It is clear that with or without removing percent-encodings for
>>non-reserved ASCII characters, this can be done, and different
>>usages may choose different variants, according to their needs.
>>
>>
>>>Also, not knowing anything about bidi encodings, it's difficult for me
>>>to tell if there's any possible interaction between this and the section
>>>4 material on bidi sequences.
>>
>>There is some interaction as some characters and character
>>combinations are excluded by the bidi section. I think the
>>various cross-references within the text take care of this.
>>There is also some interaction that with the conversion
>>from URI to IRI, the display sequence of the components
>>may change. But this will just happen automatically, this
>>is not something the algorithm has to worry about.
>>
>>
>>Regards, Martin.
>>
>
>------------
>Graham Klyne
>For email:
>http://www.ninebynine.org/#Contact
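
For illustration only, the variant discussed in this message (also decoding percent-encoded unreserved US-ASCII characters) might be sketched in Python as follows. This is a rough sketch, not the draft's normative procedure. The names uri_to_iri, iri_to_uri, UNRESERVED and PCT_RUN are hypothetical. The unreserved set is assumed to be ALPHA / DIGIT / "-" / "." / "_" / "~". The sketch leaves out the further exclusions (for example, the bidi-related characters mentioned in the thread).

```python
import re

# Assumed unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

# A maximal run of consecutive percent-encoded octets.
PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    """Decode one run of percent-encoded octets where that is safe."""
    raw = bytes.fromhex(match.group(0).replace("%", ""))
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:
            # US-ASCII: decode only unreserved characters; '%', reserved
            # characters and characters not allowed in URIs stay encoded.
            ch = chr(b)
            out.append(ch if ch in UNRESERVED else "%%%02X" % b)
            i += 1
            continue
        # Non-ASCII: decode only a strictly valid UTF-8 sequence.
        for length in (2, 3, 4):
            try:
                out.append(raw[i:i + length].decode("utf-8"))
            except UnicodeDecodeError:
                continue
            i += length
            break
        else:
            # Not part of valid UTF-8: keep the octet percent-encoded.
            out.append("%%%02X" % b)
            i += 1
    return "".join(out)


def uri_to_iri(uri):
    """Hypothetical helper: URI -> IRI, for presentation purposes."""
    return PCT_RUN.sub(_decode_run, uri)


def iri_to_uri(iri):
    """Hypothetical helper: percent-encode non-ASCII characters as UTF-8."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join("%%%02X" % b for b in ch.encode("utf-8"))
        for ch in iri
    )


uri = "http://example.org/r%C3%A9sum%c3%a9/%7Euser%2Fa%25"
iri = uri_to_iri(uri)
print(iri)
print(iri_to_uri(iri))
```

Mapping the resulting IRI back does not reproduce the input URI exactly: the lowercase "%c3%a9" comes back as "%C3%A9", and "%7E" comes back as a literal "~". These are the two exceptions named in the revised parenthetical above.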
Received on Wednesday, 19 May 2004 03:56:44 UTC