- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 12 May 2004 17:59:50 +0900
- To: Graham Klyne <GK@ninebynine.org>, public-iri@w3.org
- Cc: uri@w3.org
Hello Graham,

I have labeled this issue as convertASCII-30.

At 12:02 04/05/10 +0100, Graham Klyne wrote:
>Section 3.2:
>
>Is this really true (about always mapping back to the same URI)?:
>[[
>3.2 Converting URIs to IRIs
>
>    In some situations, it may be desirable to try to convert a URI into
>    an equivalent IRI. This section gives a procedure to do such a
>    conversion. The conversion described in this section will always
>    result in an IRI which maps back to the URI that was used as an input
>    for the conversion (except for potential case differences in
>    percent-encoding). However, the IRI resulting from this conversion
>    may not be exactly the same as the original IRI (if there ever was
>    one).
>]]
>
>In light of:
>[[
>    2) Convert all percent-encodings (% followed by two hexadecimal
>       digits) except those corresponding to '%', characters in
>       'reserved', and characters in US-ASCII not allowed in URIs, to the
>       corresponding octets.
>]]
>
>It seems to me that removing percent encodings for non-reserved and other
>characters is a non-reversible transformation. I think that mapping back
>to the original URI is only true under escape normalization, per rfc2396bis.

Yes, good catch. I looked at the actual text that needs to be fixed.
I can either add non-reserved ASCII characters to the 'except' clause
in parentheses in the original text, or I can change the procedure.
In terms of edits, both need about the same amount of work. Which one
would you prefer? Clearly, the conversion can be done with or without
removing percent-encodings for non-reserved ASCII characters, and
different usages may choose different variants according to their needs.

>Also, not knowing anything about bidi encodings, it's difficult for me to
>tell if there's any possible interaction between this and the section 4
>material on bidi sequences.

There is some interaction, as some characters and character combinations
are excluded by the bidi section. I think the various cross-references
within the text take care of this.

There is also some interaction in that, with the conversion from URI to
IRI, the display sequence of the components may change. But this happens
automatically; it is not something the algorithm has to worry about.

Regards,   Martin.
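
The reversibility point raised above can be illustrated with a short Python sketch. It is not the draft's procedure: it is a simplified approximation of step 2 that decodes percent-encoded valid UTF-8 sequences and, optionally, percent-encoded unreserved ASCII characters, and it ignores the bidi-related exclusions. The function names `uri_to_iri` and `iri_to_uri`, the `decode_unreserved` flag, and the example URI are all assumptions made for illustration, and the unreserved set is taken from RFC 3986 on the assumption that it matches rfc2396bis.

```python
import re
import string
from urllib.parse import quote

# Unreserved characters as listed in RFC 3986 (assumed to match rfc2396bis).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# A run of one or more consecutive percent-encoded octets.
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def uri_to_iri(uri: str, decode_unreserved: bool = True) -> str:
    """Hypothetical, simplified URI-to-IRI conversion (illustration only)."""

    def decode_run(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        out = []
        i = 0
        while i < len(raw):
            b = raw[i]
            if b < 0x80:
                ch = chr(b)
                # ASCII: decode only unreserved characters, and only if asked.
                keep = decode_unreserved and ch in UNRESERVED
                out.append(ch if keep else "%%%02X" % b)
                i += 1
                continue
            # Non-ASCII: try to decode one strictly valid UTF-8 sequence.
            for n in (2, 3, 4):
                try:
                    ch = raw[i:i + n].decode("utf-8")
                except UnicodeDecodeError:
                    continue
                out.append(ch)
                i += n
                break
            else:
                # Not part of valid UTF-8: leave the octet percent-encoded.
                out.append("%%%02X" % b)
                i += 1
        return "".join(out)

    return _PCT_RUN.sub(decode_run, uri)


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8; leave ASCII untouched."""
    return "".join(ch if ord(ch) < 0x80 else quote(ch, safe="") for ch in iri)


original = "http://example.org/%7Euser/r%C3%A9sum%C3%A9%2Fv1"

# Decoding %7E to "~" loses the information that it was ever escaped.
lossy = iri_to_uri(uri_to_iri(original))

# Leaving all ASCII escapes alone makes the round trip exact
# (apart from hex-digit case, which this input already has in upper case).
exact = iri_to_uri(uri_to_iri(original, decode_unreserved=False))
```

Here `lossy` differs from `original` only in `~` versus `%7E`, so the two are equivalent under escape normalization but not identical, whereas `exact` reproduces the input. This corresponds to the two options discussed in the message: widen the 'except' clause of the claim, or change the procedure.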
Received on Wednesday, 12 May 2004 05:27:33 UTC