- From: Graham Klyne <gk@ninebynine.org>
- Date: Wed, 19 May 2004 10:10:29 +0100
- To: Martin Duerst <duerst@w3.org>, public-iri@w3.org
- Cc: uri@w3.org
Martin, For the process, this looks fine to me. #g -- At 16:52 19/05/04 +0900, Martin Duerst wrote: >Hello Graham, > >At 14:06 04/05/12 +0100, Graham Klyne wrote: > >>At 17:59 12/05/04 +0900, Martin Duerst wrote: >>>Hello Graham, >>> >>>I have labeled this issue as convertASCII-30. >>> >>> >>>At 12:02 04/05/10 +0100, Graham Klyne wrote: >>> >>>>Section 3.2: >>>> >>>>Is this really true (about always mapping back to the same URI)?: >>>>[[ >>>>3.2 Converting URIs to IRIs >>>> >>>> In some situations, it may be desirable to try to convert a URI into >>>> an equivalent IRI. This section gives a procedure to do such a >>>> conversion. The conversion described in this section will always >>>> result in an IRI which maps back to the URI that was used as an input >>>> for the conversion (except for potential case differences in >>>> percent-encoding). However, the IRI resulting from this conversion >>>> may not be exactly the same as the original IRI (if there ever was >>>> one). >>>>]] >>>> >>>>In light of: >>>>[[ >>>> 2) Convert all percent-encodings (% followed by two hexadecimal >>>> digits) except those corresponding to '%', characters in >>>> 'reserved', and characters in US-ASCII not allowed in URIs, to the >>>> corresponding octets. >>>>]] >>>> >>>>It seems to me that removing percent encodings for non-reserved and >>>>other characters is a non-reversible transformation. I think that >>>>mapping back to the original URI is only true under escape >>>>normalization, per rfc2396bis. >>> >>>Yes, good catch. I looked at the actual text that needs to be fixed. >>>I can either add non-reserved ASCII characters to the 'except' >>>clause in parentheses in the original text, or can change the >>>procedure. Overall, in terms of edits, both need about the same >>>work. Which one would you prefer? >> >>I'm not sure. I think it's most important to remove the inconsistency. > >I have decided that it is better to also remove spurious percent-encodings >of non-reserved US-ASCII characters, because probably the main use of >the conversion from URIs to IRIs is for presentation purposes. > >I have changed > > (except for potential case differences in percent-encoding) > >to > > (except for potential case differences in percent-encoding > and for potential percent-encoded unreserved characters) > >I have also changed > > This procedure will convert as many percent-encoded non-ASCII > characters as possible to characters in an IRI. >to > This procedure will convert as many percent-encoded characters > as possible to characters in an IRI. > >I hope this addresses your concern. > > >Regards, Martin. > >>I think that, in practice, this is an area which developers and users >>would be well-advised to avoid. >> >>#g >>-- >> >>>It is clear that with or without removing percent-encodings for >>>non-reserved ASCII characters, this can be done, and different >>>usages may choose different variants, according to their needs. >>> >>> >>>>Also, not knowing anything about bidi encodings, it's difficult for me >>>>to tell if there's any possible interaction between this and the >>>>section 4 material on bidi sequences. >>> >>>There is some interaction as some characters and character >>>combinations are excluded by the bidi section. I think the >>>various cross-references within the text take care of this. >>>There is also some interaction that with the conversion >>>from URI to IRI, the display sequence of the components >>>may change. But this will just happen automatically, this >>>is not something the algorithm has to worry about. >>> >>> >>>Regards, Martin. >> >>------------ >>Graham Klyne >>For email: >>http://www.ninebynine.org/#Contact > >------------ >Graham Klyne >For email: >http://www.ninebynine.org/#Contact
Received on Wednesday, 19 May 2004 05:52:59 UTC