
Re: draft-duerst-iri-07.txt: 2 week mailing list last call

From: Martin Duerst <duerst@w3.org>
Date: Wed, 12 May 2004 17:59:50 +0900
Message-Id: <4.2.0.58.J.20040512172133.05a8c738@localhost>
To: Graham Klyne <GK@ninebynine.org>, public-iri@w3.org
Cc: uri@w3.org

Hello Graham,

I have labeled this issue as convertASCII-30.


At 12:02 04/05/10 +0100, Graham Klyne wrote:

>Section 3.2:
>
>Is this really true (about always mapping back to the same URI)?:
>[[
>3.2  Converting URIs to IRIs
>
>    In some situations, it may be desirable to try to convert a URI into
>    an equivalent IRI. This section gives a procedure to do such a
>    conversion. The conversion described in this section will always
>    result in an IRI which maps back to the URI that was used as an input
>    for the conversion (except for potential case differences in
>    percent-encoding). However, the IRI resulting from this conversion
>    may not be exactly the same as the original IRI (if there ever was
>    one).
>]]
>
>In light of:
>[[
>    2) Convert all percent-encodings (% followed by two hexadecimal
>       digits) except those corresponding to '%', characters in
>       'reserved', and characters in US-ASCII not allowed in URIs, to the
>       corresponding octets.
>]]
>
>It seems to me that removing percent encodings for non-reserved and other 
>characters is a non-reversible transformation.  I think that mapping back 
>to the original URI is only true under escape normalization, per rfc2396bis.

Yes, good catch. I looked at the actual text that needs to be fixed.
I can either add non-reserved ASCII characters to the 'except'
clause in parentheses in the original text, or I can change the
procedure. In terms of edits, both need about the same amount of
work. Which one would you prefer?

It is clear that the conversion can be done either with or without
removing percent-encodings for non-reserved ASCII characters, and
different usages may choose different variants, according to their
needs.
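
To make the difference between the two variants concrete, here is a
rough sketch in Python. It is only an illustration, not the procedure
from the draft: the function names and the decode_unreserved_ascii flag
are made up for this example, the 'unreserved' set is assumed to be the
one from rfc2396bis, and the additional exclusions (bidi formatting
characters, characters not allowed in IRIs) are ignored.

```python
import re

# A run of one or more percent-encoded octets, e.g. "%C3%A9".
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

# Assumption: the "unreserved" set as defined in rfc2396bis.
_UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def uri_to_iri(uri: str, decode_unreserved_ascii: bool) -> str:
    """Hypothetical URI-to-IRI conversion covering both variants.

    Non-ASCII characters encoded as UTF-8 are always decoded.
    Percent-encoded unreserved ASCII characters are decoded only
    when decode_unreserved_ascii is True. All other percent-encodings
    are copied through unchanged.
    """

    def convert(match: re.Match) -> str:
        raw = match.group(0)
        octets = bytes.fromhex(raw.replace("%", ""))
        try:
            text = octets.decode("utf-8")
        except UnicodeDecodeError:
            # Simplification: leave the whole run alone if it is not UTF-8.
            return raw
        out = []
        pos = 0  # index of the current octet within the run
        for ch in text:
            n = len(ch.encode("utf-8"))
            original = raw[3 * pos : 3 * (pos + n)]
            pos += n
            if ord(ch) >= 0x80:
                out.append(ch)
            elif decode_unreserved_ascii and ch in _UNRESERVED:
                out.append(ch)
            else:
                out.append(original)  # keep the encoding exactly as written
        return "".join(out)

    return _PCT_RUN.sub(convert, uri)


def iri_to_uri(iri: str) -> str:
    """Map an IRI back to a URI by percent-encoding non-ASCII as UTF-8."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )


original = "http://example.org/%7Euser/r%C3%A9sum%C3%A9"

# Variant A: "%7E" becomes "~", so the round trip loses the encoding.
lossy = iri_to_uri(uri_to_iri(original, decode_unreserved_ascii=True))

# Variant B: "%7E" is kept, so the round trip restores the input.
exact = iri_to_uri(uri_to_iri(original, decode_unreserved_ascii=False))
```

With the first variant the result equals the input only after escape
normalization; with the second it is the same string, apart from
possible case differences in the hex digits of the re-encoded
non-ASCII octets.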


>Also, not knowing anything about bidi encodings, it's difficult for me to 
>tell if there's any possible interaction between this and the section 4 
>material on bidi sequences.

There is some interaction, as some characters and character
combinations are excluded by the bidi section. I think the
various cross-references within the text take care of this.
There is also another interaction: when a URI is converted
to an IRI, the display sequence of the components may change.
But this just happens automatically; it is not something the
algorithm has to worry about.


Regards,    Martin.
Received on Wednesday, 12 May 2004 05:27:33 UTC
