RE: Percent encoding normalization v. mapping URIs to IRIs from Larry Masinter on 2009-12-02 (public-iri@w3.org from December 2009)

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 2 Dec 2009 03:08:48 -0800
To: Geoffrey Sneddon <gsneddon@opera.com>, "public-iri@w3.org" <public-iri@w3.org>
Message-ID: <8B62A039C620904E92F1233570534C9B0118DC9EC99F@nambx04.corp.adobe.com>

# Given something that is both a URI and an IRI that contains a
# pct-encoded unreserved character, such as http://example.com/%41, you
# may apply percent-encoding normalization to end up with
# http://example.com/A, however, this MUST only be used for local
# comparison (currently section 5.3.2.3) and not passed along anywhere
# else. However, if you follow the steps for converting the URI to an IRI
# you will likewise end up with http://example.com/A, but with no such
# restriction on use of the converted string.

Yes, I think this is an error in "convert a URI to an IRI". The
stated goal of the conversion is:

   The conversion described in this section, if given a valid URI, will
   result in an IRI that maps back to the URI used as an input for the
   conversion (except for potential case differences in percent-encoding
   and for potential percent-encoded unreserved characters).

I think the exception for "potential percent-encoded unreserved characters"
is inappropriate: the hex escapes for percent-encoded but otherwise
allowed characters should not be decoded. In this case, converting a
URI with "%41" in it should *not* produce an IRI with an "A".

This error has been in the IRI draft all along -- it's not a new
error. There may be other kinds of conversion which also
normalize, but I think the goal should be:

URI -> IRI will produce an IRI which will map back (exactly) to the
given URI.
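
To make the round-trip goal concrete, here is a rough Python sketch of a
conversion that leaves "%41" alone. The function names uri_to_iri and
iri_to_uri are made up for illustration and are not taken from the draft.
The sketch assumes UTF-8 and only decodes percent-encoded octets that form
a valid non-ASCII UTF-8 sequence; it skips the draft's other checks (for
example, characters that are not appropriate in an IRI), and lowercase hex
in a decoded escape would still come back uppercase.

```python
import re

# One or more consecutive percent-encoded octets, e.g. "%C3%A9".
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    """Decode only the non-ASCII UTF-8 sequences inside one escape run."""
    run = match.group(0)
    octets = bytes.fromhex(run.replace("%", ""))
    out = []
    i = 0
    while i < len(octets):
        # Try 4-, 3-, then 2-octet sequences; single octets are ASCII or
        # invalid on their own, so they are never decoded.
        for n in (4, 3, 2):
            chunk = octets[i:i + n]
            if len(chunk) != n:
                continue
            try:
                ch = chunk.decode("utf-8")
            except UnicodeDecodeError:
                continue
            if len(ch) == 1 and ord(ch) > 0x7F:
                out.append(ch)
                i += n
                break
        else:
            # Keep the escape exactly as written (this is the "%41" case).
            out.append(run[3 * i:3 * i + 3])
            i += 1
    return "".join(out)


def uri_to_iri(uri):
    """Hypothetical URI -> IRI conversion that never decodes ASCII escapes."""
    return _PCT_RUN.sub(_decode_run, uri)


def iri_to_uri(iri):
    """Hypothetical IRI -> URI mapping: percent-encode non-ASCII as UTF-8."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )
```

With this sketch, uri_to_iri("http://example.com/%41") returns the input
unchanged, so mapping the result back gives the original URI rather than
http://example.com/A.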

Larry
--
http://larry.masinter.net

Received on Wednesday, 2 December 2009 11:09:35 UTC