[Bug 2457] Rules for URI encoding don't match RFC 3986/3987

http://www.w3.org/Bugs/Public/show_bug.cgi?id=2457





------- Additional Comments From Norman.Walsh@Sun.COM  2006-01-17 15:57 -------
My proposal per ACTION A-282-01

fn:encode-for-uri

  fn:encode-for-uri($uri-part as xs:string?) as xs:string

Summary: This function encodes reserved characters in an xs:string
that is intended to be used in the path segment of a URI. It is
invertible but not idempotent. The function applies the URI escaping
rules defined in section 2 of [RFC 3986] to the string supplied as
$uri-part, replacing each escaped character with its percent-encoded
form as described in [RFC 3986].

If $uri-part is the empty sequence, returns the zero-length string.

All characters are escaped except those identified as "unreserved" by
[RFC 3986], that is, the upper- and lower-case letters A-Z, the digits
0-9, HYPHEN-MINUS ("-"), LOW LINE ("_"), FULL STOP ("."), and TILDE ("~").

Note that this function escapes URI delimiters and therefore cannot be
used indiscriminately to encode "invalid" characters in a path
segment.

Since [RFC 3986] recommends that, for consistency, URI producers and
normalizers should use uppercase hexadecimal digits for all
percent-encodings, this function must always generate hexadecimal
values using the upper-case letters A-F.

Examples

    * fn:encode-for-uri("http://www.example.com/00/Weather/CA/Los%20Angeles#ocean")
      returns
      "http%3A%2F%2Fwww.example.com%2F00%2FWeather%2FCA%2FLos%2520Angeles%23ocean".
      This is probably not what the user intended because all of the delimiters
      have been encoded.

    * concat("http://www.example.com/", encode-for-uri("~bébé"))
      returns "http://www.example.com/~b%C3%A9b%C3%A9".

    * concat("http://www.example.com/", encode-for-uri("100% organic"))
      returns "http://www.example.com/100%25%20organic".
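
The escaping rule described above can be sketched outside XPath as well.
The following Python is only an illustration, under the assumption that the
input is encoded as UTF-8 before escaping; the name encode_for_uri and the
UNRESERVED constant are hypothetical and not part of the proposal.

```python
# Sketch only: encode_for_uri is a hypothetical Python stand-in for the
# proposed fn:encode-for-uri, not part of any specification or library.
from typing import Optional
from urllib.parse import unquote

# The "unreserved" characters of RFC 3986, section 2.3.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)


def encode_for_uri(uri_part: Optional[str]) -> str:
    """Percent-encode every character that is not unreserved."""
    if uri_part is None:  # None models the empty sequence
        return ""
    out = []
    for ch in uri_part:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Escape each UTF-8 octet with upper-case hexadecimal digits.
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)


once = encode_for_uri("100% organic")
twice = encode_for_uri(once)
print(once)                      # 100%25%20organic
print(twice)                     # 100%2525%2520organic
print(unquote(once))             # 100% organic
print(encode_for_uri("~bébé"))   # ~b%C3%A9b%C3%A9
```

The second call changes the result again, which is what "not idempotent"
means here, while percent-decoding the first result recovers the input,
which is the sense in which the function is invertible.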

fn:iri-to-uri

  fn:iri-to-uri($uri-part as xs:string?) as xs:string

Summary: This function converts an xs:string containing an IRI into
a URI according to the rules spelled out in Section 3.1 of [RFC 3987].
It is idempotent but not invertible.

If $uri-part is the empty sequence, returns the zero-length string.

Since [RFC 3986] recommends that, for consistency, URI producers and
normalizers should use uppercase hexadecimal digits for all
percent-encodings, this function must always generate hexadecimal
values using the upper-case letters A-F.

Note:

  Since this function does not escape the PERCENT SIGN "%" and this
  character is not allowed in data within a URI, users wishing to
  convert character strings, such as file names, that include "%" to a
  URI should manually escape "%" by replacing it with "%25".
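
The behaviour described for this function can be illustrated with a
rough Python sketch. It rests on two simplifying assumptions that are not
in the proposal: every non-ASCII character is escaped (rather than testing
the exact ucschar and iprivate ranges of [RFC 3987]), and the host part is
percent-encoded rather than converted with ToASCII. The name iri_to_uri
and the sample file name are hypothetical.

```python
# Sketch only: iri_to_uri is a hypothetical Python stand-in for the proposed
# fn:iri-to-uri. Simplifying assumption: every non-ASCII character is
# escaped, rather than testing the exact ucschar/iprivate ranges of RFC 3987.
from typing import Optional


def iri_to_uri(iri: Optional[str]) -> str:
    """Percent-encode the UTF-8 octets of each non-ASCII character."""
    if iri is None:  # None models the empty sequence
        return ""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII, including "%", passes through unchanged
        else:
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)


uri = iri_to_uri("http://www.example.com/~bébé")
print(uri)                     # http://www.example.com/~b%C3%A9b%C3%A9
print(iri_to_uri(uri) == uri)  # True: a second application changes nothing

# Not invertible: two different inputs produce the same URI.
print(iri_to_uri("b%C3%A9") == iri_to_uri("bé"))  # True

# A literal "%" in a file name has to be escaped by hand beforehand.
file_name = "50%-bébé.txt"  # hypothetical file name
print(iri_to_uri(file_name.replace("%", "%25")))  # 50%25-b%C3%A9b%C3%A9.txt
```

Because "%" passes through untouched, already-escaped input is left
alone, which is what makes the conversion idempotent; the same property
is why a literal "%" needs the manual "%25" replacement shown last.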

Received on Tuesday, 17 January 2006 15:57:17 UTC