
RE: [FO]: OB01 escape-uri

From: Ashok Malhotra <ashokma@microsoft.com>
Date: Wed, 10 Dec 2003 08:04:57 -0800
Message-ID: <EDB607C8AC991F40BE646533A1A673E8CE7ECF@RED-MSG-42.redmond.corp.microsoft.com>
To: "Oliver Becker" <obecker@informatik.hu-berlin.de>, <public-qt-comments@w3.org>
Oliver:
This is in reply to your mail below and the accompanying 3 notes on the same subject.

The F&O taskforce discussed your editorial suggestions on the Dec 9 telcon and agreed to clarify the wording and add an example.  Here is suggested wording for this function.  Please take a look.
7.4.10 fn:escape-uri
fn:escape-uri($uri-part as xs:string?,
              $escape-reserved as xs:boolean) as xs:string
Summary: This function applies the URI escaping rules defined in section 2 of [RFC 2396] as amended by [RFC 2732], with one exception, to the string supplied as $uri-part, which typically represents all or part of a URI. The effect of the function is to escape a set of identified characters in the string. Each such character is replaced in the string by an escape sequence of the form %HH, where HH is the hexadecimal representation of the octets used to represent the character in UTF-8.
The set of characters that are escaped depends on the setting of the boolean argument $escape-reserved.
If $uri-part is the empty sequence, returns the zero-length string.
If $escape-reserved is true, all characters are escaped other than the lower case letters a-z, the upper case letters A-Z, the digits 0-9, the PERCENT SIGN "%" and NUMBER SIGN "#" characters, and the characters referred to in [RFC 2396] as "marks": specifically, HYPHEN-MINUS "-", LOW LINE "_", FULL STOP ".", EXCLAMATION MARK "!", TILDE "~", ASTERISK "*", APOSTROPHE "'", LEFT PARENTHESIS "(", and RIGHT PARENTHESIS ")".
If $escape-reserved is false, all characters are escaped other than the lower case letters a-z, the upper case letters A-Z, the digits 0-9, the PERCENT SIGN "%" and NUMBER SIGN "#" characters, and the characters referred to in [RFC 2396] as "marks": HYPHEN-MINUS "-", LOW LINE "_", FULL STOP ".", EXCLAMATION MARK "!", TILDE "~", ASTERISK "*", APOSTROPHE "'", LEFT PARENTHESIS "(", and RIGHT PARENTHESIS ")".
In addition, the characters referred to in [RFC 2396] and [RFC 2732] as reserved characters (see [Uniform Resource Identifiers (URI): Generic Syntax]) are not escaped. These characters are: SEMICOLON ";", SOLIDUS "/", QUESTION MARK "?", COLON ":", COMMERCIAL AT "@", AMPERSAND "&", EQUALS SIGN "=", PLUS SIGN "+", DOLLAR SIGN "$", COMMA ",", NUMBER SIGN "#", LEFT SQUARE BRACKET "[", and RIGHT SQUARE BRACKET "]".
[RFC 2396] does not define whether escaped URIs should use lower case or upper case for hexadecimal digits. To ensure that escaped URIs can be compared using string comparison functions, this function must always generate hexadecimal values using the upper-case letters A-F.
Generally, $escape-reserved should be set to true when escaping a string that is to form a single part of a URI, and to false when escaping an entire URI or URI reference. 
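
To make the rules above concrete, here is a minimal sketch in Python of the behavior as worded in this proposal. It is an illustration only, not a reference implementation: the name escape_uri and the use of None to stand in for the empty sequence are assumptions made for the example.

```python
from typing import Optional

# Characters that are never escaped under the proposed wording.
LETTERS_AND_DIGITS = set(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
)
MARKS = set("-_.!~*'()")          # the RFC 2396 "marks"
ALWAYS_KEPT = LETTERS_AND_DIGITS | MARKS | set("%#")

# Reserved characters (RFC 2396 as amended by RFC 2732, plus "#"),
# left alone only when escape_reserved is false.
RESERVED = set(";/?:@&=+$,#[]")


def escape_uri(uri_part: Optional[str], escape_reserved: bool) -> str:
    """Hypothetical stand-in for fn:escape-uri; None models the empty sequence."""
    if uri_part is None:
        return ""
    kept = ALWAYS_KEPT if escape_reserved else ALWAYS_KEPT | RESERVED
    pieces = []
    for ch in uri_part:
        if ch in kept:
            pieces.append(ch)
        else:
            # One %HH triplet per UTF-8 octet, upper-case hex digits.
            pieces.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(pieces)
```

With this sketch, escape_uri("a b/c", True) escapes both the space and the solidus, while escape_uri("a b/c", False) escapes only the space.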

7.4.10.1 Examples
*	fn:escape-uri ("http://www.example.com/00/Weather/CA/Los%20Angeles#ocean", true()) returns "http%3A%2F%2Fwww.example.com%2F00%2FWeather%2FCA%2FLos%20Angeles#ocean" 
*	fn:escape-uri ("http://www.example.com/00/Weather/CA/Los%20Angeles#ocean", false()) returns "http://www.example.com/00/Weather/CA/Los%20Angeles#ocean" 
*	fn:escape-uri ("http://www.example.com/~bébé", false()) returns "http://www.example.com/~b%C3%A9b%C3%A9" 
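
As a small illustration of the UTF-8 rule in the Summary, the Python sketch below shows how a single non-ASCII character expands to one %HH triplet per octet. The helper name percent_encode_char is hypothetical and exists only for this example.

```python
def percent_encode_char(ch: str) -> str:
    """Encode one character as %HH triplets, one per UTF-8 octet (upper-case hex)."""
    return "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))


# "é" (U+00E9) occupies two octets in UTF-8, so it yields two triplets.
print(percent_encode_char("é"))  # %C3%A9
# "€" (U+20AC) occupies three octets.
print(percent_encode_char("€"))  # %E2%82%AC
```

A lone %E9 would be the ISO-8859-1 octet for "é", not its UTF-8 representation.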


All the best, Ashok

-----Original Message-----
From: public-qt-comments-request@w3.org [mailto:public-qt-comments-request@w3.org] On Behalf Of Oliver Becker
Sent: Monday, December 01, 2003 8:05 AM
To: public-qt-comments@w3.org
Subject: [FO]: OB01 escape-uri


The wording in 7.4.10 fn:escape-uri of the Functions and Operators WD
seems to suggest that the percent sign '%' is not escaped if $escape-reserved
is false. (Because '%' is neither a reserved character in RFC2396 nor in
RFC2732.)

If I were being picky, I would say that the wording

"If $escape-reserved is false, the behavior differs in that characters referred 
to in [RFC 2396] and [RFC 2732] as reserved characters, together with the NUMBER 
SIGN '#' character, (See [Uniform Resource Identifiers (URI): Generic Syntax]) 
are not escaped. These characters are ..."

tells me that *only* these reserved characters will not be escaped.

The rules would be easier to understand if it could be rephrased along
the lines:
- Regardless of $escape-reserved the following characters will be escaped: ..
- If and only if $escape-reserved is false then additionally the following 
  characters will be escaped: ...
  
Last comment: the sentence (at the beginning of this section)

"The effect of the function is to replace any special character in the string by 
an escape sequence of the form %HH, where HH... is the hexadecimal 
representation of the octets used to represent the character in UTF-8."

could be interpreted as generating one % and then the hex representations
of all UTF-8 octets. Especially the "HH..." seems to suggest this 
interpretation. Perhaps the phrase "an escape sequence of %HH parts" or
something like that makes the intention clearer and unambiguous.

Regards,
Oliver Becker


/-------------------------------------------------------------------\
|  ob|do        Dipl.Inf. Oliver Becker                             |
|  --+--        E-Mail: obecker@informatik.hu-berlin.de             |
|  op|qo        WWW:    http://www.informatik.hu-berlin.de/~obecker |
\-------------------------------------------------------------------/
Received on Wednesday, 10 December 2003 11:05:05 UTC
