- From: Ashok Malhotra <ashokma@microsoft.com>
- Date: Wed, 10 Dec 2003 08:04:57 -0800
- To: "Oliver Becker" <obecker@informatik.hu-berlin.de>, <public-qt-comments@w3.org>
- Message-ID: <EDB607C8AC991F40BE646533A1A673E8CE7ECF@RED-MSG-42.redmond.corp.microsoft.com>
Oliver:

This is in reply to your mail below and the three accompanying notes on the same subject. The F&O taskforce discussed your editorial suggestions on the Dec 9 telcon and agreed to clarify the wording and add an example. Here is the suggested wording for this function. Please take a look.

7.4.10 fn:escape-uri

fn:escape-uri($uri-part as xs:string?, $escape-reserved as xs:boolean) as xs:string

Summary: This function applies the URI escaping rules defined in section 2 of [RFC 2396], as amended by [RFC 2732], with one exception, to the string supplied as $uri-part, which typically represents all or part of a URI. The effect of the function is to escape a set of identified characters in the string. Each such character is replaced in the string by an escape sequence consisting of one %HH triplet for each octet used to represent the character in UTF-8, where HH is the hexadecimal representation of that octet. The set of characters that are escaped depends on the setting of the boolean argument $escape-reserved.

If $uri-part is the empty sequence, the function returns the zero-length string.

If $escape-reserved is true, all characters are escaped other than the lower case letters a-z, the upper case letters A-Z, the digits 0-9, the PERCENT SIGN "%" and NUMBER SIGN "#" characters, and the characters referred to in [RFC 2396] as "marks": specifically, HYPHEN-MINUS "-", LOW LINE "_", FULL STOP ".", EXCLAMATION MARK "!", TILDE "~", ASTERISK "*", APOSTROPHE "'", LEFT PARENTHESIS "(", and RIGHT PARENTHESIS ")".
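The %HH rule in the Summary is meant as one triplet per UTF-8 octet, not a single "%" followed by all the hex digits. A minimal Python sketch of that reading follows; the helper name escape_char is hypothetical and not part of the proposed wording.

```python
def escape_char(ch: str) -> str:
    """Escape one character as a run of %HH triplets, one per UTF-8 octet.

    Hex digits are upper case (A-F), as the proposed wording requires.
    (Hypothetical helper, for illustration only.)
    """
    return "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))


# A non-ASCII character yields one %HH triplet per octet of its UTF-8 form.
print(escape_char(" "))   # %20
print(escape_char("é"))   # %C3%A9
print(escape_char("€"))   # %E2%82%AC
```

Under this reading, a two-octet character such as "é" always produces two triplets, which is the interpretation Oliver's note asks the wording to make unambiguous.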
If $escape-reserved is false, all characters are escaped other than the lower case letters a-z, the upper case letters A-Z, the digits 0-9, the PERCENT SIGN "%" and NUMBER SIGN "#" characters, and the characters referred to in [RFC 2396] as "marks": HYPHEN-MINUS "-", LOW LINE "_", FULL STOP ".", EXCLAMATION MARK "!", TILDE "~", ASTERISK "*", APOSTROPHE "'", LEFT PARENTHESIS "(", and RIGHT PARENTHESIS ")". In addition, the characters referred to in [RFC 2396] and [RFC 2732] as reserved characters (see [Uniform Resource Identifiers (URI): Generic Syntax]) are not escaped. These characters are: SEMICOLON ";", SOLIDUS "/", QUESTION MARK "?", COLON ":", COMMERCIAL AT "@", AMPERSAND "&", EQUALS SIGN "=", PLUS SIGN "+", DOLLAR SIGN "$", COMMA ",", NUMBER SIGN "#", LEFT SQUARE BRACKET "[", and RIGHT SQUARE BRACKET "]".

[RFC 2396] does not define whether escaped URIs should use lower case or upper case for hexadecimal digits. To ensure that escaped URIs can be compared using string comparison functions, this function must always generate hexadecimal values using the upper-case letters A-F.

Generally, $escape-reserved should be set to true when escaping a string that is to form a single part of a URI, and to false when escaping an entire URI or URI reference.
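Putting the two character sets together, the proposed rules can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the taskforce's implementation: the names escape_uri, _UNRESERVED and _RESERVED are hypothetical, and Python's None stands in for the XPath empty sequence.

```python
import string
from typing import Optional

# Never escaped: letters, digits, "%", "#", and the RFC 2396 "marks".
_UNRESERVED = set(string.ascii_letters + string.digits + "%#" + "-_.!~*'()")

# Reserved characters (RFC 2396 as amended by RFC 2732, plus "#"),
# additionally left unescaped when escape_reserved is False.
_RESERVED = set(";/?:@&=+$,#[]")


def escape_uri(uri_part: Optional[str], escape_reserved: bool) -> str:
    """Hypothetical sketch of the proposed fn:escape-uri rules.

    None models the empty sequence and yields the zero-length string.
    """
    if uri_part is None:
        return ""
    keep = _UNRESERVED if escape_reserved else _UNRESERVED | _RESERVED
    out = []
    for ch in uri_part:
        if ch in keep:
            out.append(ch)
        else:
            # One upper-case %HH triplet per UTF-8 octet of the character.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


url = "http://www.example.com/00/Weather/CA/Los%20Angeles#ocean"
print(escape_uri(url, True))
# http%3A%2F%2Fwww.example.com%2F00%2FWeather%2FCA%2FLos%20Angeles#ocean
print(escape_uri(url, False))
# http://www.example.com/00/Weather/CA/Los%20Angeles#ocean
print(escape_uri("http://www.example.com/~bébé", False))
# http://www.example.com/~b%C3%A9b%C3%A9
```

Building the false-case set as a superset of the true-case set mirrors the structure Oliver suggested: one base set that is never escaped, plus the reserved characters that are exempted only when $escape-reserved is false.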
7.4.10.1 Examples

* fn:escape-uri("http://www.example.com/00/Weather/CA/Los%20Angeles#ocean", true()) returns "http%3A%2F%2Fwww.example.com%2F00%2FWeather%2FCA%2FLos%20Angeles#ocean"
* fn:escape-uri("http://www.example.com/00/Weather/CA/Los%20Angeles#ocean", false()) returns "http://www.example.com/00/Weather/CA/Los%20Angeles#ocean"
* fn:escape-uri("http://www.example.com/~bébé", false()) returns "http://www.example.com/~b%C3%A9b%C3%A9"

All the best,
Ashok

-----Original Message-----
From: public-qt-comments-request@w3.org [mailto:public-qt-comments-request@w3.org] On Behalf Of Oliver Becker
Sent: Monday, December 01, 2003 8:05 AM
To: public-qt-comments@w3.org
Subject: [FO]: OB01 escape-uri

The wording in 7.4.10 fn:escape-uri of the Functions and Operators WD seems to suggest that the percent sign '%' is not escaped if $escape-reserved is false (because '%' is a reserved character neither in RFC2396 nor in RFC2732).

If I were being picky, then the wording "If $escape-reserved is false, the behavior differs in that characters referred to in [RFC 2396] and [RFC 2732] as reserved characters, together with the NUMBER SIGN '#' character, (See [Uniform Resource Identifiers (URI): Generic Syntax]) are not escaped. These characters are ..." tells me that *only* these reserved characters will not be escaped.

The rules would be easier to understand if they could be rephrased along these lines:
- Regardless of $escape-reserved, the following characters will be escaped: ..
- If and only if $escape-reserved is false, then additionally the following characters will be escaped: ...

Last comment: the sentence (at the beginning of this section) "The effect of the function is to replace any special character in the string by an escape sequence of the form %HH, where HH... is the hexadecimal representation of the octets used to represent the character in UTF-8." could be interpreted as generating one % followed by the hex representations of all the UTF-8 octets. The "HH..." in particular seems to suggest this interpretation. Perhaps the phrase "an escape sequence of %HH parts" or something similar would make the intention clear and unambiguous.

Regards,
Oliver Becker

Dipl.Inf. Oliver Becker
E-Mail: obecker@informatik.hu-berlin.de
WWW: http://www.informatik.hu-berlin.de/~obecker
Received on Wednesday, 10 December 2003 11:05:05 UTC