
Re: why the special case for % in fn:escape-uri?

From: Graham Klyne <gk@ninebynine.org>
Date: Fri, 14 Mar 2003 18:02:40 +0000
Message-Id: <5.1.0.14.2.20030314175041.0267a448@127.0.0.1>
To: Dan Connolly <connolly@w3.org>, public-qt-comments@w3.org
Cc: uri@w3.org

At 11:22 14/03/2003 -0600, Dan Connolly wrote:
>I see:
>
>6.4.19.1 Examples
>       * fn:escape-uri
> 
>("gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles#ocean", 
>true()) returns 
>"gopher%3A%2F%2Fspinaltap.micro.umn.edu%2F00%2FWeather%2FCalifornia%2FLos%20Angeles%23ocean"
>
>
>http://www.w3.org/TR/xquery-operators/#func-escape-uri
>
>but the % after Los needs to be escaped, no?
>
>i.e. the result should be ala...
>
> >>> from urllib import quote
> >>> s =
>"gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles#ocean"
> >>> quote(s, safe='')
>'gopher%3A%2F%2Fspinaltap.micro.umn.edu%2F00%2FWeather%2FCalifornia%2FLos%2520Angeles%23ocean'
>
>Note the ...Los%2520...
>
>
>Hmm... the spec seems to special-case this:
>
>   The "%" character itself is escaped only if it is not followed
>   by two hexadecimal digits (that is, 0-9, a-f, and A-F)
>
>I don't understand why.
>
>Also... what does 'when escaping an entire URI or URI reference' refer
>to?

I assume it means escaping a URI, e.g. to be embedded inside another, or 
something like that.
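
A minimal sketch of that embedding case, using Python 3's urllib.parse (the quoted session above uses the older Python 2 urllib). The outer URI, its host example.org, and the "target" parameter are hypothetical, made up for illustration. It suggests why escaping "%" unconditionally matters when the whole URI is treated as one opaque value: only then does a single unescape give the original back.

```python
from urllib.parse import quote, unquote

inner = "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles#ocean"

# Escape every character outside the unreserved set, "%" included.
escaped = quote(inner, safe="")

# Hypothetical outer URI: host and parameter name are invented for this sketch.
outer = "http://example.org/lookup?target=" + escaped

# Unescaping once recovers the inner URI exactly, with its "%20" intact.
recovered = unquote(outer.split("target=", 1)[1])

# The result given in the spec example leaves "%20" alone, so one unescape
# turns it into a literal space and the original URI is not recovered.
spec_result = ("gopher%3A%2F%2Fspinaltap.micro.umn.edu%2F00%2FWeather"
               "%2FCalifornia%2FLos%20Angeles%23ocean")
lossy = unquote(spec_result)
```

Here recovered equals inner, while lossy contains "Los Angeles" with a space.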


>Hmm... Graham, have you thought about a suite of tests for
>URI escaping issues?

Not specifically, though I did include some in my suite; e.g. below.  I've 
not at all considered escaping entire URIs, which seems to be the thrust of 
your comments.

With respect to the special case you mention, my reading of the spec 
was that '%' not followed by two hex digits is not valid in a URI.  Hence 
my invalid tests.
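
As a rough illustration of that reading, here is a sketch in Python of a check for a "%" that is not followed by two hexadecimal digits, together with the escaping rule quoted above (escape "%" only in that case). The function names are hypothetical, and this is not how URITest.hs does it.

```python
import re

# Matches a "%" that is NOT followed by two hexadecimal digits.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_malformed_escape(s: str) -> bool:
    """True if s contains a "%" that does not start a %HH escape."""
    return _STRAY_PERCENT.search(s) is not None

def escape_stray_percent(s: str) -> str:
    """Escape "%" as "%25" only where it does not start a %HH escape."""
    return _STRAY_PERCENT.sub("%25", s)

# The four strings from the invalid tests in the excerpt.
invalid_samples = ["%", "A%Z", "%ZZ", "%AZ"]
flags = [has_malformed_escape(s) for s in invalid_samples]
```

All four samples are flagged, while a string such as "Los%20Angeles" passes the check and is left unchanged by escape_stray_percent.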

My full test suite is in file "URITest.hs" at:
   http://www.ninebynine.org/Software/HaskellURI.zip
Of course, many of the tests were lifted from yours.

#g
--

Excerpts from my tests...

[[
-- Invalid
testURIRef083 = testURIRef InvRf "%"
testURIRef084 = testURIRef InvRf "A%Z"
testURIRef085 = testURIRef InvRf "%ZZ"
testURIRef086 = testURIRef InvRf "%AZ"

   :

testRelative40 = testRelative "testRelative40"
                     "http://example/x/y%2Fz" "http://example/x/abc" "abc"
testRelative41 = testRelative "testRelative41"
                     "http://example/a/x/y/z" "http://example/a/x%2Fabc"
                     "../../x%2Fabc"
testRelative42 = testRelative "testRelative42"
                     "http://example/a/x/y%2Fz" "http://example/a/x%2Fabc"
                     "../x%2Fabc"
testRelative43 = testRelative "testRelative43"
                     "http://example/x%2Fy/z" "http://example/x%2Fy/abc" "abc"
testRelative44 = testRelative "testRelative44"
                     "http://ex/x/y" "http://ex/x/q%3Ar" "q%3Ar"
testRelative45 = testRelative "testRelative45"
                     "http://example/x/y%2Fz" "http://example/x%2Fabc"
                     "/x%2Fabc"
-- Apparently, TimBL prefers the following way to 41, 42 above
-- cf. http://lists.w3.org/Archives/Public/uri/2003Feb/0028.html
-- He also notes that there may be different relative functions
-- that satisfy the basic equivalence axiom:
-- cf. http://lists.w3.org/Archives/Public/uri/2003Jan/0008.html
testRelative46 = testRelative "testRelative46"
                     "http://example/x/y/z" "http://example/x%2Fabc" "/x%2Fabc"
testRelative47 = testRelative "testRelative47"
                     "http://example/x/y%2Fz" "http://example/x%2Fabc"
                     "/x%2Fabc"
]]
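
The "basic equivalence axiom" mentioned in the comments is taken here to mean that resolving the computed relative reference against the base gives back the target. That reading, and the (base, target, relative) argument order, are assumptions read off the excerpt. A sketch in Python, using urljoin from the standard library as the resolver:

```python
from urllib.parse import urljoin

# (base, target, relative) triples copied from testRelative40 to
# testRelative47; 45 and 47 use the same triple, so it appears once.
CASES = [
    ("http://example/x/y%2Fz", "http://example/x/abc", "abc"),
    ("http://example/a/x/y/z", "http://example/a/x%2Fabc", "../../x%2Fabc"),
    ("http://example/a/x/y%2Fz", "http://example/a/x%2Fabc", "../x%2Fabc"),
    ("http://example/x%2Fy/z", "http://example/x%2Fy/abc", "abc"),
    ("http://ex/x/y", "http://ex/x/q%3Ar", "q%3Ar"),
    ("http://example/x/y%2Fz", "http://example/x%2Fabc", "/x%2Fabc"),
    ("http://example/x/y/z", "http://example/x%2Fabc", "/x%2Fabc"),
]

def satisfies_axiom(base: str, target: str, relative: str) -> bool:
    """Resolve relative against base and compare with target."""
    return urljoin(base, relative) == target

# Triples for which resolution does not give back the target.
failures = [case for case in CASES if not satisfies_axiom(*case)]
```

Every triple passes this check, because urljoin leaves "%2F" undecoded and so never treats it as a path separator. Note that the check accepts any relative form that resolves correctly, so it cannot say which of the alternative forms discussed in the comments is preferable.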




-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Friday, 14 March 2003 13:11:41 GMT
