- From: Kay, Michael <Michael.Kay@softwareag.com>
- Date: Fri, 14 Mar 2003 22:51:19 +0100
- To: Graham Klyne <gk@ninebynine.org>, Dan Connolly <connolly@w3.org>, public-qt-comments@w3.org
- Cc: uri@w3.org
> At 11:22 14/03/2003 -0600, Dan Connolly wrote:
> >I see:
> >
> >6.4.19.1 Examples
> > * fn:escape-uri("gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles#ocean", true())
> >   returns "gopher%3A%2F%2Fspinaltap.micro.umn.edu%2F00%2FWeather%2FCalifornia%2FLos%20Angeles%23ocean"
> >
> >
> >http://www.w3.org/TR/xquery-operators/#func-escape-uri
> >
> >but the % after Los needs to be escaped, no?
> >
> >Hmm... the spec seems to special-case this:
> >
> > The "%" character itself is escaped only if it is not followed
> > by two hexadecimal digits (that is, 0-9, a-f, and A-F)
> >
> >I don't understand why.

RFC 2396 states (in section 2.4.2): "Because the percent "%" character always has the reserved purpose of being the escape indicator, it must be escaped as "%25" in order to be used as data within a URI. Implementers should be careful not to escape or unescape the same string more than once, since unescaping an already unescaped string might lead to misinterpreting a percent data character as another escaped character, or vice versa in the case of escaping an already escaped string."

The reason we have specified escape-uri() as we have is that if the input string contains a "%" sign followed by two hex digits, this probably means that escaping has already been carried out. We can't be sure, of course, but it's a weakness of the escaping scheme that we have no way of telling. We are following the advice "Implementers should be careful not to escape or unescape the same string more than once". We followed precedent here from some other spec, but I forget which it was.

> >
> >Also... what does 'when escaping an entire URI or URI reference' refer
> >to?
>
> I assume it means escaping a URI, e.g. to be embedded inside another, or
> something like that.

The escape-uri() function has two modes, controlled by a parameter. In one mode, characters such as "/" and "?" are escaped; in the other, they are not.
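For readers who find the rules easier to follow in code, here is a minimal Python sketch of the escaping behaviour discussed in this thread: the "%" special case and the two modes. It is not the spec's normative definition. The function name `escape_uri`, its `escape_reserved` flag, and in particular the set of characters left unescaped when that flag is false (taken here as the RFC 2396 reserved set plus "#") are assumptions for illustration.

```python
import string

# Never escaped in either mode: ASCII letters, digits and the RFC 2396 "mark" characters.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

# ASSUMPTION: characters left alone when escape_reserved is False are the
# RFC 2396 reserved set plus "#". Check the spec text for the exact list.
RESERVED = set(";/?:@&=+$,#")

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def escape_uri(uri: str, escape_reserved: bool) -> str:
    """Hypothetical model of the escape-uri() rules described in this thread."""
    out = []
    for i, ch in enumerate(uri):
        if ch == "%":
            # Keep "%" when it already introduces an escape such as "%20",
            # on the guess that the string has already been escaped.
            if i + 2 < len(uri) and uri[i + 1] in HEX_DIGITS and uri[i + 2] in HEX_DIGITS:
                out.append(ch)
            else:
                out.append("%25")
        elif ch in UNRESERVED or (not escape_reserved and ch in RESERVED):
            out.append(ch)
        else:
            # Escape each UTF-8 byte of the character as %HH, upper-case hex.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


src = "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles#ocean"
print(escape_uri(src, True))
# gopher%3A%2F%2Fspinaltap.micro.umn.edu%2F00%2FWeather%2FCalifornia%2FLos%20Angeles%23ocean
print(escape_uri(src, False))  # unchanged: ":", "/" and "#" kept, "%20" kept
print(escape_uri("100%", True))  # 100%25
```

The first call reproduces the spec example Dan quoted. One consequence of the "%" rule is that escaping with `escape_reserved=True` is idempotent: running the output through the function again changes nothing, which is the point of the RFC's warning about escaping twice. The cost, as noted above, is that a literal "%20" meant as data cannot be distinguished from an existing escape.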
The first mode is suitable for escaping parts of a URI, for example an individual parameter in the query string. The second mode is suitable when a string representing an entire URI is to be escaped in a single operation. This isn't recommended practice, but it is sometimes unavoidable. I would like to make this sentence clearer if we can, but I don't understand why you had difficulty understanding it!

I have to say that I find the various RFCs on URI syntax incredibly difficult to follow, and in many places ambiguous or contradictory. Since there seems to be a belief that URIs are the foundation on which the web is built, I would be much more comfortable if the specs were rock-solid rather than shifting sand. With the escape-uri() function (and the rules for URI escaping in XSLT serialization) we've done the best we can, but it's pretty flaky stuff.

Michael Kay
Received on Friday, 14 March 2003 16:51:44 UTC