
Escaping issues

From: David Hopwood <david.hopwood@zetnet.co.uk>
Date: Fri, 19 Sep 2003 13:52:35 +0000
Message-ID: <3F6B0A23.FA07CB4A@zetnet.co.uk>
To: web-calculus@waterken.com, uri@w3.org


[Context for the URI WG list: we are talking about whether two URIs
such as the following are equivalent, in the sense that a URI processing
application is permitted to convert the former to the latter:


while claiming not to have changed which resource the URI points to.

Note that '+' is in the <reserved> production, but is not specifically
reserved in the <authority> or <userinfo> component.

If this were possible, then there would be a security problem in a
proposed application, so we need to distinguish between the answers
"definitely no" and "yes or maybe". It would also be useful to know
whether the answer is different for RFC 2396bis as compared to 2396.]
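The distinction between "in the <reserved> production" and "reserved within the authority component" can be made concrete with a small sketch. It transcribes the character classes from RFC 2396 (reserved, section 2.2; unreserved, section 2.3; and the characters that section 3.2 says are reserved within the authority component) and classifies a few characters. The set names and the classify() helper are illustrative only, not taken from the RFC.

```python
import string

# Character classes transcribed from RFC 2396.
# reserved (section 2.2): characters that may act as a delimiter somewhere in a URI
RESERVED = set(";/?:@&=+$,")
# unreserved (section 2.3): alphanumerics plus the "mark" characters
MARK = set("-_.!~*'()")
UNRESERVED = set(string.ascii_letters + string.digits) | MARK
# Characters that section 3.2 says are reserved *within* the authority component
AUTHORITY_RESERVED = set(";:@?/")


def classify(ch: str) -> str:
    """Describe how a single character is treated inside an RFC 2396 authority."""
    if ch in AUTHORITY_RESERVED:
        return "reserved in authority"
    if ch in RESERVED:
        return "reserved in general, but not within authority"
    if ch in UNRESERVED:
        return "unreserved"
    return "must be escaped"


for ch in "+:~ ":
    print(repr(ch), "->", classify(ch))
```

For '+', classify() returns "reserved in general, but not within authority", which is exactly the gap the question turns on.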

Tyler Close wrote:
> On Friday 19 September 2003 00:23, David Hopwood wrote:
> > "+" is not reserved in the <authority> component:
> Yes, and section 2.2 of RFC 2396bis says:
> "Allowed reserved characters that are not assigned a sub-component
> delimiter role by this specification should be considered reserved
> for special use by whatever software generates the URI (i.e., they
> may be used to delimit or indicate information that is significant
> to interpretation of the identifier, but that significance is
> outside the scope of this specification)."

RFC 2396 did not include this paragraph, and 2396 is what existing proxies,
firewalls, etc. may be implementing.

> That's exactly what we want to do, so we should be using one of
> the reserved characters that is not already assigned a meaning
> within the <authority> component. The '+' character fits the bill.
> > Also not the issue. There's no way to guarantee that it isn't escaped
> > by any URI processing applications (including proxies, firewalls, etc.)
> RFC 2396bis specifically forbids the software from escaping a
> reserved character.

I interpret that as meaning a character that is reserved within each
particular field:

2.4.2 When to Escape and Unescape

   Under normal circumstances, the only time that characters within a
   URI string are escaped is during the process of generating the URI
   from its component parts.  Each component may have its own set of
   characters that are reserved, so only the mechanism responsible for
   generating or interpreting that component can determine whether or
   not escaping a character will change its semantics.  The exception is
   when a URI is being used within a context where the unreserved "mark"
   characters might need to be escaped, such as when used for a
   command-line argument or within a single-quoted attribute.

   Once generated, a URI is always in an escaped form.  When a URI is
   resolved, the components significant to that scheme-specific
   resolution process (if any) must be parsed and separated before the
   escaped characters within those components can be safely unescaped.

   In some cases, data that could be represented by an unreserved
   character may appear escaped; for example, some of the unreserved
   "mark" characters are automatically escaped by some systems.  A URI
   normalizer may unescape escaped octets that are represented by
   characters in the unreserved set.  For example, "%7E" is sometimes
   used instead of tilde ("~") in an "http" URI path and can be
   converted to "~" without changing the interpretation of the URI.
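The normalization rule in that last paragraph can be sketched as follows. This is a minimal sketch, assuming the RFC 2396 unreserved set (alphanumerics plus the "mark" characters). It decodes a %XX triplet only when the octet is an unreserved character and leaves every other escape, such as %2B for '+', untouched. The function name and the example URI are hypothetical.

```python
import re
import string

# RFC 2396 unreserved set: alphanumerics plus the "mark" characters
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")
# An escaped octet: '%' followed by two hex digits
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_unreserved(uri: str) -> str:
    """Decode %XX only when it encodes an unreserved character; keep all other escapes."""
    def repl(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else m.group(0)
    return _ESCAPE.sub(repl, uri)


print(unescape_unreserved("http://example.com/%7Euser/a%2Bb"))
```

This prints "http://example.com/~user/a%2Bb": the %7E becomes "~" as in the RFC's example, while %2B stays escaped because '+' is not unreserved. A normalizer limited to this rule never turns %2B into '+' or vice versa. The open question is whether some other, still conforming, processing step might.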

My reading of both 2396 and 2396bis is that a URI processor is allowed
to parse an URI and then reconstruct it, and that in the process of
reconstruction it may escape any character that is not reserved within
each field.
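The reading above can be sketched as a hypothetical processor that re-serializes an authority component. It is not modelled on any particular proxy or firewall. It keeps unreserved characters and the characters reserved within the authority (';', ':', '@', '?', '/' per RFC 2396 section 3.2), copies existing %XX escapes through unchanged, and escapes everything else. The function name is illustrative.

```python
import re
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")
# Characters RFC 2396 section 3.2 treats as reserved within the authority
AUTHORITY_RESERVED = set(";:@?/")
_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def reescape_authority(authority: str) -> str:
    """Hypothetical re-serializer: escape any character not reserved in the authority."""
    out = []
    i = 0
    while i < len(authority):
        if _TRIPLET.match(authority, i):
            # Already an escaped octet: copy it verbatim to avoid double escaping
            out.append(authority[i:i + 3])
            i += 3
            continue
        ch = authority[i]
        if ch in UNRESERVED or ch in AUTHORITY_RESERVED:
            out.append(ch)
        else:
            # e.g. '+', '$', ',', '&', '=' are escaped as UTF-8 octets
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        i += 1
    return "".join(out)


print(reescape_authority("user+tag@example.com:80"))
```

This prints "user%2Btag@example.com:80". If such a rewrite is conformant, an application cannot rely on '+' surviving in the authority component.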

Note that we're not talking about what a URI processor should do; only
what it could possibly do without being nonconformant.

> Do you know of any important software that violates all of the URL
> specifications on this topic?

I don't think this behaviour violates the spec (unfortunately). And for
a security issue, all software is important.

-- 
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


Received on Sunday, 21 September 2003 15:48:40 UTC
