Re: Percent encoded dots in . and .. path elements

From: Roy T. Fielding <fielding@gbiv.com>
Date: Fri, 27 Jun 2014 09:55:25 -0700
Cc: uri@w3.org
Message-Id: <B64DD001-14DC-42E3-ADC7-FA6EDBFF3732@gbiv.com>
To: Mike Samuel <msamuel@google.com>
On Jun 27, 2014, at 8:51 AM, Mike Samuel wrote:

> Apologies if this is not the right forum for RFC 3986 related questions.
> Dot ('.') is in the unreserved set, and 3986 says
> """
>    URIs that differ in the replacement of an unreserved character with
>    its corresponding percent-encoded US-ASCII octet are equivalent: they
>    identify the same resource.
> """
> which leads me to believe that %2E which encodes dot should be
> normalized before interpreting "." and ".." path elements when doing
> path resolution.

It can be normalized, yes.  It could also be rejected for security reasons,
or simply processed as is if the resource wants to do so.

> If so, then resolving
>  Base URI:  /x/y/z/
> against
>  Relative URI: .%2E
> should yield
>  /x/y/
> and not
>  /x/y/z/.%2E
> The existing libraries that I tested (Java's java.net.URI, Python's
> urlparse.urljoin) yield /x/y/z/.%2E and Java's normalize() method does
> not recognize the last path element as special.
> Browsers seem to differ.  Chrome and Safari seem to normalize ".%2E" early.
> Firefox seems to be leaving it up to the protocol handler.
> "https://www.google.com/webhp/.%2E" beGETs "www.google.com/."
> "file:///Users/msamuel/work/.%2E" fetches the right resource but ".."
> shows up as a path element in the URL bar.
> "http://urlecho.appspot.com/echo/z/.%2E" beGETs "urlecho.appspot.com/echo/z/.."
> Should resolution/normalization treat the path element ".%2E" as special?

This depends on when the %2E is processed.  Usually, references
are normalized after resolution to absolute form because the scheme
impacts normalization.  Since normalization is optional, various implementations
will differ regarding when it is done (if at all).  Likewise, ".." is
only special during the relative->absolute conversion, so normalizing the
%2E after relative parsing is going to result in a ".." segment.
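
To illustrate the ordering issue, here is a minimal Python sketch.  It
assumes Python 3's urllib.parse.urljoin as the resolver; decode_unreserved
is a hypothetical helper written only for this example, and the
example.com base is a placeholder.

```python
import re
from urllib.parse import urljoin

# RFC 3986 section 2.3 unreserved characters.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def decode_unreserved(uri):
    """Hypothetical helper: decode %XX only when it encodes an unreserved
    character; other escapes are kept, with their hex digits uppercased."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri)

BASE = "http://example.com/x/y/z/"  # placeholder base URI

# Resolve first, then decode: the resolver never sees "..", so the
# decoded result still carries a literal ".." segment.
resolve_then_decode = decode_unreserved(urljoin(BASE, ".%2E"))

# Decode first, then resolve: the resolver sees ".." and removes it
# together with the preceding segment.
decode_then_resolve = urljoin(BASE, decode_unreserved(".%2E"))

print(resolve_then_decode)  # http://example.com/x/y/z/..
print(decode_then_resolve)  # http://example.com/x/y/
```

Both results are equivalent identifiers under the spec's rule, but only
the second has had its dot segment removed, which is why implementations
that normalize at different points can disagree.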

What the spec says is that ".%2E" and ".." are equivalent, meaning that
a server is likely to decode it to ".." and either reject the request for
security reasons or redirect it to a URI without the corresponding "/parent/..".
A redirect is necessary to avoid security bypass on the server path.
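
A rough sketch of that server-side choice, in Python.  The names
decode_unreserved, remove_dot_segments, and handle_path are hypothetical,
the reject flag is an assumed policy switch, and the dot-segment removal
is a simplified take on RFC 3986 section 5.2.4 that only handles
absolute paths; a real server would apply its own rules.

```python
import re

# RFC 3986 section 2.3 unreserved characters.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def decode_unreserved(path):
    """Decode %XX only when it encodes an unreserved character."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)

def remove_dot_segments(path):
    """Drop "." and ".." segments from an absolute path (simplified)."""
    segments = path.split("/")
    out = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            if is_last:
                out.append("")      # keep the trailing slash
        elif segment == "..":
            if len(out) > 1:        # never pop the leading empty segment
                out.pop()
            if is_last:
                out.append("")      # keep the trailing slash
        else:
            out.append(segment)
    return "/".join(out)

def handle_path(raw_path, reject=False):
    """Return (action, path) where action is serve, redirect, or reject."""
    decoded = decode_unreserved(raw_path)
    cleaned = remove_dot_segments(decoded)
    if cleaned == decoded:
        return ("serve", decoded)       # no dot segments were present
    if reject:
        return ("reject", raw_path)     # refuse for security reasons
    return ("redirect", cleaned)        # send the client to the clean URI

print(handle_path("/x/y/z/.%2E"))               # ('redirect', '/x/y/')
print(handle_path("/x/y/z/.%2E", reject=True))  # ('reject', '/x/y/z/.%2E')
```

Either outcome keeps the server from silently serving a path that still
contains a decoded ".." segment.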

What the same thing means for locally processed file URIs is currently
not standardized due to lack of consensus among user agents, though
I would expect a browser to do the same processing as a server.

Note that you have to be careful in testing to see whether the browser
is normalizing the URI before the request or if it is being normalized
and redirected by the server after the (initial) request.

Received on Friday, 27 June 2014 16:55:48 UTC