Re: Percent encoded dots in . and .. path elements from Roy T. Fielding on 2014-06-27 (uri@w3.org from June 2014)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Fri, 27 Jun 2014 09:55:25 -0700
To: ☻Mike Samuel <msamuel@google.com>
Cc: uri@w3.org
Message-Id: <B64DD001-14DC-42E3-ADC7-FA6EDBFF3732@gbiv.com>

On Jun 27, 2014, at 8:51 AM, ☻Mike Samuel wrote:

> Apologies if this is not the right forum for RFC 3986 related questions.
> 
> 
> Dot ('.') is in the unreserved set, and 3986 says
> 
> """
>    URIs that differ in the replacement of an unreserved character with
>    its corresponding percent-encoded US-ASCII octet are equivalent: they
>    identify the same resource.
> """
> 
> which leads me to believe that %2E which encodes dot should be
> normalized before interpreting "." and ".." path elements when doing
> path resolution.

It can be normalized, yes.  It could also be rejected for security reasons,
or simply processed as is if the resource wants to do so.

> If so, then resolving
>  Base URI:  /x/y/z/
> against
>  Relative URI: .%2E
> should yield
>  /x/y/
> and not
>  /x/y/z/.%2E
> 
> The existing libraries that I tested (Java's java.net.URI, Python's
> urlparse.urljoin) yield /x/y/z/.%2E and Java's normalize() method does
> not recognize the last path element as special.
> 
> 
> Browser's seem to differ.  Chrome and Safari seem to normalize ".%2E" early.
> Firefox seems to be leaving it up to the protocol handler.
> "https://www.google.com/webhp/.%2E" beGETs "www.google.com/."
> "file:///Users/msamuel/work/.%2E" fetches the right resource but ".."
> shows up as a path element in the URL bar.
> "http://urlecho.appspot.com/echo/z/.%2E" beGETs "urlecho.appspot.com/echo/z/.."
> 
> 
> Should resolution/normalization treat the path element ".%2E" as special?

This depends on when the %2E is processed.  Usually, references
are normalized after resolution to absolute form because the scheme
impacts normalization.  Since normalization is optional, various implementations
will differ regarding to when it is done (if at all).  Likewise, ".." is
only special during the relative->absolute conversion, so normalizing the
%2E after relative parsing is going to result in a ".." segment.

What the spec says is that ".%2E" and ".." are equivalent, meaning that
a server is likely to decode it to ".." and either reject the request for
security reasons or redirect it to a URI without the corresponding "/parent/..".
A redirect is necessary to avoid security bypass on the server path.

What the same thing means for locally processed file URIs is currently
not standardized due to lack of consensus among user agents, though
I would expect a browser to do the same processing as a server.

Note that you have to be careful in testing to see whether the browser
is normalizing the URI before the request or if it is being normalized
and redirected by the server after the (initial) request.

....Roy

Received on Friday, 27 June 2014 16:55:48 UTC