. and .. and ////

From: Manger, James H <James.H.Manger@team.telstra.com>
Date: Mon, 29 Oct 2007 14:59:58 +1100
Message-ID: <6215401E01247448A306C54F499111F203664581@WSMSG2103V.srv.dir.telstra.com>
To: <uri@w3.org>
How should variable values of “.” and “..” be handled?

My suggestion:
It is an error for a variable value to be “.” or “..” when the URI up to the position where the value is to be inserted matches the regular expression “|[^?]*/\.?”. That is, the URI so far:
* is empty; or
* ends with “/” or “/.” and does not contain a “?”.
Note: it is each variable value that is checked, not the entire replacement for a {…} segment.
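
To make the suggested rule concrete, here is a minimal Python sketch. The function name is_dot_error is only for illustration; the regular expression is the one quoted above, applied to the whole prefix (the URI built so far).

```python
import re

# Regular expression from the suggestion: the prefix is either empty, or it
# contains no "?" and ends with "/" or "/.".
RISKY_PREFIX = re.compile(r"|[^?]*/\.?")


def is_dot_error(prefix: str, value: str) -> bool:
    """Return True if inserting `value` after `prefix` is an error under the rule.

    `prefix` is the URI built so far; `value` is a single variable value,
    not the whole replacement for a {...} segment.
    """
    if value not in (".", ".."):
        return False
    # fullmatch: the entire prefix must match, not just a part of it.
    return RISKY_PREFIX.fullmatch(prefix) is not None
```

For example, is_dot_error("/car/", "..") is True, while is_dot_error("/car/prices.html?make=", "..") and is_dot_error("/car/ford_", "..") are both False.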

The potential problem is that “.” and “..” have special meaning in URIs, which the template designer (web server) probably does not want.

Consider /car/{make}/{model}/prices.html.

If make=“..” and model=“truck” the URI is /car/../truck/prices.html, which is normalized to /truck/prices.html. This is an unexpected result given the template.
If make=“ford” and model=“.” the URI is /car/ford/./prices.html, which is normalized to /car/ford/prices.html. Again, not quite an obvious result given the template.
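
The normalization in these two examples can be reproduced with a short Python sketch of the dot-segment removal steps described in RFC 3986, section 5.2.4. This is my own transcription of those steps for illustration, not code taken from any particular browser or server.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a URI path (after RFC 3986, 5.2.4)."""
    output = []  # completed segments, each with its leading "/" if it had one
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../x" becomes "/x" ...
            if output:
                output.pop()         # ... and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (up to, not including, the next "/").
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


print(remove_dot_segments("/car/../truck/prices.html"))  # /truck/prices.html
print(remove_dot_segments("/car/ford/./prices.html"))    # /car/ford/prices.html
```

Note that only complete “.” and “..” segments are affected: a path such as /car/.._truck/prices.html passes through unchanged.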

A “.” may not be much of an issue in practice as // is generally treated as /, as is /./ of course. Consequently, “.” would be similar in practice to “” (an empty string).
I did some quick tests of URLs with consecutive ////’s. Neither Firefox nor IE normalized the /’s before sending the request, but both Apache and Tomcat servers treated them just like a single / (e.g. they returned the same resource). The java.net.URI.normalize() method actually replaces //// with / (but that is a bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4723726).
http://bitworking.org//news////258/////The-end-of-the-AtomPub-WG works, for example.
I did find one web server (Pebble blog software, a Java web app) that treated //// differently than /. It returned 404 Not Found for //// (it has its own application-specific URL mapping layer).

A server COULD treat /car/ford//prices.html and /car/ford/prices.html differently, but some (most?) will not by default.

“.” and “..” are NOT special everywhere in a URI. For instance, /car/prices.html?make=..&model=truck is unchanged by normalization. Similarly, /car/{make}_{model}/prices.html -> /car/.._truck/prices.html is unchanged by normalization. Hence, a blanket prohibition on “.” and “..” as variable values would be unnecessarily restrictive.

%-encoding the dots does not help. Enter http://www.w3.org/%2E/ into the Firefox address bar, for instance, and it normalizes away the ./ before sending the request.
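
One plausible reason: “.” is an unreserved character, and RFC 3986 (section 6.2.2.2) allows a normalizer to decode %-encoded unreserved characters. A minimal Python sketch of that step follows; decode_unreserved is a name I made up, and I am not claiming this is how Firefox implements it.

```python
import re

# The <unreserved> set from RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def decode_unreserved(uri: str) -> str:
    """Decode %XX escapes only when they stand for an unreserved character."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        # Leave escapes for reserved or other characters (e.g. %2F) untouched.
        return char if char in UNRESERVED else match.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri)


print(decode_unreserved("http://www.w3.org/%2E/"))  # http://www.w3.org/./
```

After this step the path contains a literal ./ segment, which ordinary dot-segment removal then strips.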

An implementation complication…
A URI will invariably be built from a template from left to right. To know if a “..” substitution will have a special meaning you have to know what comes next. For instance, with make=“..”, /car/{make}_{model}/ is not a problem but you have to notice the underscore to realise this. With /car/{make}{model}/ you cannot tell if make=“..” is a problem until {model} has been substituted.

Whatever rules we come up with, they should NOT require implementations to look ahead (certainly not at future substitutions) -- even if this means rejecting a URI such as /car/.._truck/.
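
As an illustration of a check that needs no look-ahead, here is a rough Python sketch of a left-to-right expander. It assumes a simplified template syntax with only plain {name} variables and does no %-encoding, so it is not the full template proposal; the names expand, RISKY_PREFIX and TOKEN are mine.

```python
import re

# Prefix test from the suggested rule: empty, or no "?" and ends in "/" or "/.".
RISKY_PREFIX = re.compile(r"|[^?]*/\.?")
# A template token is either {name} or a run of literal text.
TOKEN = re.compile(r"\{(\w+)\}|[^{]+")


def expand(template: str, values: dict) -> str:
    """Expand {name} variables left to right, checking each value as it is inserted."""
    uri = ""
    for match in TOKEN.finditer(template):
        name = match.group(1)
        if name is None:
            uri += match.group(0)  # literal text: copy as-is
            continue
        value = values[name]
        # Only the URI built so far is examined; nothing after the variable.
        if value in (".", "..") and RISKY_PREFIX.fullmatch(uri):
            raise ValueError(f"{name}={value!r} not allowed after {uri!r}")
        uri += value
    return uri


print(expand("/car/prices.html?make={make}&model={model}",
             {"make": "..", "model": "truck"}))
# /car/prices.html?make=..&model=truck
```

With make=“..”, both /car/{make}/{model}/prices.html and /car/{make}_{model}/ raise an error here. The second rejection is the conservative case mentioned above: the expander never looks at the underscore.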

My template proposal {^ prefix^ var []sep |default} supported two encoding modes: %-encode all chars not in <unreserved> (when there is no leading ^); or %-encode only chars not in <unreserved> and not in <reserved> (when a leading ^ is present). Not %-encoding /’s makes it harder to detect (and, hence, treat as an error) “..” paths that affect the current URI. E.g., {^foo[]} when foo=“a/../../../” or foo=[“a/.”, “./.”, “./.”]. Perhaps /’s in variable values should always be encoded.
Received on Monday, 29 October 2007 04:00:30 GMT
