File URIs, home and relative paths

Dear List,

The use of HTTP and other recognized URL schemes for resource
identification often overlaps with the use of files, for example in the
specification of Python dependencies.

It is natural to desire to use URLs for all dependencies, even file
dependencies; but there are two areas where there are difficulties in
principle as well as in practice:

   - In the specification of vendored dependencies
   - In the specification of internally hosted dependencies

The former requires a notion of relative paths; the latter, a notion of
home directories.

RFC 1736 admonishes us (4.4) that “Locators can be readily distinguished
from naming and descriptive identifiers that may occupy the same name
space.” The use of file: instead of file:// for local file URIs would seem
to compromise this requirement, since file:... is a valid file path,
whereas anything with // in it is not a file path (double slashes being
meaningless in paths).
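The overlap can be seen in a small Python sketch. It reads the same string once as a URI and once as a POSIX path, using only the standard library. The function name describe and the sample strings are my own, chosen for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def describe(text: str) -> dict:
    """Show how one string reads as a URI and as a POSIX path."""
    parts = urlsplit(text)
    return {
        "uri_scheme": parts.scheme,
        "uri_authority": parts.netloc,
        "uri_path": parts.path,
        # pathlib collapses repeated slashes inside a path.
        "posix_path": str(PurePosixPath(text)),
    }


# "file:code/main" is both a URI and a plausible relative path whose
# first component happens to be named "file:code".
print(describe("file:code/main"))

# With "//" the URI reading gains an authority, while the path reading
# loses the double slash once it is normalized.
print(describe("file://server/code/main"))
```

In the first case the string survives unchanged as a path, which is the ambiguity at issue; in the second, the path library rewrites it to file:/server/code/main, so the two-slash spelling is distinctive.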

RFC 1736 says (4.2) that “Locators have global scope.” and goes on to say:
“The probability of successful access using an Internet locator depends in
no way, modulo resource availability, on the geographical or Internet
location of the client.” In a similar vein, RFC 3986 says (1.1) that “URIs
have a global scope and are interpreted consistently regardless of context,
though the result of that interpretation may be in relation to the
end-user’s context.”

It is the first principle that motivates encoding home and dot in file URLs
with ://, namely that there should be a distinctive way, analogous with
other URLs, to specify file resources. The second principle is what guides
us in considering (a) whether such URLs are allowable and (b) how they
should be understood.

The interpretation of file://server/code/main must be: /code/main on the
host named server. But in practice one is apt to want: ~/code/main on that
host, due both to shared hosting realities and privilege
separation. Every user has a home (on Windows, Linux, Mac, Android, iOS…),
so ~ can always be interpreted, although it is clear that “…the result of
that interpretation may be in relation to the end-user’s context.” The
interpretation of ~ in the local URL file:///~ is similar to any
interpretation for remote servers. The use of local file URLs with ~ allows
us to identify and recover configuration (for example). This seems like an
easy addition to any follow-on to RFC 1738: to adopt ~ to mean home would
not impose alien concepts on implementations or conflict with existing URI
concepts.
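A minimal Python sketch of the reading proposed here follows. The rule that a leading /~ means the user's home is the suggestion made in this message and is not defined by RFC 1738 or RFC 3986. The function name resolve_home_file_url, the home parameter and the sample home directory /home/alice are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def resolve_home_file_url(url: str, home: str) -> tuple:
    """Return (host, path) for a file URL.

    Assumption: a path of "/~" or one starting with "/~/" is taken
    relative to `home`. This is the proposed convention, not standard
    file URI behaviour.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL: " + url)
    path = unquote(parts.path)
    if path == "/~" or path.startswith("/~/"):
        # Drop the "/~" prefix and graft the remainder onto the home directory.
        path = home.rstrip("/") + path[2:]
    return parts.netloc, path


# Standard reading: an absolute path on the named host.
print(resolve_home_file_url("file://server/code/main", "/home/alice"))
# Proposed reading: a path under the user's home on that host.
print(resolve_home_file_url("file://server/~/code/main", "/home/alice"))
# Local case: the home directory itself.
print(resolve_home_file_url("file:///~", "/home/alice"))
```

The home directory is passed in explicitly because, for a remote host, only that host can say what ~ means for a given user.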

With regard to . and ..: the generic URI syntax describes how . and .. are
to be interpreted and offers an algorithm. RFC 3986 goes on to say (5.2)
that “Applications may implement relative reference resolution by using
some other algorithm, provided that the results match what would be given
by this one.” This algorithm results in an interpretation that cannot be
bent to the purpose of specifying local file references. The URL
file:///./a/b must be treated identically to file:///a/b and that is just
/a/b. I am not sure what to do about this one. Even URI templates would
take what was formerly succinct and natural (a plain .) and transform it
into something both strange to the eye and apt to confuse the shell,
Mustache templates, &c.
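To make the dot-segment problem concrete, here is a minimal Python sketch of the remove_dot_segments routine described in RFC 3986 (5.2.4). It is written from my reading of that section, not taken from any library, and the helper name normalized_file_path is my own.

```python
from urllib.parse import urlsplit


def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments, following RFC 3986 section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./" becomes "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../" becomes "/"
            if output:
                output.pop()         # and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            cut = path.find("/", 1)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


def normalized_file_path(url: str) -> str:
    """Path of a file URL after dot-segment removal."""
    return remove_dot_segments(urlsplit(url).path)


print(normalized_file_path("file:///./a/b"))   # /a/b
print(normalized_file_path("file:///a/b"))     # /a/b
print(normalized_file_path("file:///../a"))    # /a
```

The leading . is consumed before any consumer of the URL could give it a "current directory" meaning, and a leading .. cannot climb above the root.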

The principle that “Locators have global scope.” would seem to suggest we
should think no more about . and ..; however, vendored dependencies would
in practice have a meaningful interpretation globally since they are
shipped in the archive. Local directory resolution would seem to be
compatible with the spirit of URIs in the same way as localhost (or indeed
DNS names in general).
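One possible reading of "local directory resolution", sketched in Python: treat the location of the dependency manifest inside the unpacked archive as the base URI and resolve relative references against it with the standard library's urljoin. The function name resolve_vendored and the paths shown are hypothetical, and this is only an illustration of the idea, not a proposal for the scheme itself.

```python
from urllib.parse import urljoin


def resolve_vendored(manifest_url: str, reference: str) -> str:
    """Resolve a relative dependency reference against the manifest's URL.

    Assumption: the manifest's own file URL serves as the base URI.
    """
    return urljoin(manifest_url, reference)


base = "file:///srv/app/requirements.txt"  # hypothetical manifest location
print(resolve_vendored(base, "vendor/lib-1.0.tar.gz"))
print(resolve_vendored(base, "./vendor/lib-1.0.tar.gz"))
print(resolve_vendored(base, "../shared/lib-1.0.tar.gz"))
```

The relative reference stays meaningful wherever the archive is unpacked, because only the base changes. What it cannot do is appear on its own as a file:// URL, which is the gap described above.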

That neither of these use cases — very common in practice — is provided for
by the file:// scheme has been a source of amusement and frustration
throughout my many years of writing tools for automated deployment and
system management. It is my sincere hope that the file URI scheme evolves a
mechanism to handle these cases while retaining syntactic uniformity. To
implement tools that ignore or extend the specification is, while not
unheard of, not really right. It is best to find and stick to a standard if
it is at all serviceable.

Kind Regards,

Jason Dusek

Received on Tuesday, 7 June 2016 23:40:20 UTC