Re: file: URI scheme

From: Mike Brown <mike@skew.org>
Date: Mon, 27 Sep 2004 13:54:26 -0600 (MDT)
Message-Id: <200409271954.i8RJsQit021455@chilled.skew.org>
To: Graham Klyne <gk@ninebynine.org>
CC: uri@w3.org

Graham Klyne wrote:
> For Unix/Linux
> --------------
> filename->URI:
> Assume full path from root is given. IRI is "file://" ++ the given local 
> system path, with URI-escaping applied as needed (including to any '?' and 
> '#' characters).

I assume the "IRI" there is intentional, since some systems have Unicode 
filename support. Thus it's really more like filename->IRI->URI? Might it be 
better to recommend converting the path to an IRI and then, rather than 
saying "as needed", explicitly state that the IRI spec governs conversion of 
the IRI to a URI?

In any case, the APIs of some file systems represent filenames as Unicode, and 
others represent them as bytes with some default encoding (which may vary), so 
some recommendation should be given as to how to deal with encoded filenames. 
Should the bytes be percent-encoded directly, or should the default encoding 
be used to convert the bytes to characters first, and then use UTF-8 as the 
basis for percent-encoding? I prefer the latter, but I bet there are a ton of 
implementations doing the former. Should they be allowed to do that?
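
To make the two options concrete, here is a minimal Python sketch. The 
function names and the Latin-1 sample filename are my own assumptions for 
illustration, not anything taken from the proposal or from a spec.

```python
from urllib.parse import quote

def uri_path_from_raw_bytes(raw: bytes) -> str:
    # Option 1: percent-encode the file system's bytes directly,
    # whatever encoding they happen to be in.
    return quote(raw, safe="/")

def uri_path_via_utf8(raw: bytes, fs_encoding: str) -> str:
    # Option 2: decode the bytes to characters using the file system's
    # default encoding (assumed known), then percent-encode as UTF-8.
    text = raw.decode(fs_encoding)
    return quote(text, safe="/", encoding="utf-8")

# Hypothetical filename stored as Latin-1 bytes on disk.
raw = "/tmp/caf\u00e9.txt".encode("latin-1")
print(uri_path_from_raw_bytes(raw))       # /tmp/caf%E9.txt
print(uri_path_via_utf8(raw, "latin-1"))  # /tmp/caf%C3%A9.txt
```

The same file ends up with two different URIs depending on the option 
chosen, which is why a recommendation seems necessary.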

> URI->filename.:
> If authority is non-empty, its interpretation is system-dependent.  The URI 
> path component is un-escaped and used as the local system path.

I would add that since reserved characters can appear in a Unix filename, care 
should be taken to percent-decode each path segment separately. For example, a 
literal "/" or ";" in a path segment will be percent-encoded, so you don't 
want to blindly percent-decode the whole path and then think you've got an 
inter-segment separator or segment/param separator. E.g., file:///a%2Fb -> 
filename "a/b" in the root directory, not filename "b" in the "a" subdir of 
the root dir.
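
A small Python sketch of the difference, splitting on literal "/" before 
decoding. The helper name decoded_segments is hypothetical, and this only 
illustrates the ordering issue, not a complete URI->filename conversion.

```python
from urllib.parse import unquote, urlsplit

def decoded_segments(uri: str) -> list[str]:
    path = urlsplit(uri).path
    # Split on the literal "/" separators first, then percent-decode
    # each segment on its own, so an encoded %2F stays inside a segment.
    return [unquote(segment) for segment in path.split("/")[1:]]

uri = "file:///a%2Fb"
print(decoded_segments(uri))                      # ['a/b']

# The blind approach: decode the whole path, then split.
print(unquote(urlsplit(uri).path).split("/")[1:])  # ['a', 'b']
```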

> 
> For MS-Windows
> --------------
> filename->URI:
> If filename starts with x:... (with 'x' a letter), a leading '/' is 
> added.  Convert all '\' characters to '/'.  Apply URI escaping as needed 
> (including to any '?' and '#' characters).  The URI is formed by append the 
> resulting string to file://
> 

If you're going to recommend interpretation of a URI as a UNC name on the
way back, might as well recommend how to produce a URI from a UNC name.
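
As a rough sketch of what that could look like in Python, following the 
quoted steps for drive-letter paths and adding one plausible mapping for 
\\host\share\path (host becomes the authority). The function name and the 
UNC mapping are my assumptions, not an agreed convention.

```python
from urllib.parse import quote

def windows_path_to_file_uri(path: str) -> str:
    p = path.replace("\\", "/")
    if p.startswith("//"):
        # Assumed UNC mapping: \\host\share\path -> file://host/share/path
        host, _, rest = p[2:].partition("/")
        return "file://" + host + "/" + quote(rest, safe="/")
    if len(p) >= 2 and p[0].isascii() and p[0].isalpha() and p[1] == ":":
        # x:... gets a leading "/" as in the quoted proposal.
        p = "/" + p
    # quote() escapes "?", "#", spaces, etc.; "/" and ":" are left alone.
    return "file://" + quote(p, safe="/:")

print(windows_path_to_file_uri(r"C:\Docs\a b#1.txt"))
# file:///C:/Docs/a%20b%231.txt
print(windows_path_to_file_uri(r"\\host\share\path\file.txt"))
# file://host/share/path/file.txt
```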

> URI->filename:
> If authority is non-empty, it may be interpreted as a UNC name,

For compatibility with some existing implementations, if the authority is 
empty and the path begins with two '/' characters, then the first non-empty 
path segment may be interpreted as the host of a UNC name, e.g. 
file:////host/share/path.

Also, please double-check terminology here. I'm not sure "UNC name" is
appropriate. (maybe it is, but I think host + share + path are more
meaningful)
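
A sketch of how an implementation might accept both forms and come out 
with the same host and path, using Python's urlsplit. The helper name 
host_and_path is hypothetical, and treating the four-slash form this way 
is a compatibility choice, not something a spec requires.

```python
from urllib.parse import urlsplit

def host_and_path(uri: str) -> tuple[str, str]:
    parts = urlsplit(uri)
    host, path = parts.netloc, parts.path
    if not host and path.startswith("//"):
        # file:////host/share/path: empty authority, path "//host/share/path".
        # Take the first non-empty segment as the host.
        host, sep, rest = path[2:].partition("/")
        path = sep + rest
    return host, path

print(host_and_path("file://host/share/path"))    # ('host', '/share/path')
print(host_and_path("file:////host/share/path"))  # ('host', '/share/path')
print(host_and_path("file:///C:/path"))           # ('', '/C:/path')
```

The returned path is still percent-encoded; decoding would happen 
afterwards, segment by segment.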

> if the path component starts with /x: (with 'x' a letter), strip off the 
> leading '/'.  The remaining URI path component is un-escaped and used as 
> the local system path (possibly appended to the UNC component).

UNC paths don't have drivespecs, e.g. you won't see //host/C:/somepath.
If you share the root of a drive, you have to pick a non-colon-containing
name for it. On my XP box it defaults to the drive letter alone, but I
have seen network drives named $c$ before, so I don't know what conventions
might be typical. Probably should not assume anything and just let the
implementations decide how to deal with any odd cases like
file://host/C:/path
file://nonlocalhost1//host2/share/path
etc.
Received on Monday, 27 September 2004 19:54:26 GMT