
Re: file: URI scheme

From: Graham Klyne <GK@ninebynine.org>
Date: Wed, 29 Sep 2004 09:51:55 +0100
Message-Id: <>
To: Mike Brown <mike@skew.org>
Cc: uri@w3.org


Thanks for your feedback.

Two quick comments:

(1) My use of IRI was accidental.  I'm not opposed to it, I just wasn't 
trying to be so clever.

(2) The technical details offered were intended to be illustrative, not 
definitive, and I welcome your observations, clarifications and 
corrections.  Thanks!  Mainly, I was trying to clarify my structural 
suggestion by sketching what I felt was the level of information that might 
be included.  If that approach is felt to be helpful then of course we need 
to thrash out the technical details.  I'll add that by opting for a 
non-normative appendix, I think it's reasonable to apply the 80/20 
principle and shoot for something concise rather than a complete list of all 
possible problems.

Sorry, but I'm out of time to be any more detailed right now.


At 13:54 27/09/04 -0600, Mike Brown wrote:
>Graham Klyne wrote:
> > For Unix/Linux
> > --------------
> > filename->URI:
> > Assume full path from root is given. IRI is "file://" ++ the given local
> > system path, with URI-escaping applied as needed (including to any '?' and
> > '#' characters).
>I assume the "IRI" there is intentional, since some systems have Unicode
>filename support. Thus it's really more like filename->IRI->URI? Might it be
>better to recommend converting the path to an IRI and then rather than "as
>needed", explicitly say that the IRI spec governs conversion of an IRI to a URI?
>In any case, the APIs of some file systems represent filenames as Unicode, 
>others represent them as bytes with some default encoding (which may vary),
>so some recommendation should be given as to how to deal with encoded
>filenames.
>Should the bytes be percent-encoded directly, or should the default encoding
>be used to convert the bytes to characters first, and then use UTF-8 as the
>basis for percent-encoding? I prefer the latter, but I bet there are a ton of
>implementations doing the former. Should they be allowed to do that?
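
The two options in the question above can be compared with a minimal Python
sketch. The function names and the Latin-1 sample are illustrative
assumptions only, not taken from any spec or from either poster's code; each
function handles a single path segment.

```python
from urllib.parse import quote


def encode_raw_bytes(name: bytes) -> str:
    """Option 1: percent-encode the filename bytes directly, as stored."""
    return quote(name, safe="")


def encode_via_utf8(name: bytes, fs_encoding: str) -> str:
    """Option 2: decode with the file system's default encoding (assumed to
    be known), then percent-encode the characters as UTF-8."""
    return quote(name.decode(fs_encoding), safe="", encoding="utf-8")


# "cafe" with an e-acute, as a Latin-1 file system would store it.
sample = b"caf\xe9"
print(encode_raw_bytes(sample))            # caf%E9
print(encode_via_utf8(sample, "latin-1"))  # caf%C3%A9
```

The same file name yields two different URIs, which is why a recommendation
on this point seems to matter for interoperability.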
> > URI->filename.:
> > If authority is non-empty, its interpretation is system-dependent.  The 
> > path component is un-escaped and used as the local system path.
>I would add that since reserved characters can appear in a Unix filename,
>care should be taken to percent-decode each path segment separately. For
>example, a literal "/" or ";" in a path segment will be percent-encoded, so
>you don't want to blindly percent-decode and then think you've got an
>inter-segment separator or segment/param separator.  file:///a%2Fb ->
>filename "a/b" in the root directory, not filename "b" in the "a" subdir of
>the root dir.
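
A rough Python sketch of the per-segment decoding described above, set
against the blind whole-path decode. The function names are hypothetical,
and handling of the authority component is deliberately left out.

```python
from typing import List
from urllib.parse import unquote, urlsplit


def file_uri_segments(uri: str) -> List[str]:
    """Split the path on literal '/' first, then decode each segment."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file: URI")
    raw_segments = parts.path.split("/")[1:]  # drop the empty leading piece
    return [unquote(segment) for segment in raw_segments]


def blind_segments(uri: str) -> List[str]:
    """Decode the whole path first; an encoded '/' turns into a separator."""
    return unquote(urlsplit(uri).path).split("/")[1:]


print(file_uri_segments("file:///a%2Fb"))  # ['a/b']
print(blind_segments("file:///a%2Fb"))     # ['a', 'b']
```

The first function keeps the encoded slash inside one segment; the second
loses the distinction between a slash in a name and a separator.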
> >
> > For MS-Windows
> > --------------
> > filename->URI:
> > If filename starts with x:... (with 'x' a letter), a leading '/' is
> > added.  Convert all '\' characters to '/'.  Apply URI escaping as needed
> > (including to any '?' and '#' characters).  The URI is formed by appending
> > the resulting string to file://
> >
>If you're going to recommend interpretation of a URI as a UNC name on the
>way back, might as well recommend how to produce a URI from a UNC name.
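
The filename->URI steps quoted above for MS-Windows can be sketched in
Python roughly as follows. The function name is hypothetical, and the choice
to leave '/' and ':' unescaped is an assumption about what "as needed"
means; UNC names get no special treatment here.

```python
import re
from urllib.parse import quote


def windows_path_to_file_uri(path: str) -> str:
    # Step 1: a leading "x:" drive spec gets a '/' put in front of it.
    if re.match(r"^[A-Za-z]:", path):
        path = "/" + path
    # Step 2: every '\' becomes '/'.
    path = path.replace("\\", "/")
    # Step 3: percent-encode, escaping '?' and '#' but keeping '/' and ':'
    # (assumption), then append the result to "file://".
    return "file://" + quote(path, safe="/:")


print(windows_path_to_file_uri(r"C:\Docs\a b#1.txt"))
# file:///C:/Docs/a%20b%231.txt
```

Applied as-is to a name such as \\host\share\path, these steps would give
file:////host/share/path, with an empty authority and the host in the path.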
> > URI->filename:
> > If authority is non-empty, it may be interpreted as a UNC name,
>For compatibility with some existing implementations, if the authority is
>empty and the path begins with 2 '/', then the first path segment may be
>interpreted as a UNC name, e.g. file:////host/share/path.
>Also, please double-check terminology here. I'm not sure "UNC name" is
>appropriate. (maybe it is, but I think host + share + path are more
> > if the path component starts with /x: (with 'x' a letter), strip off the
> > leading '/'.  The remaining URI path component is un-escaped and used as
> > the local system path (possibly appended to the UNC component).
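
A Python sketch of the URI->filename direction as quoted above. The function
name is hypothetical. Converting '/' back to '\' and treating a non-empty
authority as the host part of a \\host\... name are assumptions; the quoted
text leaves the authority interpretation open.

```python
import re
from urllib.parse import unquote, urlsplit


def file_uri_to_windows_path(uri: str) -> str:
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file: URI")
    path = parts.path
    # A path starting with "/x:" loses its leading '/'.
    if re.match(r"^/[A-Za-z]:", path):
        path = path[1:]
    # Un-escape, then switch to backslashes (assumption, see above).
    path = unquote(path).replace("/", "\\")
    # Non-empty authority: prepend it as a host (assumption, see above).
    if parts.netloc:
        return "\\\\" + parts.netloc + path
    return path


print(file_uri_to_windows_path("file:///C:/Docs/a%20b.txt"))  # C:\Docs\a b.txt
print(file_uri_to_windows_path("file://host/share/x.txt"))    # \\host\share\x.txt
```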
>UNC paths don't have drivespecs, e.g. you won't see //host/C:/somepath.
>If you share the root of a drive, you have to pick a non-colon-containing
>name for it. On my XP box it defaults to the drive letter alone, but I
>have seen network drives named $c$ before, so I don't know what conventions
>might be typical. Probably should not assume anything and just let the
>implementations decide how to deal with any odd cases like

Graham Klyne
For email:
Received on Wednesday, 29 September 2004 09:15:04 UTC
