RE: What to do about file:

Here's some text which might appear in a description of the
'file:' URI scheme. It's missing most of the important and
interesting details; think of it as a proposal for organizing the
material.  References to particular implementations are given
by citing the implementation (or its documentation, if available),
but the RFC explains the range of behavior for the areas
where there are differences.

What do you think of this approach?

=====================================================

- Hierarchical structure

Most implementations of the 'file:' URI scheme do a reasonable
job of mapping the hierarchical part of a directory structure
into the '/' delimited hierarchy of the URI syntax, independent
of what the 'native' platform delimiter is.

For example, on Windows platforms the file system typically
presents backslash '\' as the delimiter in file names, yet the
URI forward slash '/' can be used in 'file:' URIs.

Similarly, on (some) Macintosh OS versions, at least in some
contexts, the colon (':') is used as the delimiter in the native
presentation of file path names.

Unix systems natively use the same forward slash '/' delimiter
for hierarchy, so there is a closer mapping between 'file:' URI
paths and native path names.
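
As a rough illustration of the mapping described above (a sketch only,
not taken from any of the cited implementations), the function below
splits a native path on its platform delimiter and rejoins the segments
with '/'. The name native_to_uri_path and its parameters are made up
for this example.

```python
from urllib.parse import quote


def native_to_uri_path(native_path: str, delimiter: str) -> str:
    """Map a native hierarchical path onto '/'-delimited URI path segments."""
    segments = native_path.split(delimiter)
    # Percent-encode each segment so that a literal '/' or ':' inside a
    # name cannot be mistaken for a hierarchy delimiter in the URI.
    return "/".join(quote(segment, safe="") for segment in segments)


print(native_to_uri_path(r"tmp\test.txt", "\\"))  # Windows-style delimiter
print(native_to_uri_path("tmp:test.txt", ":"))    # colon-delimited (some Mac OS versions)
print(native_to_uri_path("/tmp/test.txt", "/"))   # Unix-style, already '/'-delimited
```

Only the hierarchical part is handled here; the 'top' of the hierarchy
(drive, volume, or root) is deliberately left out.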


- Drives, drive letters, mount points, file system root

In practice, implementations differ considerably in how they
handle the syntax for the 'top' of the hierarchy.  The 'file:'
URI syntax provides no simple place for designating the
root of the file hierarchy, and implementations have diverged,
even on the same platform, sometimes even within a single
application.

For example, DOS and Windows based systems support the
notion of a "drive letter", a single character which
represents a (virtual) drive, mount point, or device.
Native representations of file paths start with the drive letter,
a colon, and then the path; e.g., "c:\tmp\test.txt".

Drive letters can be mapped into the top of a 'file:' URI in various
ways; some applications substitute vertical bar "|" for
the ":" after the drive letter, yielding file:///c|/tmp/test.txt.
In some cases, the ":" is left unchanged [a][b][d]; some applications
omit it [f].
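
To make the three drive-letter conventions concrete, here is a small
sketch that generates each form from a native DOS/Windows path. The
function name and the dictionary keys are invented for this example,
and the "omitted" form reflects one reading of "omit it" (the drive
letter kept, the colon dropped).

```python
def drive_letter_uri_variants(native_path: str) -> dict:
    """Return the 'file:' URI forms discussed above for a path like c:\\tmp\\test.txt."""
    if not native_path[:1].isalpha() or native_path[1:2] != ":":
        raise ValueError("expected a path starting with a drive letter and ':'")
    drive = native_path[0]
    # Everything after "c:" is the hierarchical part; map '\' to '/'.
    uri_path = native_path[2:].replace("\\", "/")
    return {
        "bar": f"file:///{drive}|{uri_path}",      # ':' replaced by '|'
        "colon": f"file:///{drive}:{uri_path}",    # ':' left unchanged
        "omitted": f"file:///{drive}{uri_path}",   # ':' dropped (assumed reading)
    }


for label, uri in drive_letter_uri_variants(r"c:\tmp\test.txt").items():
    print(label, uri)
```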

UNC path names....


- Use of hostname, host name checking

The 'file:' URI specification calls for using the actual
host name as the authority, as in file://myhostname/path,
and allows it to be omitted. This practice is rarely
followed, and the host name frequently is not checked.
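
A host name check of the kind described might look like the sketch
below. It is only an assumption about what "checking" could mean: the
helper authority_is_local and the caller-supplied set of local names
are hypothetical, and a real implementation would have to decide where
those names come from.

```python
from urllib.parse import urlsplit


def authority_is_local(uri: str, local_names: set) -> bool:
    """Return True if the URI's authority is omitted or names this host."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a 'file:' URI")
    host = parts.netloc.lower()
    # An omitted host name is treated as "this machine".
    return host == "" or host in local_names


names = {"localhost", "myhostname"}  # hypothetical names for the local machine
print(authority_is_local("file://myhostname/path", names))
print(authority_is_local("file:///path", names))
print(authority_is_local("file://otherhost/path", names))
```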


- Omitting authority

Some applications generate URIs with no authority component
at all, e.g., file:/this/is/the/path  [x][y][z]
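
The difference between "no authority component" and "empty authority"
is easy to lose in a general-purpose parser, so this sketch looks at
the raw string instead. The function name and the three labels it
returns are invented for illustration.

```python
def authority_form(uri: str) -> str:
    """Classify a 'file:' URI as having an absent, empty, or named authority."""
    prefix = "file:"
    if not uri.lower().startswith(prefix):
        raise ValueError("not a 'file:' URI")
    rest = uri[len(prefix):]
    if not rest.startswith("//"):
        return "absent"   # e.g. file:/this/is/the/path
    # The authority runs from after '//' up to the next '/'.
    authority = rest[2:].split("/", 1)[0]
    return "empty" if authority == "" else "named"


print(authority_form("file:/this/is/the/path"))
print(authority_form("file:///this/is/the/path"))
print(authority_form("file://myhostname/path"))
```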

- Using native paths

Some applications accept (and even generate) 'file:' URIs
which use the native syntax instead of the canonical
/-delimited one [p][d][q].
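
An application that wants to be lenient about such input could
normalize it before further processing. The sketch below covers only
the backslash case and assumes that a backslash in a 'file:' URI is
always meant as a path delimiter; both the function name and the
example URI are hypothetical.

```python
def canonicalize_backslashes(uri: str) -> str:
    """Rewrite native '\\' delimiters in a 'file:' URI as '/'."""
    scheme, colon, rest = uri.partition(":")
    if scheme.lower() != "file" or not colon:
        raise ValueError("not a 'file:' URI")
    # Assumption: every backslash is a delimiter, never part of a file name.
    return scheme + ":" + rest.replace("\\", "/")


print(canonicalize_backslashes(r"file:///c:\tmp\test.txt"))
```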

- Character sets and encodings

Local file systems may, of course, use many different encodings
for representing file names. For interoperability's sake, it would
be preferable for file: URI libraries to translate the native
character encoding for file names to and from Unicode, using
URI / 
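
One way to sketch the translation through Unicode is shown below. It
assumes the URI side uses percent-encoded UTF-8, which the text above
does not spell out, and the function names and the Latin-1 example are
chosen purely for illustration.

```python
from urllib.parse import quote, unquote


def native_name_to_uri_segment(raw_name: bytes, native_encoding: str) -> str:
    """Decode a native file name to Unicode, then percent-encode it as UTF-8."""
    text = raw_name.decode(native_encoding)
    return quote(text, safe="")  # quote() uses UTF-8 by default


def uri_segment_to_native_name(segment: str, native_encoding: str) -> bytes:
    """Reverse the mapping: percent-decode as UTF-8, re-encode natively."""
    return unquote(segment).encode(native_encoding)


raw = b"r\xe9sum\xe9.txt"  # "résumé.txt" as stored by a Latin-1 file system
segment = native_name_to_uri_segment(raw, "latin-1")
print(segment)
print(uri_segment_to_native_name(segment, "latin-1") == raw)
```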

References

[lwp-perl]  LWP Perl library
[java-net]  java.net.URI
[ms-net-lib]  Microsoft .NET library

Received on Friday, 20 August 2004 06:13:35 UTC