more 'file' suggestions for draft-hoffman-file-uri from Mike Brown on 2004-09-21 (uri@w3.org from September 2004)

From: Mike Brown <mike@skew.org>
Date: Mon, 20 Sep 2004 22:28:20 -0600
To: Paul Hoffman / VPNC <paul.hoffman@vpnc.org>
Cc: uri@w3.org
Message-id: <414FADE4.8060700@skew.org>
Paul Hoffman / VPNC wrote:
> Please review the file URI draft and let me know if this is sufficient. 

I have a few suggestions that I'm sure I'll be sorry I posted.

First, an easy one. In section 2, change all "URL scheme" to "URI scheme".

Then, please give this extremely vague statement further consideration:

   "The file URL scheme is used to designate files accessible on a
    particular host computer."

Honestly, in this age of distributed filesystems, OS UIs that don't 
distinguish between local and remote resources, and the like, I don't 
even know for sure what a "file" is anymore, or exactly what kinds of 
access you have in mind when you say "accessible". Lots of finite bit 
sequences are accessible on a computer by a variety of means. I think at 
best we can only say the file is "associated" with the host. As for what 
makes a particular entity a "file", I don't know exactly. A file may or 
may not exist on a particular physical storage medium such as a disk or 
tape, and it may or may not be accessed via a network, which makes even 
the "on a particular host computer" open to boundless interpretation.

So how about we acknowledge the ambiguities with something like this...

   The file URI scheme is used to identify a "file" resource associated
   with a particular host computer.

   The scheme emerged when the term "file" was relatively well understood
   as implying certain typical characteristics of a resource, such as
   being a finite bit sequence manipulated as a unit, stored on a
   relatively non-volatile storage device, organized with other files in
   a hierarchical or record-based "file system", and being "local" to a
   single physical computer by virtue of being stored on the computer's
   closely-attached physical storage devices and accessible primarily
   via the ordinary means native to file management on that host.

   The infusion of networking technologies into nearly every aspect of
   computing has since rendered such distinctions less and less relevant,
   so the file URI scheme likewise makes no attempt to imply a particular
   access mechanism or any other characteristics of the identified
   resource, aside from the fact that the resource is associated with a
   particular host; implementations of this scheme typically define their
   own concept of "file" in a manner that is appropriate for their
   platform.

And then continue on with the paragraph about poor interoperability.

Typos to fix in the 2nd paragraph: "syntaxt" (syntax) and "docoument" (document).

In the first paragraph, I think this can be safely deleted:

    This scheme, unlike most other URL schemes, does not designate a
    resource that is universally accessible over the Internet.

...the reason being that, once you swap URL for URI, one must ask whether
"universal accessibility over the Internet" is really implied by most
other resource identification schemes. Any references to "access" should 
be scrutinized, now that we distinguish between resource identification 
and resource representation retrieval. The thought that went into 
rfc2396bis's careful avoidance of requiring specific semantics in the 
authority component should be applied here (e.g. from the point of view 
of the syntax, a hostname doesn't *have* to be DNS based nor does it 
even necessarily rely on the idea that there's a network involved).

And maybe change this...

    A file URL takes the form:

    file://<host>/<path>

    where <host> is the fully qualified domain name of the system on
    which the <path> is accessible, and <path> is a hierarchical
    directory path of the form <directory>/<directory>/.../<name>.

...to something like this?

    Any URI having a scheme component consisting of "file", case-
    insensitively, is a file URI.

    A file URI usually takes the form:

    file://<host><abs-path>

    where <host> matches the host syntax rule from the rfc2396bis
    grammar and is usually either empty, "localhost", or a fully
    qualified domain name for the host to which <abs-path> applies;
    and where <abs-path> matches either the path-abempty or the
    path-absolute syntax rule from the rfc2396bis grammar, with its
    nonterminal segments representing "directories" or "folders"
    in a hierarchical file system. A file URI is not restricted to
    this syntax or interpretation, however.
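To make the proposed wording concrete, here is a minimal Python sketch (not part of the draft) of how a consumer might recognize a file URI by its case-insensitive scheme and split the usual form file://<host><abs-path> into host, directories, and final name. The function and field names are hypothetical. The sketch assumes the authority holds a bare host with no userinfo or port, which simplifies the rfc2396bis authority rule, and it returns None for file URIs in some other form rather than rejecting them, since the scheme is not restricted to the usual form.

```python
import re
from typing import List, NamedTuple, Optional

# Scheme syntax from rfc2396bis: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")


def is_file_uri(uri: str) -> bool:
    """True if the scheme component is "file", compared case-insensitively."""
    m = _SCHEME.match(uri)
    return m is not None and m.group(1).lower() == "file"


class FileURIParts(NamedTuple):
    host: str               # may be empty, "localhost", or a domain name
    abs_path: str           # "" or a path beginning with "/"
    directories: List[str]  # nonterminal path segments
    name: str               # final path segment ("" if none)


def split_usual_form(uri: str) -> Optional[FileURIParts]:
    """Split a file URI of the usual form file://<host><abs-path>.

    Returns None for file URIs in some other form (e.g. "file:/etc/hosts"),
    since the scheme is not restricted to the usual form. Assumes the
    authority is a bare host with no userinfo or port (a simplification).
    """
    if not is_file_uri(uri):
        raise ValueError("not a file URI")
    rest = uri[len("file:"):]
    if not rest.startswith("//"):
        return None
    # Drop any query or fragment, then split the host from the path.
    rest = re.split(r"[?#]", rest[2:], maxsplit=1)[0]
    host, slash, path = rest.partition("/")
    abs_path = slash + path
    segments = path.split("/") if slash else []
    return FileURIParts(
        host=host,
        abs_path=abs_path,
        directories=segments[:-1],
        name=segments[-1] if segments else "",
    )
```

For example, split_usual_form("file://localhost/etc/fstab") yields host "localhost", directories ["etc"], and name "fstab". Note that the sketch only splits syntax; it says nothing about which actual host or file the components correspond to.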

I'd also want to go one step further and make this important statement,
even though it is arguably redundant, since what's not in the spec
doesn't need to be pointed out as not being in the spec:

    This standard does not mandate any particular mapping between
    the components of a file URI and the file itself, nor any means
    of accessing the file.

The consequences of this could then be discussed:

    Consequently, no single component of a file URI is, by itself,
    necessarily an unambiguous identifier of anything; the host in a
    file URI may or may not directly correlate to the actual host associated
    with the file, and the path in a file URI may or may not directly
    correlate to a file system's mechanisms for file identification,
    be it a file system path, inode number, or other.

    Of course, it is customary for there to be no surprises; the
    host component usually identifies the actual host, and the path
    component usually bears some resemblance to the file's path on
    that host's hierarchical file system.

    Accordingly, producers and consumers of file URIs should
    document their expectations and what explicit mappings they
    assume between file URI components and file system-specific
    identifiers. Implementations that use file URIs for resource
    representation retrieval should document what access mechanisms
    are supported.
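As an illustration of what "documenting expectations" might look like, here is a minimal Python sketch of one hypothetical consumer's explicit mapping from file URIs to POSIX paths. Every rule in it (accepting only an empty or "localhost" host, UTF-8 percent-decoding, rejecting encoded slashes, NULs, queries, and fragments) is an assumption chosen for the example, not something the draft or this proposal mandates; another implementation could reasonably document different choices.

```python
from urllib.parse import unquote

# Hypothetical, explicitly documented mapping for ONE consumer. None of these
# rules come from the draft; they are assumptions chosen for illustration:
#   1. Only the form file://<host><abs-path> is accepted; no query or fragment.
#   2. The host must be empty or "localhost" (case-insensitive); any other
#      host is rejected rather than guessed at.
#   3. Each path segment is percent-decoded as UTF-8; an encoded "/" (%2F) or
#      NUL is rejected because neither can appear in a POSIX file name.
#   4. The POSIX path is "/" followed by the decoded segments joined by "/";
#      an empty path maps to the root directory.
LOCAL_HOSTS = {"", "localhost"}


def file_uri_to_posix_path(uri: str) -> str:
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "file":
        raise ValueError("not a file URI")
    if not rest.startswith("//"):
        raise ValueError("only file://<host><abs-path> is supported")
    if "?" in rest or "#" in rest:
        raise ValueError("query and fragment are not supported")
    host, slash, path = rest[2:].partition("/")
    if host.lower() not in LOCAL_HOSTS:
        raise ValueError(f"host {host!r} is not mapped to this system")
    decoded = []
    for segment in (path.split("/") if slash else []):
        if "%2f" in segment.lower():
            raise ValueError("encoded '/' inside a path segment")
        # errors="strict" makes invalid UTF-8 raise UnicodeDecodeError,
        # which is a subclass of ValueError.
        name = unquote(segment, encoding="utf-8", errors="strict")
        if "\x00" in name:
            raise ValueError("NUL inside a path segment")
        decoded.append(name)
    return "/" + "/".join(decoded)
```

Because the rules are stated up front, a producer that emits a URI such as file://example.org/etc/hosts knows this consumer will refuse it, instead of silently treating the host as local.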

I think I'll stop there. :)

-Mike
Received on Tuesday, 21 September 2004 04:28:20 UTC