Re: more 'file' suggestions for draft-hoffman-file-uri from Mike Brown on 2004-09-21 (uri@w3.org from September 2004)

From: Mike Brown <mike@skew.org>
Date: Tue, 21 Sep 2004 13:27:54 -0600
To: Larry Masinter <LMM@acm.org>
Cc: "'Paul Hoffman / VPNC'" <paul.hoffman@vpnc.org>, uri@w3.org
Message-id: <415080BA.9030008@skew.org>
Larry Masinter wrote:

>>   The scheme emerged when the term file was relatively well-understood
>>   as implying certain typical characteristics of a resource, such as
>>   being a finite bit sequence manipulated as a unit, stored on a
>>   relatively non-volatile storage device, organized with other files in
>>   a hierarchical or record-based "file system"
>>    
>>
>
>This isn't so. At the time the 'file:' scheme was invented, there were many
>widely used distributed file systems, and we were all
>aware of them -- AFS and NFS, and earlier systems,
>that mapped file name syntax into remote access protocols.
>

I know that network file systems have been around for a long time. The 
text you quoted does not seem to be contradicted by anything you're 
saying here.

The question I am trying to come up with a better answer to is "What 
class of resources do file URIs identify?". There are, presumably, 
reasons we believe a file URI identifies some resources and not others. 
What is the difference? Can it be clearly expressed in this standard? If 
so, express it. If not, say so.

Is the differentiating factor the traditional dereference mechanism for 
such URIs? It seems to be so, inasmuch as we tend to believe that 
dereferencing a file URI will not involve FTP or HTTP, but may involve 
NFS or SMB. Even when Roy characterizes the difference as having to do 
with a "file system interface", that's an access mechanism.

For some reason, I believe that an FTP client & server do not constitute 
a file system interface, while Windows Explorer and an SMB server do, 
but I don't know why that is, exactly. It seems to be predicated on the 
notion that there is a traditional manner of identifying and accessing 
locally-stored resources, and that if any network layers have been 
interjected, this has been done in a manner that, on the whole, makes 
remotely-stored resources be accessible in roughly the same way as the 
local ones (e.g. the use of mount points on a Unix filesystem or the use 
of share names in UNC paths, as opposed to a separate UI like an FTP or 
HTTP client application).

If these things are factors in determining what is / should be 
identified by a file URI, then I think it would be beneficial to state 
them. (But again, I'm not sure exactly why, other than that I just 
don't like how ambiguous "file" is and always has been.)


>>In the first paragraph, I think this can be safely deleted:
>>
>>    This scheme, unlike most other URL schemes, does not designate a
>>    resource that is universally accessible over the Internet.
>>    
>>
>
>I think this is a crucial idea, although perhaps this sentence doesn't
>capture it. But it's not a good idea to delete it, it's important to
>fix it. "file:" lacks an important property of most other URI schemes.
>I think the thing missing is the uniformity of identification rather
>than "universal access", but that can be fixed.
>  
>
>>...the reason being, once you swap URL with URI, one must ask if 
>>"universal accessibility over the Internet" is really implied by most 
>>other resource identification schemes.
>>    
>>
>
>Most resource identification schemes for URIs have -- and should have --
>uniform meaning, not dependent on local context.
>  
>

I agree. A Uniform Resource Identifier ought to uniformly identify 
resources.


>>    Any URI having a scheme component consisting of "file", case-
>>    insensitively, is a file URI.
>>    
>>
>
>No, certainly not. I think we should preserve and encourage
>uniformity of meaning for 'file:'.
>
Are you saying that it is not enough for a URI to be syntactically a 
file URI in order to be processed by, or at least referred to in, this 
standard; that it must also identify a resource of a certain kind? I'm 
fine with that, but then I must re-ask the first question: what 
qualifies a resource as being identifiable by a file URI?
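
To make the distinction concrete, the purely syntactic rule in the quoted draft text amounts to little more than the check below. This is a minimal Python sketch, assuming RFC 3986 (rfc2396bis) scheme syntax; the function name `is_file_uri` is hypothetical. It accepts any string whose scheme is "file" in any letter case and says nothing about what kind of resource is identified, which is exactly the gap in question.

```python
import re

# Scheme syntax per RFC 3986 (rfc2396bis): ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def is_file_uri(uri: str) -> bool:
    """Hypothetical, purely syntactic test: is the scheme "file",
    compared case-insensitively? It does not examine the resource."""
    match = _SCHEME.match(uri)
    return match is not None and match.group(1).lower() == "file"
```

Under this rule `FILE:///etc/hosts` and `file://example.com/x` both qualify, while `http://example.com/` and a bare `file` (no colon) do not.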

>>    This standard does not mandate any particular mapping between
>>    the components of a file URI and the file itself, nor any means
>>    of accessing the file.
>>    
>>
>
>I think this is a bad direction, and that we should try to
>narrow and standardize the interpretation of file: URIs,
>because it is a useful concept for which there is a basis
>for moving forward and making things more uniform. Disclaiming
>responsibility goes in the wrong direction.
>
I don't think it is going in any direction; it is just stating the 
obvious. I agree that we should try to narrow and standardize, but there 
seems to be reluctance to make any changes that would conflict with 
current practice, as no matter what recommendation you make, it's going 
to invalidate someone's implementation. Also, any attempt to narrow 
interpretations and definitions is met with great debate, which is 
discouraging.

What I hoped to achieve was at least the tiny step forward of 
acknowledging that the components of a file URI could (and should?) be 
interpreted by a dereferencing mechanism as mapping to, for example, a 
resolvable hostname and the directories/folders and filename in a 
hierarchical file system. This was ever-so-slightly implicit in the 
example that used <directory>/<directory>, but since I recommended 
replacing that with rfc2396bis syntax references, it seems like the 
implication should instead be explicitly stated somehow. Then we can 
move forward from there and start describing what mappings are common, 
and then maybe even make some recommendations. Tiny steps, though, hence 
the way I phrased it above (which is intentionally not making any 
assumptions about how files are named within a file system).
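
As a sketch of the kind of mapping described above, the following Python function reads the authority component as a hostname and the path segments as directories/folders followed by a filename. It is a hypothetical illustration only, not a proposed rule: the name `split_file_uri`, the treatment of an empty authority as "localhost", and the dropping of empty segments are all assumptions, and it makes no claim about how names are formed within any particular file system.

```python
from urllib.parse import urlsplit, unquote


def split_file_uri(uri: str) -> tuple[str, list[str]]:
    """Hypothetical mapping: authority -> hostname; path segments ->
    directories followed by a final file name. Not normative."""
    parts = urlsplit(uri)
    if parts.scheme.lower() != "file":
        raise ValueError("not a file URI: " + uri)
    # Assumption: an empty authority (file:///...) means the local host.
    # Note that urlsplit's .hostname is lowercased.
    host = parts.hostname or "localhost"
    # Percent-decode each segment; empty segments (leading "/", trailing "/",
    # or "//") are dropped, which is a simplification.
    segments = [unquote(s) for s in parts.path.split("/") if s]
    return host, segments
```

For example, `split_file_uri("file://Server/share/My%20Docs/a.txt")` returns `("server", ["share", "My Docs", "a.txt"])`. The lowercasing of the hostname is itself a choice that not every file system or share naming convention would agree with.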
Received on Tuesday, 21 September 2004 19:28:03 UTC