# Re: more 'file' suggestions for draft-hoffman-file-uri

From: Mike Brown <mike@skew.org>
Date: Wed, 22 Sep 2004 14:50:27 -0600
To: Larry Masinter <LMM@acm.org>
Cc: uri@w3.org
Message-id: <4151E593.3010002@skew.org>

Larry Masinter wrote:

>>- The syntax of a file URI is that of absolute-URI, except that
>>  its scheme component must be 'file', case-insensitively.
>>
>>
>
>I think this just confuses things, since it doesn't really say
>anything.
>
>
OK, but I think a statement of lexical syntax, independent of semantics,
should be made. What the syntax is doesn't bother me much, although I am
under the impression that rfc2396bis requires all URI schemes to
acknowledge that the resource identifier may contain components that are
not meaningful to a particular scheme. If we make a statement like what
is currently in the standard -- that a file URI's syntax is
file://<host>/<path>, then that implies certain restrictions. For example...

FILE://what@the:/heck?is;this  - file URI or no?

I say it is, on the grounds that it has the file scheme and matches
absolute-URI. Sure, it contains a bunch of other junk, but that doesn't
preclude it from being useful as an identifier of a file resource.
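To make the purely lexical reading concrete, here is a small sketch. It assumes nothing beyond the generic component-splitting regular expression from Appendix B of rfc2396bis (later published as RFC 3986). It pulls the example apart into components and checks only the scheme, case-insensitively. The helper names `split_uri` and `has_file_scheme` are hypothetical, and the regex splits components without validating their characters.

```python
import re

# Generic component-splitting regex from Appendix B of rfc2396bis (RFC 3986).
# It only splits a string into components; it does not validate characters.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    m = URI_RE.match(uri)  # every group is optional, so this always matches
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

def has_file_scheme(uri):
    """True if the scheme component is 'file', compared case-insensitively."""
    scheme = split_uri(uri)[0]
    return scheme is not None and scheme.lower() == "file"

print(split_uri("FILE://what@the:/heck?is;this"))
# ('FILE', 'what@the:', '/heck', 'is;this', None)
print(has_file_scheme("FILE://what@the:/heck?is;this"))
# True
```

The "junk" lands in well-defined components (userinfo and an empty port inside the authority, plus a query), so nothing about it prevents the string from matching absolute-URI.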

Tack on a fragment, though...

FILE://what@the:/heck?is;this#eh?

Is it still a file URI? This is a nuance of the rfc2396bis grammar that
I'm unsure about. What is lexically a "URI" can have a fragment, but
sec. 4.3 says "Some protocol elements allow only the absolute form...".
Is a scheme definition a 'protocol element'? Or does a scheme have to
acknowledge that a fragment may be present? If the latter, then strike
what I said about absolute-URI; it should just match the URI syntax rule
and thus a statement of such would be redundant (though what harm is
there in having it?).
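The two readings differ only in what happens to a trailing fragment. The minimal sketch below uses Python's `urllib.parse.urlsplit`, which lower-cases the scheme. The function name `file_uri_kind` is hypothetical. The function reports which production a candidate "file" string would satisfy. A strict absolute-URI policy would reject the second kind, and a URI-based policy would accept both.

```python
from urllib.parse import urlsplit

def file_uri_kind(uri):
    """Hypothetical classifier for a candidate file URI.

    Returns None if the scheme is not 'file', "absolute-URI" when no fragment
    is present, and "URI-with-fragment" when one is.
    """
    if urlsplit(uri).scheme != "file":  # urlsplit lower-cases the scheme
        return None
    # Test for the "#" delimiter itself: urlsplit reports an empty fragment
    # ("...#") the same way as no fragment at all.
    return "URI-with-fragment" if "#" in uri else "absolute-URI"

print(file_uri_kind("FILE://what@the:/heck?is;this"))      # absolute-URI
print(file_uri_kind("FILE://what@the:/heck?is;this#eh?"))  # URI-with-fragment
```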

>>- The *typical* syntax of a file URI is more restrictive
>>  (no query component, authority is usually empty,
>>   path usually starts with "/")
>>
>>
>
>Well, hmmm, the 'authority' is interpreted differently, but
>it is either empty, 'localhost', or some other value; in
>some implementations, it is a host name in some local name
>space. (For example, many Windows implementations treat
>the 'authority' component as a UNC host, e.g.,
> file://hostname/path/to/file  =>  \\hostname\path\to\file
>
>
Actually I'd rather not make a statement about the "usual" syntax, per
se. Information about commonly used values is better provided in a
separate section that states what each component of a file URI
represents and how it is usually (or should be) interpreted.
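The UNC convention quoted above is the kind of thing such a section could illustrate. The following sketch shows one common Windows practice, not a normative rule, and it rests on several assumptions. The function name is hypothetical. Query and fragment are simply dropped. An empty or "localhost" host maps to a local drive path such as C:\dir, and any other host maps to a UNC path.

```python
from urllib.parse import urlsplit, unquote

def file_uri_to_windows_path(uri):
    """Hypothetical mapping of a file URI to a Windows path.

    Non-local hosts become UNC paths (\\\\host\\path); an empty or
    'localhost' host yields a local path. Query and fragment are ignored.
    """
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI: " + uri)
    # Percent-decode the path and switch to backslash delimiters.
    path = unquote(parts.path).replace("/", "\\")
    host = parts.hostname  # lower-cased host, without userinfo or port
    if host and host != "localhost":
        return "\\\\" + host + path  # e.g. \\hostname\path\to\file
    # Local file: drop the separator before a drive letter (\C:\x -> C:\x).
    if len(path) >= 3 and path[0] == "\\" and path[2] == ":":
        path = path[1:]
    return path

print(file_uri_to_windows_path("file://hostname/path/to/file"))
```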

>>- The authority component of a file URI is considered by this
>>  specification to contain a host component exactly as defined
>>  by the rfc2396bis grammar. (I don't want there to be any
>>  ambiguity about what the "host component" is).
>>
>>
>
>Well, is it ever anything other than empty, 'localhost' or
>a host name?
>
No, I mean that I don't want there to be any confusion as to whether
"host component" means everything that comes between the 2nd and 3rd
slash, as is implied by the current spec. That's actually the authority.
Once we clarify what piece of the authority is the "host", we can make a
statement about what it represents -- the host associated with the file
the URI identifies -- and about what common & special values it has --
empty, 'localhost', or a URI-friendly value that is derived from the
host's name.
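The distinction between authority and host follows directly from the rfc2396bis production `authority = [ userinfo "@" ] host [ ":" port ]`. The sketch below hand-rolls that split to show that the host is what remains after removing any userinfo and port. The function name `split_authority` is hypothetical, and malformed IP literals are not validated beyond a missing closing bracket.

```python
def split_authority(authority):
    """Split an authority as  [ userinfo "@" ] host [ ":" port ].

    Returns (userinfo, host, port); userinfo and port are None when absent.
    An empty port after ":" is returned as "" (the grammar allows *DIGIT).
    """
    userinfo, at, rest = authority.rpartition("@")
    if not at:
        userinfo = None
    if rest.startswith("["):
        # IP-literal such as [::1]; a port, if any, follows the "]".
        end = rest.index("]") + 1  # raises ValueError if the bracket is unclosed
        host, tail = rest[:end], rest[end:]
        port = tail[1:] if tail.startswith(":") else None
    else:
        host, colon, port_text = rest.partition(":")
        port = port_text if colon else None
    return userinfo, host, port

print(split_authority("what@the:"))    # ('what', 'the', '')
print(split_authority("junk@myhost"))  # ('junk', 'myhost', None)
```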

>>- The path component of a file URI represents an identifier for the file
>>  as would be used in the host's principal file system interface
>>  (i.e., the path component of a file URI usually represents a file's
>>  "local path" on the host's file system). "File system interface" is
>>  assumed to be a well-understood concept.
>>
>>
>
>
>Actually, I disagree. What it *should* be is a translation of
>the local file system's path to a file, in the local character
>encoding for the file system, into (hex-encoded) UTF-8, where
>"/" is used consistently for directory delimiters, and with
>an appropriate platform-specific encoding for other top-level
>decorations of the file syntax.
>
Your statement and mine are not in conflict. Mine is a statement of what
information is *conceptually* represented by a certain literal piece of
the URI, and yours is a statement that implicitly relies on mine: once
it is assumed that this piece of the URI represents a file system path
(which I describe in less presumptive terms, since "path" is generally
only meaningful in file systems that use hierarchical identification
conventions), then you can provide the details of how to derive, from
that file system path, a URI-safe value to be used as the path component
of a file URI.
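The derivation step can be sketched as follows. The assumptions are that the raw path is available as bytes in the file system's encoding and that Python has a codec of the same name. The function and parameter names are hypothetical. The "top-level decorations" such as drive letters are not handled here, so a ":" in a segment would simply be percent-encoded.

```python
from urllib.parse import quote

def local_path_to_uri_path(raw_path, fs_encoding, separator):
    """Hypothetical translation of a local file system path into a file URI
    path: decode from the file system's encoding, switch to "/" delimiters,
    and percent-encode each segment as UTF-8.

    raw_path    -- the path as bytes, in the file system's own encoding
    fs_encoding -- the Python codec name for that encoding (an assumption)
    separator   -- the platform's directory delimiter, e.g. "/" or "\\"
    """
    segments = raw_path.decode(fs_encoding).split(separator)
    # safe="" also encodes any "/" that occurs inside a segment name.
    return "/".join(quote(seg, safe="", encoding="utf-8") for seg in segments)

print(local_path_to_uri_path("/home/müller/a b.txt".encode("latin-1"), "latin-1", "/"))
# /home/m%C3%BCller/a%20b.txt
```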

>>- Other components of a file URI, if defined, are not defined
>>  by this standard as necessarily representing anything in particular,
>>  but they do contribute to the identification of the file represented
>>  by the URI. Thus, a query component present in a file URI may or may
>>  not affect how the URI is dereferenced on a particular platform,
>>  but even when it does not affect anything, it cannot be assumed,
>>  in the absence of a standard stating otherwise, that a file URI with
>>  a query component is equivalent to a file URI without one.
>>
>>
>
>I think this is useless. Let's describe what usually works.
>
>I think the query component should be ignored when dereferencing the
>resource, but dynamic content of a file may be able to access
>"the URI used to reference it", and take advantage of the
>query component in that way.
>
I don't feel it is useless to try to address these issues, but I concede
that the way I phrased it is far from ideal.

As an implementer I'd like to know whether any extra junk in the file
URI is to be completely ignored. Can I assume that
file://junk@myhost/a/b/c?morejunk is equivalent to file://myhost/a/b/c
as an identifier of file /a/b/c on host myhost? Am I allowed to
dereference the two URIs in different ways based on the presence/absence
of the junk, so long as I get a representation of the same file? I can't
think of a good reason why not.
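One possible answer to the implementer's question is sketched below. It assumes an implementation may locate the file using only host and path, while still treating the two strings as distinct identifiers. The function name `dereference_key` is hypothetical, and nothing in the current standard mandates this policy.

```python
from urllib.parse import urlsplit

def dereference_key(uri):
    """Hypothetical implementer's policy: the (host, path) pair actually used
    to locate the file. Userinfo, port, query and fragment are ignored when
    dereferencing; the URIs themselves are NOT declared equivalent."""
    parts = urlsplit(uri)
    host = parts.hostname or ""  # lower-cased host without userinfo or port
    return host, parts.path

a = "file://junk@myhost/a/b/c?morejunk"
b = "file://myhost/a/b/c"
print(dereference_key(a) == dereference_key(b))  # True: same file is fetched
print(a == b)                                     # False: distinct identifiers
```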

>>- The manner in which a host component represents a host is
>>  this: If the component is empty or is "localhost" (what if it is
>>  the percent-encoded equivalent of "localhost"?), the component
>>  represents the host on which the URI is being interpreted. No
>>  guidelines are given for the interpretation of any other values;
>>  they may take the form of IP addresses, DNS names, or any other
>>  identifier. No guidelines are given for how to dereference such
>>  identifiers (hey, I'm just describing current practice).
>>
>>
>
>I see no point in giving 'no guidelines', and you're not
>actually 'describing current practice', you're trying to
>avoid describing it by disclaiming any knowledge of current
>practice.
>

I feel that all we can say about the host part of the URI is that

1. regardless of its value, it is a representation of the host

2. it may have special values 'localhost' or empty string in order to
represent the host on which the URI is interpreted

3. any other value is a representation (%-encoded etc.) of the name of
the host. The nature of the name being represented and how to
dereference it are beyond the scope of the standard. This is the same
position taken by rfc2396bis. At most, we can say that it is typical to
use DNS based lookups etc. and that it is inadvisable for there to be
any surprises in this regard, but we shouldn't require any particular
host-locating mechanism as a "must".
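The three points above can be expressed as a small classifier. The sketch below is a literal reading that compares "localhost" before any percent-decoding. The host is compared case-insensitively, as rfc2396bis specifies for the host subcomponent. The function name and return shape are hypothetical. How a "named" host is resolved is deliberately left out, per point 3.

```python
from urllib.parse import unquote

def classify_host(host):
    """Hypothetical classifier for a file URI's host component.

    Returns ("local", None) for the special values (empty or 'localhost'),
    otherwise ("named", decoded_name); resolving the name is out of scope.
    """
    if host == "" or host.lower() == "localhost":
        return ("local", None)
    # Any other value is a (%-encoded) representation of the host's name.
    return ("named", unquote(host))

print(classify_host("LocalHost"))          # ('local', None)
print(classify_host("local%68%6F%73%74"))  # ('named', 'localhost')
```

The second result shows the gap discussed below: under a literal reading, a percent-encoded "localhost" falls through to the named-host branch.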

Further, now that rfc2396bis lets us %-encode the authority, we have to
figure out what to do with file://local%68%6F%73%74/foo. I am 80% sure
that we are required to make no distinction between that and
file://localhost/foo, but a literal interpretation of the current
standard would fail to require treatment of the %-encoded version as
representing the host on which the URI is interpreted. (An implementer
is free to do so anyway, but they're also free to do a DNS lookup of the
percent-decoded version). So a decision should be made. Personally I
think the 'localhost' constraint is garbage and should never be
enforced; if I want to define 'localhost' to be something other than
127.0.0.1 on my system, that's my prerogative, and I expect
file://localhost/foo to resolve accordingly.
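One way an implementation could treat the two spellings alike is sketched below. It assumes the implementation adopts the rule in RFC 3986 (the published rfc2396bis, sections 2.3 and 6.2.2.2) that percent-encoded unreserved characters are equivalent to the characters themselves. The function names are hypothetical. The sketch takes no position on whether "localhost" should be special at all.

```python
import re

UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

def decode_unreserved(text):
    """Decode only percent-encoded unreserved characters; other triplets are
    kept encoded, with their hex digits upper-cased."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else m.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)

def is_local_host(host):
    """True if the normalized host is empty or 'localhost' (case-insensitive)."""
    return decode_unreserved(host).lower() in ("", "localhost")

print(is_local_host("local%68%6F%73%74"))  # True
print(is_local_host("example.org"))        # False
```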

-Mike

Received on Wednesday, 22 September 2004 20:50:29 GMT
