- From: Mike Brown <mike@skew.org>
- Date: Wed, 22 Sep 2004 14:50:27 -0600
- To: Larry Masinter <LMM@acm.org>
- Cc: uri@w3.org
Larry Masinter wrote:

>> - The syntax of a file URI is that of absolute-URI, except that
>>   its scheme component must be 'file', case-insensitively.
>
> I think this just confuses things, since it doesn't really say
> anything.

OK, but I think a statement of lexical syntax, independent of
semantics, should be made. What the syntax is doesn't bother me much,
although I am under the impression that rfc2396bis requires all URI
schemes to acknowledge that the resource identifier may contain
components that are not meaningful to a particular scheme. If we make
a statement like what is currently in the standard -- that a file
URI's syntax is file://<host>/<path>, then that implies certain
restrictions. For example...

  FILE://what@the:/heck?is;this

- file URI or no? I say it is, on the grounds that it has the file
scheme and matches absolute-URI. Sure, it contains a bunch of other
junk, but that doesn't preclude it from being useful as an identifier
of a file resource. Tack on a fragment, though...

  FILE://what@the:/heck?is;this#eh?

Is it still a file URI? This is a nuance of the rfc2396bis grammar
that I'm unsure about. What is lexically a "URI" can have a fragment,
but sec. 4.3 says "Some protocol elements allow only the absolute
form...". Is a scheme definition a 'protocol element'? Or does a
scheme have to acknowledge that a fragment may be present? If the
latter, then strike what I said about absolute-URI; it should just
match the URI syntax rule and thus a statement of such would be
redundant (though what harm is there in having it?).

>> - The *typical* syntax of a file URI is more restrictive
>>   (no query component, authority is usually empty,
>>   path usually starts with "/")
>
> Well, hmmm, the 'authority' is interpreted differently, but
> it is either empty, 'localhost', or some other value; in
> some implementations, it is a host name in some local name
> space.
> (For example, many Windows implementations treat
> the 'authority' component as a UNC host, e.g.,
>   file://hostname/path/to/file => \\hostname\path\to\file)

Actually I'd rather not make a statement about the "usual" syntax, per
se. Information about commonly used values is better provided in a
separate section that states what each component of a file URI
represents and how it is usually (or should be) interpreted.

>> - The authority component of a file URI is considered by this
>>   specification to contain a host component exactly as defined
>>   by the rfc2396bis grammar. (I don't want there to be any
>>   ambiguity about what the "host component" is).
>
> Well, is it ever anything other than empty, 'localhost' or
> a host name?

No, I mean that I don't want there to be any confusion as to whether
"host component" means everything that comes between the 2nd and 3rd
slash, as is implied by the current spec. That's actually the
authority. Once we clarify what piece of the authority is the "host",
we can make a statement about what it represents -- the host
associated with the file the URI identifies -- and about what common &
special values it has -- empty, 'localhost', or a URI-friendly value
that is derived from the host's name.

>> - The path component of a file URI represents an identifier
>>   for the file
>>   as would be used in the host's principal file system interface
>>   (i.e., the path component of a file URI usually represents a file's
>>   "local path" on the host's file system). "File system interface" is
>>   assumed to be a well-understood concept.
>
> Actually, I disagree. What it *should* be is a translation of
> the local file system's path to a file, in the local character
> encoding for the file system, into (hex-encoded) UTF-8, where
> "/" is used consistently for directory delimiters, and with
> an appropriate platform-specific encoding for other top-level
> decorations of the file syntax.

Your statement and mine are not in conflict.
Mine is a statement of what information is *conceptually* represented
by a certain literal piece of the URI, and yours is a statement that
implicitly relies on mine: once it is assumed that this piece of the
URI represents a file system path (which I describe in less
presumptive terms, since "path" is generally only meaningful in file
systems that use hierarchical identification conventions), then you
can provide the details of how to derive, from that file system path,
a URI-safe value to be used as the path component of a file URI.

>> - Other components of a file URI, if defined, are not defined
>>   by this standard as necessarily representing anything in particular,
>>   but they do contribute to the identification of the file represented
>>   by the URI. Thus, a query component present in a file URI may or may
>>   not affect how the URI is dereferenced on a particular platform,
>>   but even when it does not affect anything, it cannot be assumed,
>>   in the absence of a standard stating otherwise, that a file URI with
>>   a query component is equivalent to a file URI without one.
>
> I think this is useless. Let's describe what usually works.
>
> I think the query component should be ignored when dereferencing the
> resource, but dynamic content of a file may be able to access
> "the URI used to reference it", and take advantage of the
> query component in that way.

I don't feel it is useless to try to address these issues, but I
concede that the way I phrased it is far from ideal. As an implementer
I'd like to know whether any extra junk in the file URI is to be
completely ignored. Can I assume that

  file://junk@myhost/a/b/c?morejunk

is equivalent to

  file://myhost/a/b/c

as an identifier of file /a/b/c on host myhost? Am I allowed to
dereference the two URIs in different ways based on the presence/absence
of the junk, so long as I get a representation of the same file? I
can't think of a good reason why not.
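
To make the question concrete, here is a minimal Python sketch of one
possible answer: treat two file URIs as naming the same file when
their host and path agree, and ignore userinfo, port, query, and
fragment. The helper names `file_identity` and `same_file` are
hypothetical, and the "ignore the junk" policy is an assumption for
illustration, not something any standard is known here to require.

```python
from urllib.parse import urlsplit, unquote


def file_identity(uri):
    """Return (host, path) for a file URI.

    Assumed policy (not mandated by any spec): userinfo, port, query
    and fragment are treated as junk and dropped from the comparison key.
    """
    parts = urlsplit(uri)
    # urlsplit lowercases the scheme, so "FILE:" and "file:" both match.
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    # .hostname strips userinfo and port from the authority.
    host = unquote(parts.hostname or "").lower()
    # The path is compared as written; no path normalization is attempted.
    return host, parts.path


def same_file(a, b):
    """True if both URIs map to the same (host, path) key."""
    return file_identity(a) == file_identity(b)


print(same_file("file://junk@myhost/a/b/c?morejunk", "file://myhost/a/b/c"))
```

Under this policy the two URIs above compare equal. An implementation
that wants the query to matter would simply add `parts.query` to the
key.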
>> - The manner in which a host component represents a host is
>>   this: If the component is empty or is "localhost" (what if it is
>>   the percent-encoded equivalent of "localhost"?), the component
>>   represents the host on which the URI is being interpreted. No
>>   guidelines are given for the interpretation of any other values;
>>   they may take the form of IP addresses, DNS names, or any other
>>   identifier. No guidelines are given for how to dereference such
>>   identifiers (hey, I'm just describing current practice).
>
> I see no point in giving 'no guidelines', and you're not
> actually 'describing current practice', you're trying to
> avoid describing it by disclaiming any knowledge of current
> practice.

I feel that all we can say about the host part of the URI is that

  1. regardless of its value, it is a representation of the host
  2. it may have special values 'localhost' or empty string in order
     to represent the host on which the URI is interpreted
  3. any other value is a representation (%-encoded etc.) of the name
     of the host.

The nature of the name being represented and how to dereference it are
beyond the scope of the standard. This is the same position taken by
rfc2396bis. At most, we can say that it is typical to use DNS based
lookups etc. and that it is inadvisable for there to be any surprises
in this regard, but we shouldn't require any particular host-locating
mechanism as a "must".

Further, now that rfc2396bis lets us %-encode the authority, we have
to figure out what to do with file://local%68%6F%73%74/foo. I am 80%
sure that we are required to make no distinction between that and
file://localhost/foo, but a literal interpretation of the current
standard would fail to require treatment of the %-encoded version as
representing the host on which the URI is interpreted. (An implementer
is free to do so anyway, but they're also free to do a DNS lookup of
the percent-decoded version). So a decision should be made.
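
The percent-encoding case can be illustrated with a short Python
sketch. It takes the "no distinction" reading described above, which
is only one of the two options under discussion: the host is
percent-decoded before being compared with the special values. The
function name `refers_to_local_host` is hypothetical.

```python
from urllib.parse import urlsplit, unquote


def refers_to_local_host(uri):
    """True if the host is empty or 'localhost' after percent-decoding.

    Assumption: percent-encoded and literal forms of the host are
    treated as equivalent, which is the reading favored in the message
    but not settled there.
    """
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    # Decode first, then lowercase, so "local%68%6F%73%74" becomes "localhost".
    host = unquote(parts.hostname or "").lower()
    return host in ("", "localhost")


for uri in ("file://local%68%6F%73%74/foo",
            "file://localhost/foo",
            "file:///foo",
            "file://example.org/foo"):
    print(uri, refers_to_local_host(uri))
```

A literal reading of the older text would instead compare the raw
host string, in which case the first URI would not be recognized as
local.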
Personally I think the 'localhost' constraint is garbage and should
never be enforced; if I want to define 'localhost' to be something
other than 127.0.0.1 on my system, that's my prerogative, and I expect
file://localhost/foo to resolve accordingly.

-Mike
Received on Wednesday, 22 September 2004 20:50:29 UTC