- From: Mike Brown <mike@skew.org>
- Date: Mon, 27 Sep 2004 13:54:26 -0600 (MDT)
- To: Graham Klyne <gk@ninebynine.org>
- CC: uri@w3.org
Graham Klyne wrote:
> For Unix/Linux
> --------------
> filename->URI:
> Assume full path from root is given.  IRI is "file://" ++ the given local
> system path, with URI-escaping applied as needed (including to any '?' and
> '#' characters).

I assume the "IRI" there is intentional, since some systems have Unicode
filename support. So it's really more like filename->IRI->URI? Might it be
better to recommend converting the path to an IRI and then, rather than
saying "as needed", explicitly state that the IRI spec governs conversion
of an IRI to a URI?

In any case, the APIs of some file systems represent filenames as Unicode,
and others represent them as bytes in some default encoding (which may
vary), so some recommendation should be given as to how to deal with
encoded filenames. Should the bytes be percent-encoded directly, or should
the default encoding be used to convert the bytes to characters first,
with UTF-8 then used as the basis for percent-encoding? I prefer the
latter, but I bet there are a ton of implementations doing the former.
Should they be allowed to do that?

> URI->filename.:
> If authority is non-empty, its interpretation is system-dependent.  The URI
> path component is un-escaped and used as the local system path.

I would add that since reserved characters can appear in a Unix filename,
care should be taken to percent-decode each path segment separately. For
example, a literal "/" or ";" in a path segment will be percent-encoded, so
you don't want to blindly percent-decode the whole path and then think
you've got an inter-segment separator or a segment/param separator.

  file:///a%2Fb -> filename "a/b" in the root directory,
                   not filename "b" in the "a" subdir of the root dir.

>
> For MS-Windows
> --------------
> filename->URI:
> If filename starts with x:... (with 'x' a letter), a leading '/' is
> added.  Convert all '\' characters to '/'.  Apply URI escaping as needed
> (including to any '?' and '#' characters).  The URI is formed by append the
> resulting string to file://
>

If you're going to recommend interpretation of a URI as a UNC name on the
way back, you might as well recommend how to produce a URI from a UNC name.

> URI->filename:
> If authority is non-empty, it may be interpreted as a UNC name,

For compatibility with some existing implementations, if the authority is
empty and the path begins with two '/' characters, then the first path
segment may be interpreted as a UNC name, e.g. file:////host/share/path.

Also, please double-check the terminology here. I'm not sure "UNC name" is
appropriate. (Maybe it is, but I think host + share + path are more
meaningful.)

> if the path component starts with /x: (with 'x' a letter), strip off the
> leading '/'.  The remaining URI path component is un-escaped and used as
> the local system path (possibly appended to the UNC component).

UNC paths don't have drivespecs, e.g. you won't see //host/C:/somepath. If
you share the root of a drive, you have to pick a non-colon-containing name
for it. On my XP box it defaults to the drive letter alone, but I have seen
network drives named $c$ before, so I don't know what conventions might be
typical. The spec probably should not assume anything and should just let
implementations decide how to deal with odd cases like

  file://host/C:/path
  file://nonlocalhost1//host2/share/path

etc.
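
To make the two Unix-side points above concrete, here is a minimal sketch in Python. It is an illustration only, not proposed spec text. The function names are hypothetical, and Latin-1 is just an assumed example of a file system's default encoding. It shows the two candidate ways of percent-encoding a byte-oriented filename, and the difference between decoding a path segment by segment and decoding it blindly.

```python
from typing import List
from urllib.parse import quote, unquote


def encode_raw_bytes(name: bytes) -> str:
    """Option 1: percent-encode the filename's bytes directly."""
    return quote(name, safe="")


def encode_via_utf8(name: bytes, fs_encoding: str) -> str:
    """Option 2: decode with the file system's default encoding (assumed
    to be known), then percent-encode the characters' UTF-8 bytes."""
    return quote(name.decode(fs_encoding), safe="", encoding="utf-8")


def decode_path_segments(uri_path: str) -> List[str]:
    """Split an absolute URI path on '/' first, then percent-decode each
    segment on its own, so an encoded '/' or ';' stays inside its segment."""
    return [unquote(segment) for segment in uri_path.split("/")[1:]]


def decode_blindly(uri_path: str) -> List[str]:
    """The mistake: decode the whole path, then split on '/'."""
    return unquote(uri_path).split("/")[1:]


# The same Latin-1 filename bytes give two different URIs.
print(encode_raw_bytes(b"caf\xe9"))             # caf%E9
print(encode_via_utf8(b"caf\xe9", "latin-1"))   # caf%C3%A9

# file:///a%2Fb has one segment, "a/b", not two segments "a" and "b".
print(decode_path_segments("/a%2Fb"))           # ['a/b']
print(decode_blindly("/a%2Fb"))                 # ['a', 'b']
```

The two encoders disagree for any non-ASCII byte, which is why a consumer cannot round-trip such a URI without knowing which convention the producer used.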
Received on Monday, 27 September 2004 19:54:26 UTC