file: URIs without host

I have been trying to rationalize the treatment of URIs such as 
file:foo/bar or file:/absolute/path within Jena.

To date, Jena has shown inconsistent behaviour:

On the RDF/XML side, either such URI is treated as an absolute URI (see 
below for details).

On the N3 side, the URI is normalized on input to be a legal file:/// 
URI, using the current working directory to resolve relative paths.

====

Our understanding of the spec is:
a) it is long overdue for a rewrite (RFC 1738 being the defining doc)
b) the forms file:foo/bar and file:/absolute/path are not legal under 
RFC 1738
c) these forms do fit the pattern specified in RFC 3986 (and its 
predecessors) under these words:
[[
5.2.2.  Transform References

   [...]
       -- A non-strict parser may ignore a scheme in the reference
       -- if it is identical to the base URI's scheme.
       --
       if ((not strict) and (R.scheme == Base.scheme)) then
          undefine(R.scheme);
       endif;
]]
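To make the quoted rule concrete, here is a minimal Java sketch of strict 
versus non-strict resolution built on java.net.URI. The class and method 
names are hypothetical and this is not Jena's implementation; it only 
shows that a strict resolver returns file:foo/bar unchanged, while a 
non-strict one drops the matching scheme and resolves the remainder 
against an assumed base of file:///current/working/directory/.

```java
import java.net.URI;

public class NonStrictResolution {

    /**
     * Hypothetical helper: resolves ref against base. When strict is false and
     * ref has the same scheme as base, the scheme is dropped first (the
     * non-strict rule of RFC 3986, section 5.2.2) and the rest is resolved as
     * a relative reference.
     */
    static URI resolve(URI base, String ref, boolean strict) {
        URI r = URI.create(ref);
        String scheme = r.getScheme();
        if (!strict && scheme != null && scheme.equalsIgnoreCase(base.getScheme())) {
            // undefine(R.scheme): keep everything after "scheme:"
            String withoutScheme = ref.substring(scheme.length() + 1);
            return base.resolve(URI.create(withoutScheme));
        }
        // Strict: a reference with a scheme is already absolute; URI.resolve
        // returns it unchanged.
        return base.resolve(r);
    }

    public static void main(String[] args) {
        URI base = URI.create("file:///current/working/directory/");
        System.out.println(resolve(base, "file:foo/bar", true));
        System.out.println(resolve(base, "file:foo/bar", false));
        System.out.println(resolve(base, "file:/absolute/path", false));
    }
}
```

One caveat: java.net.URI treats the empty authority in file:/// as 
undefined, so the non-strict results are typically printed in the 
file:/current/working/directory/foo/bar form; the path component is the 
part that matters here.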

====

Jena's N3 side:

Argument (c) is the one used to turn file:foo into 
file:///current/working/directory/foo, provided we can argue that 
file:///current/working/directory/ (with the trailing slash) is the base 
(e.g. as the application default URI).
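A sketch of this N3-side normalization, assuming the JVM's current 
working directory (Paths.get("")) is taken as the base. normalizeFileUri 
is a hypothetical helper, not Jena's actual N3 code: it applies the 
non-strict rule and then round-trips through java.nio.file.Path to obtain 
the legal file:/// spelling.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class N3StyleFileUris {

    /**
     * Hypothetical helper: rewrites a host-less file: URI (file:foo/bar,
     * file:/abs/path) into a file:/// URI, using the current working
     * directory as the base. Non-file URIs are returned unchanged.
     */
    static URI normalizeFileUri(String ref) {
        URI r = URI.create(ref);
        if (!"file".equalsIgnoreCase(r.getScheme())) {
            return r;
        }
        // Base URI for the working directory; Path.toUri() ends it with "/"
        // because the directory exists, so relative paths resolve inside it.
        URI base = Paths.get("").toAbsolutePath().toUri();
        // Non-strict reading (RFC 3986, 5.2.2): drop "file:" and resolve the rest.
        URI resolved = base.resolve(ref.substring("file:".length()));
        // java.net.URI drops the empty authority, so round-trip through Path to
        // get the file:/// form. Throws for URIs with a host, query or fragment.
        Path path = Paths.get(resolved);
        return path.toUri();
    }

    public static void main(String[] args) {
        System.out.println(normalizeFileUri("file:foo/bar"));
        System.out.println(normalizeFileUri("file:/absolute/path"));
    }
}
```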


====

Jena's RDF/XML side:

However, one use case for such file URIs is creating a zip file that 
contains a complete application, including references to data in the 
zip. When we unzip it, we want the references to work independently of 
both the machine and the directory.

This use case could be addressed by updating the definition of file: to 
include something like:

The GET operation on file URIs is performed with two pieces of 
contextual information:

- the machine on which the GET is being performed
- the current working directory in which the GET is being performed (may 
be undefined)

We would then define the appropriate behaviour for each of the various 
forms of file URI.
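A rough sketch of what such a definition might mean for a client, 
assuming a POSIX-style file system. FileUriContext.toLocalPath and its 
parameters are hypothetical; they stand in for the two pieces of context 
listed above, the local host name and an optional working directory.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class FileUriContext {

    /**
     * Hypothetical sketch of the proposed GET semantics: maps a file: URI to a
     * local path using the machine performing the GET (localHost) and an
     * optional current working directory.
     */
    static Path toLocalPath(URI uri, String localHost, Optional<Path> cwd) {
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not a file: URI: " + uri);
        }
        if (uri.isOpaque()) {
            // file:foo/bar: a relative path, meaningful only when a cwd is defined
            Path dir = cwd.orElseThrow(() ->
                    new IllegalStateException("no working directory for " + uri));
            return dir.resolve(uri.getSchemeSpecificPart()).normalize();
        }
        String host = uri.getAuthority();
        if (host != null && !host.equalsIgnoreCase("localhost")
                && !host.equalsIgnoreCase(localHost)) {
            // file://otherhost/...: names a file on a different machine
            throw new IllegalArgumentException("file on another machine: " + uri);
        }
        // file:/abs/path, file:///abs/path, file://localhost/abs/path
        return Paths.get(uri.getPath());
    }

    public static void main(String[] args) {
        Optional<Path> cwd = Optional.of(Paths.get("/unzipped/app"));
        System.out.println(toLocalPath(URI.create("file:data/x.rdf"), "myhost", cwd));
        System.out.println(toLocalPath(URI.create("file:///etc/hosts"), "myhost", cwd));
    }
}
```

For the zip use case, setting the working directory to wherever the 
archive was unpacked makes a reference such as file:data/x.rdf land 
inside the unpacked tree on any machine; with no working directory, such 
a reference is simply an error.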

This seems to chime better with some established usage patterns, such as 
the way that Java treats these URIs.

====

Any thoughts?
Which way should we rationalize behaviour?
Should we be working on an I-D for file: ?

Jeremy

Received on Thursday, 4 October 2007 13:34:43 UTC