file:yyy and file:ddd/yyy from Jeremy Carroll on 2006-02-04 (uri@w3.org from February 2006)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Sat, 04 Feb 2006 11:59:23 +0000
To: uri@w3.org
Message-ID: <43E4971B.1080505@hpl.hp.com>
Summary:
========

What is the correct reading of file:yyy and file:ddd/yyy?

Are these relative URIs, to be resolved against the current working 
directory (expressed as an absolute file: URI) of the application using 
them? Or is it better to treat them as absolute URIs?
Treating them as errors would seem to go against too much deployed practice.

(Note: the remainder of this message is merely background to this 
question, for your information, and is not required to engage with the 
above issue.)


Background
==========

In my team (Jena Semantic Web project), we are trying to improve our 
URI/IRI handling code. For years, we have used a variety of third party 
code that has always been problematic in the detail, and hard to 
support. We have also had a very tolerant contract, in which we accept 
any string as a URI. This has caused difficulty: for instance, we accept 
a bad URI on input but then cannot output it again, because it is too 
badly formed for us to tell how it interacts with, say, an xml:base 
declaration in a document.

We are considering having a much stricter contract (optionally; default 
behaviour [strict/lax] to be decided). Particularly, since in Semantic 
Web, URIs are primarily treated as identifiers, rather than operational 
instructions, strictness seems more appropriate. (e.g. a mistake in a 
URI that is an identifier and an instruction is often detected when you 
do a GET; in a browsing context, these errors are detected very quickly, 
because GETs are done soon. In a SemWeb context, the first GET might not 
be applied for months, and the URI might have been through many systems.)

file: URIs are particularly problematic.
Our command-line tools accept file: URIs as URIs (typically ones which 
locate documents to process). In particular, file:foo.rdf is used to 
locate a file in the current directory. We want to continue supporting 
this behaviour; but it seems hard to account for it with the RFCs 
defining URIs and the file: scheme (although it works with the Java URL 
class).
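
As a rough illustration of the "relative to the current working 
directory" reading, here is a minimal Python sketch. It is not Jena code: 
the helper name absolutize_file_uri and its cwd parameter are hypothetical, 
it assumes POSIX-style paths, and it ignores query and fragment parts.

```python
import posixpath
from urllib.parse import quote, unquote

def absolutize_file_uri(uri, cwd):
    """Hypothetical helper: read file:yyy as relative to the directory cwd.

    Assumes POSIX-style paths and no query or fragment in the URI.
    """
    prefix = "file:"
    if not uri.lower().startswith(prefix):
        raise ValueError("not a file: URI: " + uri)
    rest = uri[len(prefix):]
    if rest.startswith("/"):
        # Already has an authority ("//...") or an absolute path ("/...").
        return uri
    # Decode percent-escapes, join onto the working directory, tidy up.
    path = posixpath.normpath(posixpath.join(cwd, unquote(rest)))
    # Re-encode and emit the form with an empty authority.
    return "file://" + quote(path)

print(absolutize_file_uri("file:foo.rdf", "/home/user"))
print(absolutize_file_uri("file:ddd/yyy", "/home/user"))
```

Under this reading the result depends on where the tool is run, which is 
convenient on a command line but means the same string names different 
resources in different places.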

We have a particular problem with file:ddd/yyy because, applying the 
resolution algorithm from RFC 3986 (section 5.2.2) with backward-compatible 
(non-strict) behaviour enabled for the file: scheme, we have

file:ddd/yyy resolves against file:ddd/yyy as file:ddd/ddd/yyy
whereas
file:xxx resolves against file:xxx as file:xxx
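
The two results above can be reproduced with a small Python sketch of the 
RFC 3986 section 5.2 algorithm. This is an illustrative transcription, not 
the Jena implementation; the function names and the strict flag are my 
own, and the splitting regular expression is the one from Appendix B of 
the RFC.

```python
import re

# Appendix B regex: scheme, authority, path, query, fragment.
_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split(ref):
    m = _URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

def remove_dot_segments(path):
    # RFC 3986 section 5.2.4.
    out, inp = [], path
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with any leading "/") to the output.
            idx = inp.find("/", 1 if inp.startswith("/") else 0)
            if idx == -1:
                out.append(inp)
                inp = ""
            else:
                out.append(inp[:idx])
                inp = inp[idx:]
    return "".join(out)

def merge(base_auth, base_path, ref_path):
    # RFC 3986 section 5.2.3.
    if base_auth is not None and base_path == "":
        return "/" + ref_path
    return base_path[:base_path.rfind("/") + 1] + ref_path

def resolve(base, ref, strict=True):
    # RFC 3986 section 5.2.2; strict=False is the backward-compatible mode.
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    scheme, auth, path, query, frag = split(ref)
    if not strict and scheme == b_scheme:
        scheme = None  # treat a same-scheme reference as having no scheme
    if scheme is not None or auth is not None:
        path = remove_dot_segments(path)
    elif path == "":
        path = b_path
        query = b_query if query is None else query
    elif path.startswith("/"):
        path = remove_dot_segments(path)
    else:
        path = remove_dot_segments(merge(b_auth, b_path, path))
    if scheme is None:
        if auth is None:
            auth = b_auth
        scheme = b_scheme
    # Recomposition, RFC 3986 section 5.3.
    result = scheme + ":" if scheme is not None else ""
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if frag is not None:
        result += "#" + frag
    return result

print(resolve("file:ddd/yyy", "file:ddd/yyy", strict=False))  # file:ddd/ddd/yyy
print(resolve("file:xxx", "file:xxx", strict=False))          # file:xxx
print(resolve("file:ddd/yyy", "file:ddd/yyy", strict=True))   # file:ddd/yyy
```

The difference comes from the merge step: the last segment of the base 
path is dropped, so a base path of ddd/yyy contributes ddd/ while a base 
path of xxx contributes nothing.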

This is significant when reading a file in, when it includes its own (or 
a related) URI (in this form). Since we located it using a file: URI we 
use that as the base when reading it. Our current behaviour just treats 
these URIs as absolute and leaves them unchanged, and everything works, 
but ... our behaviour cannot be justified from the RFCs.

Jeremy
Received on Saturday, 4 February 2006 12:00:33 UTC