A new proposal (was: Re: which layer for URI processing?) from David G. Durand on 2000-05-24 (xml-uri@w3.org from May 2000)

From: David G. Durand <david@dynamicdiagrams.com>
Date: Wed, 24 May 2000 17:30:31 -0400
To: <xml-uri@w3.org>
Message-Id: <a04310103b551ed7c602a@[216.207.71.175]>

I started this note responding to Tim, but in the process, I came up 
with a new compromise strategy that might address _all_ the problems. 
It's not possible to do that with any single technical solution, but 
I think it is possible if we formulate a longer-term plan with a 
clear goal, and a series of small steps toward reaching that goal.

If that sounds intriguing, read on, and see if you agree with me.

At 2:02 PM -0400 5/24/00, Tim Berners-Lee wrote:
>XSLT uses XPath which is included, I understand, in the "lower layer" in
>your scenario.
>[If not, then what is?]
>Suppose I use XSLT to filter a document to ensure it doesn't have any of an
>http://example.com/detonator namespace in it, because processing this would
>allow the document to destroy the chemical plant.
>The XSLT sees "/detonator" in an incoming document http://example.com/doc.xml
>but it does not notice it as it does not absolutize it. The checked result is
>passed to the main control system. However, when this "upper layer" runs it
>absolutizes it to find out what in upper layer terms it really means, and
>instantiates a chemical plant handler to handle the http://example.com/foo.
>Bang.

This is an excellent example of why software should be able to avoid 
absolutizing URIs: that process depends on knowledge of a base URI, 
and this is demonstrably fragile information, especially with regard 
to the unique identification of singular objects. The use cases for 
relative namespace URIs all revolve around the rare applications that 
can deal with namespaces that are not globally unique.
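To make the base-URI dependence concrete, here is a minimal sketch in 
Python (standard library only) of the two layers in Tim's example. The 
function names and the second "mirror" base URI are hypothetical, made 
up for illustration. The lower layer compares namespace names as literal 
strings, as the current namespaces spec defines identity; the upper 
layer resolves them against the document's base URI first.

```python
from urllib.parse import urljoin

# Namespace the filter is meant to block (from Tim's example).
FORBIDDEN_NS = "http://example.com/detonator"

def lower_layer_sees(ns_attr: str) -> bool:
    """Hypothetical filter: literal string comparison, no absolutization."""
    return ns_attr == FORBIDDEN_NS

def upper_layer_sees(ns_attr: str, base: str) -> bool:
    """Hypothetical handler: resolve against the base URI, then compare."""
    return urljoin(base, ns_attr) == FORBIDDEN_NS

ns = "/detonator"  # relative namespace reference found in the document
print(lower_layer_sees(ns))                                   # False
print(upper_layer_sees(ns, "http://example.com/doc.xml"))     # True
# The same reference means something else if the document moves:
print(urljoin("http://mirror.example.org/copy/doc.xml", ns))  # http://mirror.example.org/detonator
```

The filter passes what the handler later treats as forbidden, and the 
string "/detonator" names a different namespace depending on where the 
document was fetched from.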

The situation you describe can more readily be used to argue that 
allowing relative URIs _at all_ was the real problem. This is 
especially true, since the stated goal of namespaces was to supply 
globally-unique names to enable definition management in a very 
loosely coupled distributed system.


>Is this or is this not a problem?
>
>I wouldn't be sitting here plowing through all this mail if I didn't think it
>was.
>
>You can do different consistent things between different layers but you
>cannot mess with identity of things common to both layers.

Some argue that we can't forbid relative URIs (and so rule out the 
bug you describe), because documents exist that use them. We have no 
way of determining the extent of this problem.

There are also documents in existence that depend on the current 
definition of identity in the namespaces specification. We can't 
quantify the number of those documents either.

Requiring absolutization will fix this problem, but will break an 
unknown number of documents that were made in the belief that the 
namespaces spec. means what it says.

Eliminating URI reference syntax and requiring absolute URIs for 
namespaces also fixes the problem. It does not, under any reasonable 
interpretation, conflict with the relevant RFCs. There is precedent 
in the HTML BASE tag, as well as in the HTTP protocol itself. However, 
this solution also breaks documents, in this case, those that contain 
relative URI references.
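As a rough illustration of how cheap an absolute-only rule would be to 
enforce, here is a minimal Python sketch. It treats a namespace name as 
absolute if it begins with a scheme (RFC 2396 scheme syntax followed by 
":") and ignores finer points such as fragment identifiers. The 
function names and sample namespace names are hypothetical.

```python
import re

# RFC 2396 scheme: alpha *( alpha | digit | "+" | "-" | "." ), then ":".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")

def has_scheme(ns_name: str) -> bool:
    """True if ns_name starts with a URI scheme, i.e. is not a relative reference."""
    return _SCHEME.match(ns_name) is not None

def check_absolute_only(ns_names):
    """Return the namespace names an absolute-only rule would reject."""
    return [ns for ns in ns_names if not has_scheme(ns)]

print(check_absolute_only([
    "http://www.w3.org/1999/xhtml",
    "urn:example:widgets",
    "/detonator",
    "../schemas/po",
]))  # ['/detonator', '../schemas/po']
```

The rejected names are exactly the documents this option would break.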

Keeping things as they are but explicitly deprecating relative URIs 
and documenting the possible problems (the solution chosen earlier 
inside the W3C) avoids the evil of turning conformant data 
non-conformant, and recognizes the potential for erroneous behavior 
where relative namespace IDs are used.

There are no perfect solutions to the situation we are in.

The use of absolute URIs only is the best fit to the goals of the 
namespace specification, but breaks documents.

The requirement of full relative URI processing as a part of 
namespaces is not that close a fit to the design goals of namespaces, 
and breaks a different set of documents in a different way. It does 
leave open an attractive evolutionary path to a more dynamic 
namespace mechanism, one based on retrieval of content. This 
evolution depends critically on other infrastructure being in place, 
e.g. a standard format for namespace definitions and metadata about 
them, a way to support multiple kinds of definition for a single 
namespace, etc.

I think the best way to get to the "fully evolved" version of 
namespaces is to do the following:

+ instantly deprecate relative URIs in namespaces (but preserve the 
status quo in all its yuckiness). We thus preserve a version of 
namespaces that does not break existing documents.

+ instantly create a new revision, namespaces 1.1, that allows only 
absolute URIs. We instantly provide a way for people to avoid getting 
burned by the possible unintended consequences of using relative URIs 
carelessly.

+ publicly declare that namespaces 2.0 will re-introduce relative URI 
references, in a consistent way, coordinated with the creation of a 
standard resource body format for referencing multiple namespace 
definitions, that can be stored at a namespace URI. We now provide a 
clean path to a new meaning for the relative URI syntax.

+ declare that the next revision of XML will include a normative 
reference to a specific version of the namespace specification. In 
this way, the XML version can be used to determine what sort of 
namespace processing is appropriate. This won't eliminate all 
confusion, but will make it easier for software to check for errors 
and report them. By linking XML and namespaces, we recognize the fact 
that most XML projects are using namespaces, and we introduce a way 
for an application to tell what sort of namespace processing is 
appropriate. We also provide an incentive for people to move to the 
most recent namespace specification over time.
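To show how the last step could let software pick its rules, here is a 
minimal Python sketch of version-based dispatch. Everything in it is 
hypothetical: the spec labels, the mapping table, and the placeholder 
"next" for whatever the next XML revision would be called. Namespaces 
2.0 is left out because its rules have not been defined yet.

```python
import re

# Hypothetical labels for the namespace specs in the proposal.
NS_1_0 = "namespaces-1.0"  # status quo: relative names deprecated but legal
NS_1_1 = "namespaces-1.1"  # proposed: absolute URIs only

# Hypothetical table: the namespace spec each XML version would reference.
XML_TO_NS = {"1.0": NS_1_0, "next": NS_1_1}

# Scheme prefix, used to tell absolute names from relative references.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")

def check_namespace(xml_version: str, ns_name: str) -> str:
    """Classify one namespace name as 'ok', 'warning' or 'error'."""
    spec = XML_TO_NS.get(xml_version)
    if spec is None:
        return "error"  # unknown XML version: no way to pick namespace rules
    if _SCHEME.match(ns_name):
        return "ok"
    # Relative reference: deprecated under 1.0, forbidden under 1.1.
    return "warning" if spec == NS_1_0 else "error"

for version in ("1.0", "next"):
    print(version, check_namespace(version, "/detonator"))
```

The same relative name draws a warning from a 1.0 processor and an 
error from a processor for the later version, so the XML version alone 
tells the software which behavior applies.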

I'm _way_ behind on this list, so I apologize if this option has been 
raised before, but I don't think it was explicitly raised in the 
internal discussion. This is another kind of compromise: it doesn't 
ease the confusion right away, but it sets a clear direction, and 
gives software authors a way to deal with a complex transition 
strategy.

It is conceivable that one could go right from deprecating relative 
URIs to absolutizing them, but to do that responsibly, the relevant 
namespace retrieval infrastructure and policies would have to be 
created almost immediately. It would clearly be better if time is 
taken to make those decisions carefully.

   -- David

>Tim BL
-- 
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
http://cs-people.bu.edu//dgd/             \  Chief Technical Officer
     Graduate Student no more!              \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
                                              \__________________________
Received on Wednesday, 24 May 2000 17:38:53 UTC