Re: Choose your namespace (Was : Personal view) from David G. Durand on 2000-06-19 (xml-uri@w3.org from June 2000)

From: David G. Durand <david@dynamicdiagrams.com>
Date: Mon, 19 Jun 2000 17:46:52 -0400
To: XML-uri@w3.org
Message-Id: <a04310105b5743f6351ea@[216.207.71.175]>

At 1:14 PM -0700 6/19/00, Henrik Frystyk Nielsen wrote:
>Instead of using the uniqueness of attributes as a binary decision between
>whether a document is correct or not, we should instead note that there
>may be times that inconsistencies can happen and that yes, these are
>faults, but that these may not be detected.

But why do this when the binary decision is what we actually _need_
to satisfy the goals of the namespace rec., and when literal
comparison of absolute URIs is enough to give it to us?

The farther we move from a unique string approach the more kinds of 
failures are possible, and the less we meet the _very simple_ 
requirements for namespaces. A "forbid" solution has none of these 
errors, but kills old documents. A "literal" solution is confusing if 
you expect absolutization, but is also consistent with respect to 
identity.

Adding absolutization adds a point of variability, because BASE
information is _not_ unique: in most situations every document has
many meaningful base URIs. Base information (or "context") is
inherently variable; that is the power of relative URI references,
and also the source of global naming consistency problems.
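
To make the variability concrete, here is a minimal sketch in Python.
The relative name "a" and the three base URIs are made-up examples,
not taken from any real deployment; urljoin from the standard library
stands in for RFC 2396 style resolution.

```python
from urllib.parse import urljoin

# Hypothetical relative namespace name, as it might appear in xmlns:n="a".
relative_ns = "a"

# Hypothetical base URIs: the same document reached through an origin
# server, a mirror, and a local copy.
bases = [
    "http://www.example.org/docs/x.xml",
    "http://mirror.example.com/copy/x.xml",
    "file:///home/user/x.xml",
]

# Absolutize the same relative reference against each base.
resolved = [urljoin(base, relative_ns) for base in bases]

for base, absolute in zip(bases, resolved):
    print(base, "->", absolute)

# One literal string in the document, several distinct absolute names.
distinct_names = set(resolved)
```

Under these assumptions the one string "a" ends up naming a different
namespace for each copy of the document.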

Earlier, in your example, you write:

>Take for example this slightly different version of Daniel's example:
>
>-----------------
><x xmlns:n1="http://www.example.org/a"
>xmlns:n2="http://www.example.com/a">
>   <test n1:y="1" n2:y="2"/>
></x>
>-----------------
>
>This looks like a completely valid example, but let's say that I go to
>"http://www.example.org/a" and it gives back a redirect to
>"http://www.example.com/a". This is the exact same problem that Daniel
>pointed out but in this scenario, it doesn't depend on the location of the
>document. Does this mean that my document suddenly is invalid or is it
>even something that we should expect to ever be detected? Clearly it
>isn't.


This shows that the further we move along the "standard URI 
processing" path, the more sources of error there are, because 
standard URI processing is optimized for a different problem than 
simple unique identifier assignment.
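
A rough sketch of the difference, in Python. The duplicate_attributes
helper and the redirects table are illustrative assumptions, not part
of any XML processor; the table simply encodes the redirect described
in the quoted scenario.

```python
def duplicate_attributes(attrs, bindings, canonical=lambda uri: uri):
    """Return expanded (namespace, local) names that occur more than once.

    `canonical` maps a namespace name to the string used for comparison;
    the default is the identity, i.e. literal comparison.
    """
    seen = set()
    duplicates = []
    for qname in attrs:
        prefix, local = qname.split(":")
        expanded = (canonical(bindings[prefix]), local)
        if expanded in seen:
            duplicates.append(expanded)
        seen.add(expanded)
    return duplicates


# Prefix bindings and attributes from the quoted example document.
bindings = {
    "n1": "http://www.example.org/a",
    "n2": "http://www.example.com/a",
}
attrs = ["n1:y", "n2:y"]

# Hypothetical redirect table modelling the scenario in the quoted text.
redirects = {"http://www.example.org/a": "http://www.example.com/a"}

# Literal comparison: the two names differ, so no duplicate attribute.
literal = duplicate_attributes(attrs, bindings)

# "Follow the redirect" comparison: the answer now depends on server state.
followed = duplicate_attributes(attrs, bindings, lambda u: redirects.get(u, u))
```

With literal comparison the document's status is fixed by its own
text. With the redirect-following comparison it changes whenever the
server configuration does.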

This is why we need to deprecate relative URI references for 
namespaces, and preferably outlaw them in the next version, because 
they start us down a path that leads us away from the primary goals of 
the namespaces standard.

If you have other primary goals, that's fine, but then the W3C 
membership needs to go through a whole process of convincing people 
that they are worth working on, figuring out if namespaces are 
compatible with them, and finally standardizing a way to achieve 
them. Changing the stated purpose and meaning of namespaces is not a 
"bug fix" but an unwarranted change to an existing standard. We are 
in a situation where we are bound by a responsibility to make the 
smallest change to the status quo that makes the standard 
well-defined.

>James Clark [1] has pointed out problems of clarity of the algorithm in
>comparing URIs and I think we need to think carefully about this and fix
>the URI spec where not clear. However, we should *not* try to design
>namespaces thinking that if we avoid URIs we avoid the problems of a
>decentralized system.

We need a clearly defined equivalence relation that can definitively 
define two names as equivalent when they are equivalent in _all_ URI 
schemes. Octet by octet comparison is the only method that does this. 
It's not a problem if two URIs that are inequivalent for namespace 
processing are actually equivalent in some other context -- because 
that problem can't be solved in any case; a new name can always be 
created.
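
As a minimal sketch of such a test in Python: same_namespace is a
hypothetical helper name, and the variant spellings below are
illustrative examples of URIs that some scheme-specific rules might
treat as equivalent.

```python
def same_namespace(a: str, b: str) -> bool:
    """Octet-by-octet comparison: no case folding, no unescaping,
    no default-port handling, no resolution."""
    return a.encode("utf-8") == b.encode("utf-8")


reference = "http://www.example.org/a"

# Spellings that HTTP-style comparison rules might consider equivalent
# to `reference`; all of them are distinct namespace names under the
# octet test.
variants = [
    "HTTP://www.example.org/a",     # scheme case
    "http://WWW.EXAMPLE.ORG/a",     # host case
    "http://www.example.org:80/a",  # explicit default port
    "http://www.example.org/%61",   # percent-escaped "a"
]

results = {v: same_namespace(reference, v) for v in variants}
```

The test needs no knowledge of any URI scheme, so every conforming
application computes the same answer for the same pair of strings.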

Any relaxing of this kind of comparison means that equivalence and 
inequivalence of namespaces may be indeterminable in principle to 
some fully conforming applications.


>Let me clarify what is meant by "context": The common URI syntax has
>specific mechanisms for encoding some commonly used properties like naming
>authority and relative identifiers but others it doesn't: For example,
>there is no common way to encode persistence properties of an identifier or
>when it was created: "this identifier used Microsoft as of June 2000 as
>naming authority".

Well, urn: schemes do declare persistence properties, but this is a nit.

>For the specific case of relative URIs, the context is given by the rules
>defined in RFC 2396 section 5.1. What I think this section fails to point
>out is that it may not be necessary to determine a base URI in order to
>use relative URIs as identifiers if they are dealt with within the same
>context. This was the reason for the specific wording in the proposal.

This variability is a bug from a namespace point of view (as clearly 
articulated by the goals of the specification).

>For other properties, the context is defined by the URI space itself and
>may not be explicit in the URI. Therefore, in order to know and use these
>properties of a name, it is necessary to know the context (ie properties)
>imposed by that URI space.

Expanding the scope of the bug so that it's universal is a pretty 
poor way to fix it...


>In addition to this clarification, I have noted two other clarifications
>for the proposed wording which are:
>
>* We should encourage people generating documents to be consistent about
>the use of URIs so that simple mistakes are avoided [3]
>
>* We should ensure that the algorithm for comparing URIs which currently
>is in the HTTP spec is moved to the URI spec [1]
>
>We should work on this but not lose track of the problem space we are
>designing for.

Right. The namespace spec. is trying to provide globally unique 
names. Period. Not retrieval. Not context-dependent schema retrieval.

If we focus on the problem space, your proposal is singularly 
unattractive, because it opens even more cans of worms, and closes 
none of the ones that started this debate.

We must deprecate or forbid relative namespace identifiers until a 
clear meaning for retrieval is defined. We must preserve in 
perpetuity the use of octet-by-octet equivalence of strings as the 
standard for XML processors to determine identity. We could introduce 
relative URI references and all the rest later on, as long as the 
equivalence test remains well-defined.

   -- David
-- 
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
http://cs-people.bu.edu//dgd/             \  Chief Technical Officer
     Graduate Student no more!              \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
                                              \__________________________
Received on Monday, 19 June 2000 17:50:05 UTC