Choose your namespace (Was : Personal view) from Henrik Frystyk Nielsen on 2000-06-19 (xml-uri@w3.org from June 2000)

From: Henrik Frystyk Nielsen <frystyk@microsoft.com>
Date: Mon, 19 Jun 2000 13:14:32 -0700
To: <XML-uri@w3.org>
Cc: "Andrew Layman" <andrewl@microsoft.com>, "David Turner" <dturner@microsoft.com>
Message-ID: <005501bfda2b$081f6f70$83b11eac@redmond.corp.microsoft.com>

This is a somewhat lengthy mail with two parts:

* Properties of URIs vs URI spaces

* Relationship to the proposal [5] that we sent out

The reason is that I think we have to internalize the choices that we have
and what they mean before we can get to any agreement, so here goes:

URIs and URI spaces
-------------------
Many if not most of the discussions on this list talk about desired
properties of a URI space used for identifying XML namespaces. In order
for us to make progress in this discussion, it is essential to keep
separate properties of the URI syntax from any of the properties of the
URI space that it is used to encode. URIs don't define whether a URI space
has properties like supporting indirection, being case sensitive or not,
supporting relative URIs etc. RFC 2396 defines the *syntax* for encoding
common properties but that doesn't mean that URIs *define* these
properties.

As for what you want from a name, there are really only five choices:

1) You can use centrally assigned identifiers that don't support
indirection (TCP port numbers and centrally agreed upon MIME header
fields)

2) You can use decentrally assigned, non-unique identifiers that don't
support indirection (general MIME header fields)

3) You can use decentrally assigned, unique identifiers that don't support
indirection (GUIDs)

4) You can use decentrally assigned, unique identifiers that do support
indirection (DNS hostnames) but only provide one result of the
indirection (an IP address)

5) You can use decentrally assigned, unique identifiers that do support
indirection and an open-ended set of results from that indirection (a
document). Examples of URI spaces that support this are "http:" and
"ftp:".

URIs allow us to encode names from *all* these categories - but again, that
doesn't mean that they *force* which one you have to use:

* If you want a centralized name, pick 1)

* If you want a decentralized name, pick 2)-5)

* If you don't want indirection, pick 2) or 3)

* If you don't want case sensitivity, pick 3)

* If you do want indirection on the Internet, pick 5)

* If you don't want relative URIs, pick one that doesn't support it (for
example GUIDs)

However, once you have picked, you have to live with the properties of
that namespace, but that has nothing to do with whether you encode it as
a URI or not.
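
To make the selection rules above concrete, here is a minimal sketch that
records the five kinds of names as data and filters them by the properties
you want. The class name, field names and the pick() helper are
hypothetical and only for illustration; marking kind 1) as "unique" is my
assumption, since the list above does not say so explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameKind:
    number: int        # the number used in the list above
    example: str
    centralized: bool
    unique: bool       # assumed True for 1): central assignment implies it
    indirection: bool
    open_ended: bool   # indirection may return an open-ended set of results


KINDS = [
    NameKind(1, "TCP port numbers", True, True, False, False),
    NameKind(2, "general MIME header fields", False, False, False, False),
    NameKind(3, "GUIDs", False, True, False, False),
    NameKind(4, "DNS hostnames", False, True, True, False),
    NameKind(5, "http: and ftp: URIs", False, True, True, True),
]


def pick(**wanted):
    """Return the numbers of all kinds matching every requested property."""
    return [
        kind.number
        for kind in KINDS
        if all(getattr(kind, name) == value for name, value in wanted.items())
    ]


print(pick(centralized=True))                      # [1]
print(pick(centralized=False))                     # [2, 3, 4, 5]
print(pick(centralized=False, indirection=False))  # [2, 3]
print(pick(open_ended=True))                       # [5]
```

Nothing in the table depends on whether a name is written as a URI; the
properties belong to the name space that was picked.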

The problem Daniel brings up is *not* a basic property of relative URIs
but can happen in any decentralized system that supports indirection. It
is inherently impossible to guarantee that violations of the rule in
section 5.3 about uniqueness of attributes are detected in all cases.

Take for example this slightly different version of Daniel's example:

-----------------
<x xmlns:n1="http://www.example.org/a"
xmlns:n2="http://www.example.com/a">
  <test n1:y="1" n2:y="2"/>
</x>
-----------------

This looks like a completely valid example, but let's say that I go to
"http://www.example.org/a" and it gives back a redirect to
"http://www.example.com/a". This is the exact same problem that Daniel
pointed out but in this scenario, it doesn't depend on the location of the
document. Does this mean that my document suddenly is invalid or is it
even something that we should expect to ever be detected? Clearly it
isn't.
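
As a rough illustration of why this cannot be detected from the document
alone, the sketch below parses the example with Python's standard
ElementTree parser and then consults a redirect table. The REDIRECTS table
and the final_target() helper are hypothetical stand-ins for what a client
would only learn by actually dereferencing the names; no network access
happens here.

```python
import xml.etree.ElementTree as ET

DOC = """<x xmlns:n1="http://www.example.org/a"
xmlns:n2="http://www.example.com/a">
  <test n1:y="1" n2:y="2"/>
</x>"""

# Hypothetical knowledge gained only by following the indirection.
REDIRECTS = {"http://www.example.org/a": "http://www.example.com/a"}


def final_target(uri):
    """Follow the (made-up) redirect table until it runs out."""
    seen = set()
    while uri in REDIRECTS and uri not in seen:
        seen.add(uri)
        uri = REDIRECTS[uri]
    return uri


root = ET.fromstring(DOC)   # parses without complaint
test = root.find("test")

# ElementTree stores attribute names as "{namespace-name}local-name".
names = [key[1:].split("}")[0] for key in test.attrib]

print(len(set(names)))                        # 2 names as strings
print(len({final_target(n) for n in names}))  # 1 target after redirects
```

The parser compares the namespace names as strings and sees two different
attributes; only a consumer that follows the indirection could notice that
both names end up at the same place.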

Instead of using the uniqueness of attributes as a binary decision between
whether a document is correct or not, we should instead note that there
may be times that inconsistencies can happen and that yes, these are
faults, but that these may not be detected.

The discussion of whether to limit the properties of the URI spaces that
URIs can be used to encode (forbid relative URIs, forbid indirection etc.)
is really a discussion of what properties you want for identifying your
XML namespace. As a basic consumer of these names, the only thing that is
needed on top of octet-by-octet comparison is to know about relative URIs
so the difference on the consumer side is very little. This leaves the
producer with a simple choice: pick your namespace with the properties you
want.
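
A minimal sketch of what that consumer-side comparison could look like,
under one reading of the above: make relative references absolute against
a base, then compare octet by octet with no further normalization. The
function names and the base URI are assumptions made for the example, not
wording from the proposal.

```python
from urllib.parse import urljoin, urlsplit


def absolutize(name, base):
    """Resolve a relative reference; leave absolute URIs exactly as written."""
    if urlsplit(name).scheme:
        return name
    return urljoin(base, name)


def same_namespace(name_a, name_b, base):
    """Octet-by-octet comparison after dealing with relative references."""
    a = absolutize(name_a, base).encode("utf-8")
    b = absolutize(name_b, base).encode("utf-8")
    return a == b


BASE = "http://www.example.org/docs/doc.xml"  # assumed document location

print(same_namespace("../a", "http://www.example.org/a", BASE))      # True
print(same_namespace("http://www.example.org/a",
                     "http://www.example.org/A", BASE))              # False
print(same_namespace("http://www.example.org/a",
                     "http://www.example.com/a", BASE))              # False
```

Case folding, redirects and other properties of the chosen URI space play
no part in the comparison; those remain the producer's concern.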

James Clark [1] has pointed out problems of clarity of the algorithm in
comparing URIs and I think we need to think carefully about this and fix
the URI spec where it is not clear. However, we should *not* try to design
namespaces thinking that if we avoid URIs we avoid the problems of a
decentralized system.

Relations to Proposal
---------------------
The proposal that we sent out [5] makes the choice very clear - the
namespace identifier is a URI - end of story. It furthermore clarifies
that in order to use relative URIs, you need to take into account the
context you are working within.

Let me clarify what is meant by "context": The common URI syntax has
specific mechanisms for encoding some commonly used properties like naming
authority and relative identifiers, but not for others. For example,
there is no common way to encode persistence properties of an identifier or
when it was created: "this identifier used Microsoft as of June 2000 as
naming authority".

For the specific case of relative URIs, the context is given by the rules
defined in RFC 2396 section 5.1. What I think this section fails to point
out is that it may not be necessary to determine a base URI in order to
use relative URIs as identifiers if they are dealt with within the same
context. This was the reason for the specific wording in the proposal.
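
A small sketch of that point, assuming "same context" means that both
references would be resolved against the same base, whatever it is. The
helper names and sample bases are made up for illustration.

```python
from urllib.parse import urljoin

# Sample bases, used only to check the claim; the comparison never needs them.
BASES = [
    "http://www.example.org/docs/doc.xml",
    "http://www.example.com/x/y/z",
]


def same_within_context(ref_a, ref_b):
    """Compare two references from the same context without any base URI."""
    return ref_a == ref_b


def agree_under_every_base(ref_a, ref_b, bases=BASES):
    """Check whether both references resolve identically for each sample base."""
    return all(urljoin(b, ref_a) == urljoin(b, ref_b) for b in bases)


print(same_within_context("a", "a"))       # True
print(agree_under_every_base("a", "a"))    # True
print(same_within_context("a", "./a"))     # False
print(agree_under_every_base("a", "./a"))  # True
```

Identical references within one context always resolve to the same name,
so no base is needed to match them. The last two lines show the limit of
the shortcut: differently spelled references can only be matched by
actually resolving them.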

For other properties, the context is defined by the URI space itself and
may not be explicit in the URI. Therefore, in order to know and use these
properties of a name, it is necessary to know the context (i.e. properties)
imposed by that URI space.

In addition to this clarification, I have noted two other clarifications
for the proposed wording which are:

* We should encourage people generating documents to be consistent about
the use of URIs so that simple mistakes are avoided [3]

* We should ensure that the algorithm for comparing URIs which currently
is in the HTTP spec is moved to the URI spec [1]

We should work on this but not lose track of the problem space we are
designing for.

Henrik Frystyk Nielsen,
mailto:frystyk@microsoft.com

[1] http://lists.w3.org/Archives/Public/xml-uri/2000Jun/0619.html
[2] http://lists.w3.org/Archives/Public/xml-uri/2000Jun/0667.html
[3] http://lists.w3.org/Archives/Public/xml-uri/2000Jun/0678.html
[4] http://lists.w3.org/Archives/Public/xml-uri/2000May/0282.html
[5] http://lists.w3.org/Archives/Public/xml-uri/2000Jun/0406.html
Received on Monday, 19 June 2000 16:16:13 UTC