Problems I cannot get past with using relative URIs for identity. from Ray Whitmer on 2000-05-17 (xml-uri@w3.org from May 2000)

From: Ray Whitmer <ray@xmission.com>
Date: Wed, 17 May 2000 16:08:00 -0600 (MDT)
To: xml-uri@w3.org
Message-ID: <Pine.GSO.4.10.10005171542450.25425-100000@xmission.xmission.com>
Some of the problems I can't get past with absolutizing namespace names, which
lead me to believe that it would undermine namespaces significantly:

1.  The specification for URIs was designed for content retrieval, especially 
on legacy file systems, not for type identity.  Hence, most of the real 
equivalence of URIs that people rely on is computed on the server, not on the 
client.  Since the primary function of URIs is to retrieve content, the 
retrieval functions rarely bother to ask about the identity of content 
accessed by a URI.  The server just returns the proper content.  Hence, the 
spec only had to do simple enough concatenation for absolutization to allow 
the server to do the context-specific work of real resolution of the URI to a 
specific resource.

Namespace names are not primarily about retrieving resources, but rather about 
identifying a namespace.  In this case, the simple absolutization rules cannot 
possibly establish the identity of namespaces that users consider identical, 
even if the spec did not need to fully absolutize them.

For example, from a user's point of view, and probably most developers' as 
well, http://first.com/second/third/fourth.html with a relative reference to 
../fifth/sixth/seventh.html is the same as a reference to 
http://first.com/second/fifth/sixth/seventh.html.  In this case, to use the 
argument of the day, the relative reference is a legal URI, but the RFC would 
resolve it, I suspect, as 
http://first.com/second/third/../fifth/sixth/seventh.html, which is not at all 
the same identity.  I think it would be improper for a 
theoretical modified namespace spec which dictated absolutization of relative 
URI references to mandate that these be treated as distinct names, when the 
server is allowed and expected by all to return the same resource.  If these 
are indeed references, then the server should be permitted to have the last 
say on identity.

Otherwise, we are not talking about absolutization, but only partial 
absolutization, which is a different interpretation of the URI from what 
happens when the URI is used to retrieve content, and is IMO bound to seem 
confusing and arbitrary to users.
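
A minimal sketch of the distinction drawn in point 1, in Python.  The helper 
concatenate_only is hypothetical and models the concatenation-only result 
suspected above; urljoin is the standard-library resolver, which does collapse 
the ".." segment.  Either way, namespace names are compared as literal strings, 
so the two spellings count as different names even though a server would be 
expected to return the same resource for both.

```python
from urllib.parse import urljoin

BASE = "http://first.com/second/third/fourth.html"
REL = "../fifth/sixth/seventh.html"


def concatenate_only(base: str, rel: str) -> str:
    """Hypothetical helper: swap the last path segment of `base` for `rel`,
    leaving any '..' segments in place (plain concatenation, no cleanup)."""
    return base[: base.rfind("/") + 1] + rel


# Concatenation-only result: the '..' segment is still in the path.
unnormalized = concatenate_only(BASE, REL)

# Standard-library resolution: the '..' segment is collapsed.
normalized = urljoin(BASE, REL)

# Literal string comparison, as used for namespace names.
same_name = unnormalized == normalized

print(unnormalized)
print(normalized)
print(same_name)
```

The comparison is deliberately character-for-character; no client-side rule 
in this sketch knows what the server considers equivalent.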

2.  The flexibility of URIs does not seem to extend to absolutization.  
RFC-specified absolutization favors legacy Unix file path syntax, throwing 
away CGI parameters, for example.  There should be different absolutization for 
alternative protocols.  If full absolutization is required for any real 
identity matching, it becomes even more necessary for the server to be involved 
in any absolutization for identity purposes, so that it can handle the 
different types of URIs it knows about.
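
A small Python illustration of the CGI-parameter point, using the 
standard-library urljoin.  The example.org URLs and the "schema" reference are 
made up for illustration.  When the relative reference carries a path, the 
query string of the base is not carried into the result, so two bases that 
differ only in their parameters absolutize to the same name.

```python
from urllib.parse import urljoin

# Two hypothetical base URIs that differ only in their CGI parameters.
base_books = "http://example.org/cgi-bin/catalog?dept=books&lang=en"
base_music = "http://example.org/cgi-bin/catalog?dept=music&lang=en"

# The same relative reference resolved against each base.
from_books = urljoin(base_books, "schema")
from_music = urljoin(base_music, "schema")

print(from_books)
print(from_music)
print(from_books == from_music)
```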

3.  What is the meaning of an absolutized relative namespace name in the 
infoset, or more concretely in the DOM?  Is it the relative name, or is it the 
absolute name?  It cannot be both simultaneously.  What is preserved when 
nodes are 
moved within the hierarchy -- the absolutized concept of type that is so 
important for specialized DOM implementations, or the relative info?   You 
could choose:

a.  The DOM parser absolutizes, and the DOM effectively does not permit 
relative URIs, except as a convenience.
  
b.  Relative?  Do we really intend to have the validation rules and 
implementing classes change as a node moves to another place in the hierarchy 
or as it is saved in a different location?  It is very common to tie the 
implementing class and validation rules to the type.  Type will now be in 
flux, whereas DOM has traditionally considered the type constant for these 
reasons.  Are you willing to frequently re-absolutize the name so it can be 
compared?

c.  Absolute(ized): with relative preserved?  How is the relative path 
preserved in the nodes?  The relative specifier would be at best like the 
prefix, which is considered syntactic sugar and may be replaced by the 
serializer.  In this case, a serialized document would likely lose the relative
nature of namespaces, so the relative properties of the namespace 
specifications could not be relied on for essential functionality (just as 
a document becomes invalid according to DTDs when saved in canonical form).  
All the problems of prefixes arise again for relative URIs.  For example, an 
entity needs to know the base URI of the position in which it is referenced, 
or an absolutized URI cannot be computed; that base may be different in 
different references, so entity hierarchies would have no absolutized URIs.  
Do we really want this additional complexity on namespaces?

If so, perhaps another group should make a different naming standard for those 
who just want namespaces, which could look like Java packages so that no one is 
confused into marrying an arbitrary resource to the type declaration.
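
A rough Python sketch of the concern in 3b and 3c, under stated assumptions: 
the a.example base URIs, the relative namespace name, and the 
IMPLEMENTING_CLASS lookup table are all hypothetical, and urljoin stands in 
for whatever absolutization a parser would apply.  The point is only that the 
same relative name yields a different absolutized name once the base changes, 
so anything keyed on the absolutized type stops matching.

```python
from urllib.parse import urljoin

# Hypothetical relative namespace name as written in the document.
RELATIVE_NS = "../schema/invoice"


def absolutized_ns(base_uri: str, ns_name: str) -> str:
    """Resolve a namespace name against the base URI in effect for a node."""
    return urljoin(base_uri, ns_name)


# The same markup, before and after the document is saved somewhere else.
original = absolutized_ns("http://a.example/docs/one/doc.xml", RELATIVE_NS)
moved = absolutized_ns("http://a.example/archive/2000/05/doc.xml", RELATIVE_NS)

# Hypothetical registry tying an implementing class to the absolutized type.
IMPLEMENTING_CLASS = {original: "InvoiceElement"}

print(original)
print(moved)
print(IMPLEMENTING_CLASS.get(original))
print(IMPLEMENTING_CLASS.get(moved))
```

After the move, the lookup finds no implementing class, although the markup 
itself has not changed.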

5.  Once you have convinced everyone that namespaces should access resources, 
how do you get consensus about what the function of the resource should be?  
And what do you do when multiple purposes conflict?  That is the point of 
separation between syntax and semantics in the first place.  Data needs to be 
reusable.  If I were doing this, I would get the same functionality from a 
separate PI, and not destroy the abstractness / generality of the data.  Once 
you go mangling types to point to resources, it is hard to undo the damage, I 
think.

6.  It seems unnatural to tie typing information to the default path of a part 
of a hierarchy.  If I needed this type of relative typing, I would want to 
relate types to other types, not necessarily to where the user stuck a 
collection of graphics files.  Relative typing can apparently already be 
accomplished by using entities, which permits the architect to structure the 
relationships rather than forcing them to flow down the hierarchy with the 
default locations of, for example, graphics files.

7.  Since this issue arose, I can already count several times that someone in 
a development or standards group has suggested using a URI as an identifier.  
In the cases I have seen, people are now running screaming from using 
URIs for anything besides a real retrieval pointer, having seen the baggage 
that brings with it if someone might want to relativize them in some arbitrary 
fashion as is being done with namespaces.  I do not think this (IMO) bad 
practice of using relative URIs promotes the cause of URIs at all, and I have 
recent repeated experiences to convince me of that.

Ray Whitmer
ray@xmission.com
Received on Wednesday, 17 May 2000 18:08:06 UTC