Re: Problems I cannot get past with using relative URIs for identity. from Tim Berners-Lee on 2000-05-18 (xml-uri@w3.org from May 2000)

From: Tim Berners-Lee <timbl@w3.org>
Date: Thu, 18 May 2000 10:29:37 -0400
To: "Ray Whitmer" <ray@xmission.com>, <xml-uri@w3.org>
Message-ID: <001501bfc119$dcd06010$75ec5c8b@ridge.w3.org>
-----Original Message-----
From: Ray Whitmer <ray@xmission.com>
To: xml-uri@w3.org <xml-uri@w3.org>
Date: Wednesday, May 17, 2000 6:08 PM
Subject: Problems I cannot get past with using relative URIs for identity.


>Some of the problems I can't get past with absolutizing namespace names, which
>lead me to believe that it would undermine namespaces significantly:
>
>1.  The specification for URIs was designed for content retrieval, especially
>on legacy file systems, not for type identity.

Excuse me, URIs were originally designed to identify -- the term "Identifier"
is used for a reason. The actual properties of persistence and identity are,
for the HTTP space, the choice of the publisher, which has provided a lot of
flexibility.

Remember there are also URI schemes such as uuid: and md5: which have very
different properties!

HTTP URIs should be thought of as names for which there is a widely deployed
distributed catalog system.  I rant continually at sites which change their
HTTP URIs
and point out that W3C has a space of HTTP URIs which are not changed.

> Hence, most of the real
>equivalence of URIs that people rely on is computed on the server, not on the
>client.

For "computed on the server" read: defined by the publisher. This applies to
HTTP and FTP only.

> Since the primary function of URIs is to retrieve content,

This is your rhetoric.  Retrieval is useful but identity is important too.
However, many schemes such as uuid: and mid: identify primarily and any
URI-based retrieval is added on afterward.

>the
>retrieval functions usually never bother to ask about the identity of content
>accessed by a URI.

There is in fact a lot of HTTP apparatus to relate the different URIs which
are connected and to talk about persistence, for cache control for example.

>The server just returns the proper content.  Hence, the
>spec only had to do simple enough concatenation for absolutization to allow
>the server to do the context-specific work of real resolution of the URI to a
>specific resource.


To resolve the abstract resource to a suitable HTTP entity, as the HTTP
terminology goes.

>Namespace names are not primarily about retrieving resources, but rather about
>identifying a namespace.  In this case, the simple absolutization rules cannot
>possibly establish identity of namespaces, which users consider identical, even
>if the spec did not need to fully absolutize them.

>
>For example, from a user's point of view, and probably most developers as well,
>http://first.com/second/third/fourth.html with a relative reference to
>../fifth/sixth/seventh.html is the same as a reference to
>http://first.com/second/fifth/sixth.com.

You mean http://first.com/second/fifth/sixth.html

> In this case, to use the argument of
>the day, the relative reference is a legal URI, but the RFC would resolve it,
>I suspect, as http://first.com/second/third/../fifth/sixth/seventh.html, which
>is not at all the same identity.

You suspect wrong.

I don't know why you should think that the practice users understand and the
RFC should be different. The RFC defines how things actually work, which is
what users are all used to (and happens to basically match algorithms used in
unix for decades).
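
The resolution in question can be checked mechanically. The following is a
minimal sketch, assuming Python's standard urllib.parse.urljoin as a stand-in
for the RFC algorithm; the /cgi-bin/search URI is a made-up illustration and
does not come from this thread.

```python
from urllib.parse import urljoin

# Base document and relative reference taken from the example under discussion.
base = "http://first.com/second/third/fourth.html"
reference = "../fifth/sixth/seventh.html"

# The ".." segment is collapsed during resolution, so no "third/.." survives.
resolved = urljoin(base, reference)
print(resolved)

# Hypothetical base with a query part: the query of the base is not carried
# over when a relative path is resolved against it.
print(urljoin("http://first.com/cgi-bin/search?q=xml", "other"))
```

The first call yields http://first.com/second/fifth/sixth/seventh.html, with
the dot segment removed on the client and no server involvement.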

>I think it would be improper for a
>theoretical modified namespace spec which dictated absolutization of relative
>URI references to mandate that these be treated as distinct names, when the
>server is allowed and expected by all to return the same resource.  If these
>are, indeed references, then the server should be permitted to have the last
>say on identity.
>
>Otherwise, we are not talking about absolutization, but only partial
>absolutization, which is a different interpretation of the URI from what
>happens when the URI is used to retrieve content, and is IMO bound to seem
>confusing and arbitrary to users.



I assume this idea of "partial absolutization" is based on the misconception
above. There is no such thing defined. There is only absolutization.

>2.  The flexibility of URIs does not seem to extend to absolutization.
>RFC-specified absolutization favors legacy unix file path syntax, throwing
>away CGI parameters, for example.  There should be different absolutization for
>alternative protocols.

Absolutely not!  It is essential that the absolutization can be done as a
simple string function without any knowledge of any specific schemes.
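
To make "a simple string function" concrete, here is a rough sketch that
resolves a relative-path reference using string operations only. It is a
simplification, not the full RFC algorithm: it assumes the base has the form
scheme://authority/path and that the reference is a plain relative path (no
scheme, no leading "/", no query-only form). The function names and the
"madeup" scheme are hypothetical.

```python
def remove_dot_segments(path: str) -> str:
    """Collapse "." and ".." segments of an absolute path by string handling."""
    segments = path.split("/")
    output = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            if is_last:
                output.append("")      # keep the trailing slash
        elif segment == "..":
            if len(output) > 1:        # never pop the leading empty segment
                output.pop()
            if is_last:
                output.append("")
        else:
            output.append(segment)
    return "/".join(output)


def absolutize(base: str, reference: str) -> str:
    """Resolve a relative-path reference against a scheme://authority/path base."""
    scheme, rest = base.split(":", 1)
    if not rest.startswith("//"):
        raise ValueError("this sketch only handles scheme://authority/path bases")
    authority, _, path = rest[2:].partition("/")
    path = "/" + path
    # The fragment and query of the base play no part in resolution.
    for separator in ("#", "?"):
        path = path.split(separator, 1)[0]
    # Replace the last segment of the base path with the reference.
    merged = path.rsplit("/", 1)[0] + "/" + reference
    return f"{scheme}://{authority}" + remove_dot_segments(merged)


print(absolutize("http://first.com/second/third/fourth.html",
                 "../fifth/sixth/seventh.html"))
print(absolutize("madeup://host/a/b/c?x=1", "../d"))
```

Nothing in the sketch inspects the scheme name, so the made-up scheme is
resolved exactly like the http one.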

> With full absolutization required for any real identity
>matching, it mandates even more that the server be involved in any
>absolutization for identity purposes so it can handle different types of URIs
>it knows about.


You are using the term absolutization in a manner different from the way it
has been used on this list.  I have seen no one argue for involving the
server in the process. Many URI schemes don't have a concept of a server.

>3.  What is the meaning of an absolutized relative namespace name in infoset,
>or more concretely in DOM?  Is it the relative name, or is it the absolute
>name?  It cannot be both simultaneously.

On the contrary, very often systems allow separate access to the actual
attribute value and to the interpreted object.  XHTML editors typically
preserve the choice of relative or absolute URI used in an A HREF, but adjust
the value when a copy of the document is saved to a new URI.  I would suggest
that this behaviour be the norm.
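
One way to picture "separate access" is a small structure that stores the
attribute value as written and derives the absolute form on demand. This is
only an illustrative sketch: the Reference class, its field names, and the
example.org URIs are hypothetical, and it does not cover rewriting the stored
value when a document is saved to a new URI.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass
class Reference:
    raw: str   # attribute value exactly as the author wrote it
    base: str  # URI of the document that contains the attribute

    @property
    def absolute(self) -> str:
        """Interpreted value: the raw reference resolved against the base."""
        return urljoin(self.base, self.raw)


ref = Reference(raw="../ns/schema", base="http://example.org/docs/a.xml")
print(ref.raw)       # the author's choice of a relative form is preserved
print(ref.absolute)  # the identity used for comparison
```

Both views are available at once: comparison would use the absolute value,
while serialization can write back the raw one.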

> What is preserved when nodes are
>moved within the hierarchy -- the absolutized concept of type that is so
>important for specialized DOM implementations, or the relative info?  You
>could choose:


You are talking about an XML node moving within the DOM tree?
Or a document being moved through URI space?
In all cases the absolute URI should be preserved.  To preserve the choice of
relative URI or absolute URI is important in practice too.
I am hoping that the same code is going to be used for all URIs, of course.

>a.  The DOM parser absolutizes and DOM effectively does not permit relative
>URIs, except as convenience.
>
>b.  Relative?  Do we really intend to have the validation rules and
>implementing classes change as a node moves in the hierarchy to another or as
>they are saved in a different location?  It is very common to tie the
>implementing class and validation rules to the type.  Type will now be in
>flux, whereas DOM has traditionally considered the type constant for these
>reasons?  Are you willing to frequently re-absolutize the name so it can be
>compared?
>
>c.  Absolute(ized): with relative preserved?  How is the relative path
>preserved in the nodes?  The relative specifier would be at best like the
>prefix which is considered syntactic sugar, which may be replaced by the
>serializer.  In this case, a serialized document would likely lose the relative
>nature of namespaces, so the relative properties of the namespace
>specifications could not be relied on for essential functionality (just like
>a document becomes invalid according to DTDs when saved in canonical form).
>All the problems of prefixes arise again for relative URIs, for example an
>entity needs to know the base URI of the position in which it is referenced,
>or an absolutized URI can not be computed, which may be different in different
>references, so entity hierarchies would have no absolutized URIs.  Do we really
>want this additional complexity on namespaces?
>
>If so, perhaps another group should make a different naming standard for those
>who just want namespaces, which could look like Java packages so no one gets so
>confused to marry an arbitrary resource to the type declaration.


Please, if a server returns something when queried for the namespace resource,
then that is not an arbitrary resource. The server is controlled by the owner
of the name and the document returned is therefore definitive. It is not
arbitrary.

To use Java class names I say what I said about FPIs.  If you really think
they solve problems other URI schemes don't solve, then you can propose that
they be a new URI scheme.
Then all new designs will be able to take advantage of them.

>5.  Once you have convinced everyone that namespaces should access resources,
>how do you get consensus about what the function of the resource should be?


The function of the resource is to express information about the namespace.
This may involve a mixture of languages, some of which we have now and some we
will invent as the technology evolves.

>And what do you do when multiple purposes conflict?

What multiple purposes, and can you give an example of conflict?

>That is the point of
>separation between syntax and semantics in the first place.

Oh, you mean that one schema document should not contain syntax-related and
semantic information?  That doesn't seem to be a problem to me.
Specification documents do.  Of course
one could separate them and refer to one from the other.

>  Data needs to be
>reusable.

I don't understand that remark in this context, nor the following paragraph.

> If I were doing this, I would get the same functionality from a
>separate PI, and not destroy the abstractness / generality of the data.  Once
>you go mangling types to point to resources, it is hard to undo the damage, I
>think.
>
>6.  It seems unnatural to tie typing information to default path of a part of a
>hierarchy.  If I needed this type of relative typing, I would want to relate
>types to other types, not necessarily to where the user stuck a collection of
>graphics files.  Relative of types can apparently already be accomplished by
>using entities, which permits the architect to structure the relationships
>rather than forcing them to flow down the hierarchy with the default locations
>of, for example, graphics files.
>
>7.  I can count several times already that someone in a development or
>standards group has suggested using a URI as an identifier since this issue
>arose.  In the cases I have seen, people are now running screaming from using
>URIs for anything besides a real retrieval pointer, having seen the baggage
>that brings with it if someone might want to relativize them in some arbitrary
>fashion as is being done with namespaces.  I do not think this (IMO) bad
>practice of using relative URIs promotes the cause of URIs at all, and I have
>recent repeated experiences to convince me of that.


Could you please indicate the standards groups which have run screaming from
URIs?

I would point out that the web is humming with relative URIs which lend
manageability to huge amounts of the data.  Our own website would be
very much more difficult to manage without them.

>Ray Whitmer
>ray@xmission.com
>
Received on Thursday, 18 May 2000 18:37:03 UTC