Re: Problems I cannot get past with using relative URIs for identity. from Ray Whitmer on 2000-05-19 (xml-uri@w3.org from May 2000)

From: Ray Whitmer <ray@xmission.com>
Date: Fri, 19 May 2000 00:34:38 -0600 (MDT)
To: Tim Berners-Lee <timbl@w3.org>
cc: xml-uri@w3.org
Message-ID: <Pine.GSO.4.10.10005182310530.26524-100000@xmission.xmission.com>
On Thu, 18 May 2000, Tim Berners-Lee wrote:
> This is your rhetoric.  Retrieval is useful but identity is important too.
> However, many schemes such as uuid: and mid: identify primarily and any
> URI-based retrieval is added on afterward.
 
URIs simply do not have the ability to identify resources.  They locate them.

> You suspect wrong.

Yes, as I admitted a day or so ago, I got that example wrong.  I apologize
again for not reading the spec beforehand.  The principle is still true.
Perhaps you would like to play a game of trying to guess whether several 
URIs really identify the same or different resources.  URIs are interpreted 
by servers which have a lot of freedom to establish many-to-many relationships 
between URLs and the resources they retrieve.

> I don't know why you should think that users understood practice and the RFC
> should be different.
> The RFC defines how things actually work which is what users are all used to
> (and happens to basically match algorithms used in unix for decades).

In some cases yes, in some cases no.  It encourages case insensitivity, for
example -- it wasn't clear to me that it mandated it.  Unix does not allow for
it.  This means that the client does not know whether case may distinguish
different resources or not.  It does not dictate a character set for escaped
characters, which would be needed before upper/lower case equivalence could
be established.

Saying http: versus https: on the same server may or may not identify
equivalent resources, but generally only changes the procedure for accessing
the resource.  The RFC does not identify equivalence of symbolic links and
other Unix things that may make two URIs equivalent.  It does not deal with
caches, which may mean that the location the file is retrieved from may not be
the true identity, and so on.  It represents a retrieval algorithm, not a
unique identifier.
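
To make the comparison problem concrete, here is a minimal sketch in Python.
The example.org URIs and the helper names normalize and same_string are
assumptions for illustration only; the sketch lowercases just the scheme and
host, which is about all a client can safely do without asking the server.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(uri: str) -> str:
    """Lowercase only the parts a client can safely treat as case-insensitive."""
    parts = urlsplit(uri)
    # Scheme and host are case-insensitive. Path, query and fragment are left
    # untouched because only the server knows whether case matters there.
    # (netloc.lower() would also lowercase userinfo; none is used here.)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))

def same_string(a: str, b: str) -> bool:
    """String equality after the conservative normalization above."""
    return normalize(a) == normalize(b)

# Differs only in scheme/host case: equal after normalization.
host_case = same_string("HTTP://Example.ORG/Spec", "http://example.org/Spec")
# Differs in path case: the client cannot tell whether the server cares.
path_case = same_string("http://example.org/Spec", "http://example.org/spec")
# Differs in access procedure only: may or may not be the same resource.
scheme_diff = same_string("http://example.org/Spec", "https://example.org/Spec")
print(host_case, path_case, scheme_diff)
```

The last two pairs compare unequal as strings, yet a server is free to return
the same resource for both members of each pair, or different ones.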

> I assume this idea of "partial absolutization" is based on the misconception
> above. There is no such thing defined. There is only absolutization.
 
On the contrary, it came up many times during the discussion how much 
the client would be required to do.

> Absolutely not!  It is essential that the absolutization can be done as a
> simple string function without any knowledge of any specific schemes.

Requiring that does not give enough knowledge of the specifics of a
protocol to make it possible.

> You are using the term absoluization in a manner different from the way it
> has been used on this list.  I have seen no one argue for involving the
> server in the processes. Many URI schemes don't have a concept of a server.

The point was, without the server to do some canonicalization, the
absolutization described in the RFC is not sufficient to say whether two
URIs identify the same resources or different resources.  This point was
brought up in several discussions I had, and it was always represented as a
trivial form of absolutization that should occur, not one that performs most
of the mappings a URL server does to identify the actual resource to return.
 
> On the contrary, very often systems allow separate access to the actual
> attribute value and to the interpreted object.  xHTML editors typically
> preserve the choice of relative or absolute URI used in an A HREF, but
> adjust the value when a copy of the document is saved to a new URI.  I
> would suggest that this behaviour be the norm.

I should have said, the primary / winning meaning.  As I went on to describe,
we already have one such duality in namespaces, which causes a fair amount of
complexity and disagreement -- the prefix versus the URI.

> > What is preserved when nodes are
> >moved within the hierarchy -- the absolutized concept of type that is so
> >important for specialized DOM implementations, or the relative info?   You
> >could choose:
> 
> You are talking about an XML node moving within the DOM tree?
> Or a document being moved through URI space?
> In all cases the absolute URI should be preserved.  To preserve the choice of
> relative URI or absolute URI is important in practice too.
> I am hoping that the same code is going to be used for all URIs of course.

But when you serialize, only one can be preserved, unless they happen to
exactly coincide.  What is the point of having used relative URIs, if
programmatically moving it to a new location does not cause the resolution
to be different?  I thought that was the point, no?
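
As a minimal sketch of what I mean, using Python's urllib.parse.urljoin as
a stand-in for RFC-style absolutization (the a.example and b.example document
locations and the relative namespace name "types/order" are invented for
illustration), the same relative name yields a different absolute name as soon
as the document is saved somewhere else:

```python
from urllib.parse import urljoin

# Hypothetical relative namespace name as written in the document.
relative_ns = "types/order"

# The same document saved at two hypothetical locations.
original_base = "http://a.example/schemas/doc.xml"
moved_base = "http://b.example/archive/2000/doc.xml"

# Absolutize the relative name against each base.
ns_before = urljoin(original_base, relative_ns)
ns_after = urljoin(moved_base, relative_ns)

print(ns_before)  # http://a.example/schemas/types/order
print(ns_after)   # http://b.example/archive/2000/types/order
print(ns_before == ns_after)  # False
```

A serializer can write back either the relative string or one of the two
absolute strings, but not all of them.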

> >a.  The DOM parser absolutizes and DOM effectively does not permit relative
> >URIs, except as convenience.
> >
> >b.  Relative?  Do we really intend to have the validation rules and
> >implementing classes change as a node moves in the hierarchy to another or
> >as they are saved in a different location?  It is very common to tie the
> >implementing class and validation rules to the type.  Type will now be in
> >flux, whereas DOM has traditionally considered the type constant for these
> >reasons.  Are you willing to frequently re-absolutize the name so it can be
> >compared?
> >
> >c.  Absolute(ized): with relative preserved?  How is the relative path
> >preserved in the nodes?  The relative specifier would be at best like the
> >prefix which is considered syntactic sugar, which may be replaced by the
> >serializer.  In this case, a serialized document would likely lose the
> >relative nature of namespaces, so the relative properties of the namespace
> >specifications could not be relied on for essential functionality (just
> >like a document becomes invalid according to DTDs when saved in canonical
> >form).  All the problems of prefixes arise again for relative URIs, for
> >example an entity needs to know the base URI of the position in which it is
> >referenced, or an absolutized URI can not be computed, which may be
> >different in different references, so entity hierarchies would have no
> >absolutized URIs.  Do we really want this additional complexity on
> >namespaces?
> >
> >If so, perhaps another group should make a different naming standard for
> >those who just want namespaces, which could look like Java packages so no
> >one gets so confused as to marry an arbitrary resource to the type
> >declaration.
> 
> 
> Please, if a server returns something when queried for the namespace
> resource, then that is not an arbitrary resource. The server is controlled
> by the owner of the name and the document returned is therefore definitive.
> It is not arbitrary.

The point of XML to me (and SGML before that) is many processing models for 
the same information.  Schemas, DTDs, and other syntactic controls tell 
nothing about the meaning, and may compete as other standards do.  It is
unreasonable, IMO, to expect to find one particular one at the end of a
namespace declaration, or to expect most applications to pay attention to
it.  Only extremely-general purpose applications might look at it.  Most
only care about transforming in and out of objects with more specific /
proprietary / local meaning.  If someone changes the syntax, they can
decide what to do with the unrecognized part: ignore it, pass it along,
or raise an error.  These non-general specific-purpose applications are
the ones that interpret the syntax with meaning.  Hence, the transform is
far more important to the processing recipient than the schema, and the
proper transform to lend local meaning will be different for different
recipients.

> To use java class names I say what I said about FPIs.  If you really think
> they solve problems other URI schemes don't solve, then you can propose
> that they be a new URI scheme.
> Then all new designs will be able to take advantage of them.

But then you might insist on absolutizing them, or otherwise insist that
they represent a retrieval pointer rather than an identity.

> >And what do you do when multiple purposes conflict?
> 
> What multiple purposes, and can you give an example of conflict?
 
I just did.  The schema is largely irrelevant.  It is the transformation
that is important, which the schema gives little insight about.  Different
processing models require different transformations.  There are some who
would like to dictate a single processing model for all data.  They have
been doing it for years.  AKA the war of proprietary formats.  But they
have failed to make their solutions reusable because no two sets of
requirements are equal.

> Oh, you mean that one schema document should not contain syntax-related and
> semantic information?  That doesn't seem to be a problem to me.
> Specification documents do.  Of course one could separate them and refer
> to one from the other.

I am not convinced that one document can even represent all desired syntactical
restrictions.  People keep inventing new microparsers for things that XML is
considered too verbose for, like xpath, svg, many url protocols.  Even things
like currency and dates are microsyntaxes, and could be easily reduced to
simpler things like integer attributes if people were not so inventive of
shorthand syntaxes.

Are you anticipating creating a meta-schema that transforms all microsyntaxes
into XML, and then somehow makes all forms equivalent?  Equivalent to what?
And even then, you are a long ways from giving things meaning.

Since no one here has apparently had his fill of analogies;-) let me recite one
I heard with respect to Java Beans:  People are trying to make a set of
definitions that automatically snap together -- like a jigsaw puzzle, which
is fine if you are satisfied with the picture on the front of the box.  To
go much beyond that -- a dictatorship of ideas -- the approach is quite
different.  The pieces are more individualistic, simple, and make no
pretenses to perfectly fit the surrounding pieces.  You use "glue" --
or manually-created mappings to hold it together and mold it into the local 
meaning and relevance.  The pieces are made to work together in many new
settings and situations that were not anticipated, because the pieces are
expected to be manually mapped into relationships.

> Could you please indicate the standards groups which have run screaming from
> URIs?

Look around you.  I will not point to them publicly right now.

> I would point out that the web is humming with relative URIs which lend
> manageability to huge amounts of the data.  Our own website would be
> very much more difficult to manage without them.

W3C would have to make do like everyone else if the website supported dynamic
data relying on multidimensional parameterized URI syntax, which is where 
much of the web is going, rather than file-system-like URIs.  Many go to
great lengths writing relative navigation schemes that have nothing to do with
the URI RFC, because it deals only with trivial name hierarchies and discards 
parameters.

For example, you might have different parameters describing:

The working group
The stage of the work
The publicness of the work
The type of resource
The user's desired medium (html, xml, cell phone markup, etc.)
The user's language
The user's locale

It is generally desirable to be able to switch any of these relatively
while holding the others constant.  But the URI scheme which would force
relativism to occur within a single hierarchy would force you to give
each item priority rather than allowing them to be equal dimensions, and
discard everything of lesser priority.  This makes the RFC-based
relativism unworkable for common uses.  It was designed for legacy file
systems, which sometimes can capture part of the identity of a resource.
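
A minimal sketch of the problem, again using Python's urllib.parse as a
stand-in for RFC-style relative resolution.  The example.org URI, the
parameter names, and the switch_dimension helper are all hypothetical:
resolving a relative query reference replaces the whole query, whereas what
one usually wants is to change a single dimension and keep the rest.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameterized URI with three independent dimensions.
base = "http://example.org/docs?group=dom&stage=draft&lang=en"

# Relative resolution replaces the entire query: group and stage are lost.
rfc_result = urljoin(base, "?lang=fr")

def switch_dimension(uri: str, name: str, value: str) -> str:
    """Change one query parameter while holding all the others constant."""
    parts = urlsplit(uri)
    params = dict(parse_qsl(parts.query))  # assumes no repeated parameter names
    params[name] = value
    return urlunsplit(parts._replace(query=urlencode(params)))

# Hand-written navigation of the kind many sites end up building.
wanted = switch_dimension(base, "lang", "fr")

print(rfc_result)  # http://example.org/docs?lang=fr
print(wanted)      # http://example.org/docs?group=dom&stage=draft&lang=fr
```

The helper is exactly the sort of relative navigation scheme that has nothing
to do with the URI RFC.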

Ray Whitmer
ray@xmission.com
Received on Friday, 19 May 2000 02:34:44 UTC