Re: A proposed solution from James Clark on 2000-06-10 (xml-uri@w3.org from June 2000)

From: James Clark <jjc@jclark.com>
Date: Sat, 10 Jun 2000 16:14:13 +0700
To: David Turner <dturner@microsoft.com>
CC: "'XML-uri@w3.org'" <XML-uri@w3.org>, Henrik Frystyk Nielsen <henrikn@microsoft.com>, Andrew Layman <andrewl@microsoft.com>
Message-ID: <394206E5.96D38DAD@jclark.com>

This proposal seems rather vague to me. I don't see anything in this
that says precisely when two namespace names are considered equal.

David Turner wrote:

> [[[According to RFC 2396 a URI reference can be either a relative or an
> absolute URI. The scheme of an absolute URI identifies the URI space to
> which that URI belongs. A URI space is typically defined with a set of
> properties concerning uniqueness, normalization rules etc. as well as
> one or more default mechanisms for resolving URIs belonging to that URI
> space.
> 
> Relative URIs are always defined within a context.

I thought they were defined relative to a base URI.  Is a context the
same thing as a base URI?

> Typical examples are
> relative references within the current document (fragment identifiers)
> and relative references between documents at the same or closely related
> level of hierarchy in the URI space.

If multiple levels of hierarchy count as the same context, then this
proposal does not solve the problem. Suppose I have a document
http://www.w3.org/a/b referencing an entity c/d which absolutizes to
http://www.w3.org/a/c/d.  If these have the same context, then a
namespace URI "foo" in the document will be treated as equal to a
namespace URI "foo" in the referenced entity despite the fact that it
refers to a different resource after URI absolutization.
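
To make the example concrete, here is a minimal sketch using Python's
standard urllib.parse.urljoin to do the RFC 2396-style resolution.  The
variable names are illustrative only; the URIs are the ones from the
paragraph above.

```python
from urllib.parse import urljoin

# Base URI of the document, as in the example above.
doc_base = "http://www.w3.org/a/b"

# The external entity is referenced as "c/d" from the document,
# so its own base URI is the absolutized form of that reference.
entity_base = urljoin(doc_base, "c/d")

# The same relative namespace URI "foo", resolved in each place.
ns_in_doc = urljoin(doc_base, "foo")
ns_in_entity = urljoin(entity_base, "foo")

print(entity_base)
print(ns_in_doc)
print(ns_in_entity)
```

The two occurrences of "foo" are identical as strings but resolve to
different absolute URIs, which is the mismatch described above.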

> Within the same context, relative
> links remain internally consistent and can act as unique identifiers
> (within that context) without actually being expanded relative to the
> context within which they are defined.
> 
> An application is responsible for knowing the context within which a
> relative link is defined. RFC 2396, section 5, provides several
> mechanisms for establishing the proper context within which relative
> URIs are defined. An application is also responsible for ensuring that
> relative identifiers are not treated as unique identifiers across
> contexts as ignorance of context can make distinct identifiers appear
> undifferentiated.]]]

So what happens when two relative URIs that come from different contexts
are compared?  This can happen even within a namespace processor when
testing attribute name identity in a document using external entities.

Let's try to make this precise.  Suppose we say that a namespace name
is a pair <C, A> where C is a context URI and A is a URI reference
exactly as specified in the namespace declaration attribute.  When an
entity has a base URI, then the base URI serves as the context URI;
otherwise a context URI is generated that uniquely identifies the
document (e.g. a uuid URI).

Now we say that a namespace name <C1, A1> is equal to a namespace name
<C2, A2> if and only if:

1. A1 is character for character identical to A2, and

2. either
   (a) A1 and A2 are absolute, or
   (b) both
      (i)  A1 and A2 are relative, and
      (ii) C1 and C2 are character for character identical

Is this what you have in mind?
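
The rule above can be sketched in Python as follows.  This is only one
reading of it, under some assumptions: NamespaceName, context_for,
is_absolute and ns_equal are names invented for this sketch, "absolute"
is approximated as "has a scheme", and the generated context URI is
taken to be a random urn:uuid.

```python
import uuid
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class NamespaceName(NamedTuple):
    context: str  # C: the context URI
    ref: str      # A: the URI reference exactly as declared


def context_for(base_uri: Optional[str]) -> str:
    """Use the entity's base URI as context, else generate a unique one."""
    return base_uri if base_uri is not None else uuid.uuid4().urn


def is_absolute(ref: str) -> bool:
    """Assumption: a URI reference is absolute if it has a scheme."""
    return bool(urlsplit(ref).scheme)


def ns_equal(n1: NamespaceName, n2: NamespaceName) -> bool:
    # 1. A1 must be character for character identical to A2.
    if n1.ref != n2.ref:
        return False
    # 2(a). Identical references are either both absolute or both relative,
    #       so testing one of them is enough.
    if is_absolute(n1.ref):
        return True
    # 2(b). Both relative: the contexts must also be identical.
    return n1.context == n2.context


in_doc = NamespaceName("http://www.w3.org/a/b", "foo")
in_entity = NamespaceName("http://www.w3.org/a/c/d", "foo")
print(ns_equal(in_doc, in_entity))
```

Under this reading the two "foo" names from the earlier example compare
unequal because their contexts differ, while an absolute namespace URI
compares equal regardless of context.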

There are several possible variations:

- you might want to strip everything after the last slash from the
context URIs

- you could add another alternative (c) that compares the absolutized
URIs when the context URI is also a base URI

- you could say it's an error if C1 and C2 are different (if so, what
should processors do?)
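
The first two variations can be sketched the same way.  The function
names here are hypothetical, and the slash handling is naive string
processing rather than full URI parsing; both functions cover only the
relative-reference case and still require the references themselves to
be identical.

```python
from urllib.parse import urljoin


def strip_last_segment(context: str) -> str:
    """Drop everything after the last slash; leave slash-less URIs alone."""
    cut = context.rfind("/")
    return context if cut == -1 else context[:cut + 1]


def equal_with_stripped_contexts(c1: str, a1: str, c2: str, a2: str) -> bool:
    """Variation 1: compare contexts after stripping the last segment."""
    return a1 == a2 and strip_last_segment(c1) == strip_last_segment(c2)


def equal_when_absolutized(base1: str, a1: str, base2: str, a2: str) -> bool:
    """Variation 2, alternative (c): both contexts are real base URIs,
    so compare the absolutized forms of the references."""
    return a1 == a2 and urljoin(base1, a1) == urljoin(base2, a2)


# Two entities in the same directory: equal under both variations.
print(equal_with_stripped_contexts(
    "http://www.w3.org/a/b", "foo", "http://www.w3.org/a/x", "foo"))
# Entities at different levels of the hierarchy: unequal under both.
print(equal_when_absolutized(
    "http://www.w3.org/a/b", "foo", "http://www.w3.org/a/c/d", "foo"))
```

The third variation (treating differing contexts as an error) would
replace the final comparison with whatever error behaviour processors
are required to have, which is left open above.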

I think this is quite a promising approach.  As I see it, there are two
kinds of mismatch between namespace name identity and resource identity:

A. cases where namespace names are identical but the corresponding
resources are not

B. cases where namespace names are not identical but the corresponding
resources are

Now type B cases are relatively harmless and an unavoidable fact of
life, but type A cases are (to some of us anyway) unacceptable.  The
Microsoft proposal appears to be getting rid of type A mismatches by
accepting additional type B mismatches.

James
Received on Saturday, 10 June 2000 05:28:13 UTC