Re: A proposed solution from Henrik Frystyk Nielsen on 2000-06-11 (xml-uri@w3.org from June 2000)

From: Henrik Frystyk Nielsen <frystyk@microsoft.com>
Date: Sat, 10 Jun 2000 19:23:49 -0700
To: "James Clark" <jjc@JCLARK.COM>, "David Turner" <dturner@microsoft.com>
Cc: <XML-uri@w3.org>, "Andrew Layman" <andrewl@microsoft.com>
Message-ID: <009601bfd34c$1727c200$83b11eac@redmond.corp.microsoft.com>
> This proposal seems rather vague to me. I don't see anything in this
> that says precisely when two namespace names are considered equal.

In fact it does - it says that a namespace identifier is a URI, which
means that at the basic level you compare in an octet-by-octet manner,
taking into account the context in which any relative URIs are defined.

The URI spec also defines a set of common syntax equivalence rules for
the hostname and the default port number etc. but I wouldn't bet that
applications get those consistently right.

Furthermore it says that a URI scheme may define further normalization
rules that can have an impact on how URIs are defined. However, as you
can never expect that a URI parser knows about the specific scheme you
use, there is no guarantee that those normalization rules are followed.

So at the basic level, it is octet-by-octet comparison. If you don't
think these rules are clear enough then we should amend the URI spec -
not the namespace spec.
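
As a rough illustration of the difference between the basic comparison
and the optional syntax equivalences mentioned above, here is a minimal
Python sketch. The function names, the sample URIs, and the small
default-port table are assumptions made for the example only; they are
not taken from the proposal or from the URI spec.

```python
from urllib.parse import urlsplit

def octet_equal(a: str, b: str) -> bool:
    """Basic rule: compare the two names octet by octet, no normalization."""
    return a.encode("utf-8") == b.encode("utf-8")

# Assumed default ports, for illustration only.
DEFAULT_PORTS = {"http": 80, "https": 443}

def loosely_equal(a: str, b: str) -> bool:
    """Optional rule: treat host case and an explicit default port as equivalent."""
    def key(uri: str):
        p = urlsplit(uri)  # scheme and hostname come back lowercased
        port = p.port or DEFAULT_PORTS.get(p.scheme)
        return (p.scheme, p.hostname, port, p.path, p.query, p.fragment)
    return key(a) == key(b)

u1 = "http://www.w3.org/1999/xhtml"
u2 = "http://WWW.W3.ORG:80/1999/xhtml"

print(octet_equal(u1, u2))    # False
print(loosely_equal(u1, u2))  # True
```

An application that only implements the first function is still
comparing correctly at the basic level; the second is the kind of rule
that cannot be relied on across implementations.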

>> Relative URIs are always defined within a context.
>
> I thought they were defined relative to a base URI.  Is a context the
> same thing as a base URI?

The reason for using the term "context" instead of "base URI" is to make
it clear that relative URIs in fact can be used within a constrained
context without actually knowing or using the base URI.

As an analogy, I can evaluate the location of stuff in the room where I
am sitting relative to the floor, the walls, and the ceiling of the room
without knowing anything about what floor of the building the room is on
or what city the building is in.

> If multiple levels of hierarchy count as the same context, then this
> proposal does not solve the problem. Suppose I have a document
> http://www.w3.org/a/b referencing an entity c/d which absolutizes to
> http://www.w3.org/a/c/d.  If these have the same context, then a
> namespace URI "foo" in the document will be treated as equal to a
> namespace URI "foo" in the referenced entity despite the fact that it
> refers to a different resource after URI absolutization.

The examples refer to examples of relative URIs - not contexts.
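
For reference, the absolutization in the quoted scenario can be
reproduced with a few lines of Python. This is only a sketch using the
standard urljoin function; the variable names are made up for the
example.

```python
from urllib.parse import urljoin

doc_base = "http://www.w3.org/a/b"         # the document
entity_base = urljoin(doc_base, "c/d")     # the referenced entity
print(entity_base)                         # http://www.w3.org/a/c/d

# The same relative namespace name "foo" resolved against each base.
ns_in_doc = urljoin(doc_base, "foo")
ns_in_entity = urljoin(entity_base, "foo")
print(ns_in_doc)                           # http://www.w3.org/a/foo
print(ns_in_entity)                        # http://www.w3.org/a/c/foo
print(ns_in_doc == ns_in_entity)           # False
```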

>> An application is responsible for knowing the context within which a
>> relative link is defined. RFC 2396, section 5, provides several
>> mechanisms for establishing the proper context within which relative
>> URIs are defined. An application is also responsible for ensuring that
>> relative identifiers are not treated as unique identifiers across
>> contexts as ignorance of context can make distinct identifiers appear
>> undifferentiated.]]]
>
> So what happens when two relative URIs that come from different contexts
> are compared?  This can happen even within a namespace processor when
> testing attribute name identity in a document using external entities.

Retrieving external entities may indeed affect the context in which
relative URIs are evaluated - I tried to address this in my earlier
response [2].

> Let's try and make this precise.  Suppose we say that a namespace name
> is a pair <C, A> where C is a context URI and A is a URI reference
> exactly as specified in the namespace declaration attribute.  When an
> entity has a base URI, then the base URI serves as the context URI,
> otherwise a context URI is generated that uniquely identifies the
> document (eg a uuid URI).

For the sake of namespace identification within a context you don't need
to define a base URI.

> Now we say that a namespace name <C1, A1> is equal to a namespace name
> <C2, A2> if and only if:
>
> 1. A1 is character for character identical to A2, and
>
> 2. either
>    (a) A1 and A2 are absolute, or
>    (b) both
>       (i)  A1 and A2 are relative, and
>       (ii) C1 and C2 are character for character identical
>
> Is this what you have in mind?

If we only look at A1 and A2 the description that I have above comes
close if you take into account the context within which you compare
relative URIs. I think the difference is that whereas you say that there
always is a base URI, our proposal is that it doesn't matter what the
base URI is as long as you use the relative URI within the same context
only. The advantage of this is that you don't force parsers to dream up
base URIs if they don't have to. Unless of course you suggest that the
base URI can be completely virtual (and arbitrary) in which case the two
ways become the same.
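
To make the comparison concrete, the quoted <C1, A1> / <C2, A2> rule
could be sketched in Python roughly as follows. The type and function
names are hypothetical, "absolute" is approximated here as "has a
scheme", and the context is treated as an opaque string so that it can
be a real base URI or an arbitrary virtual one.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

class NamespaceName(NamedTuple):
    context: str  # C: opaque context identifier (may be virtual)
    ref: str      # A: URI reference exactly as written in the declaration

def is_absolute(ref: str) -> bool:
    # Assumption for this sketch: a reference with a scheme is absolute.
    return bool(urlsplit(ref).scheme)

def same_namespace(n1: NamespaceName, n2: NamespaceName) -> bool:
    # 1. The references must be character-for-character identical.
    if n1.ref != n2.ref:
        return False
    # 2(a). Identical absolute references are equal in any context.
    if is_absolute(n1.ref):
        return True
    # 2(b). Identical relative references are equal only in the same context.
    return n1.context == n2.context

a = NamespaceName("ctx-1", "foo")
b = NamespaceName("ctx-2", "foo")
c = NamespaceName("ctx-2", "http://www.w3.org/1999/xhtml")
d = NamespaceName("ctx-1", "http://www.w3.org/1999/xhtml")

print(same_namespace(a, a))  # True
print(same_namespace(a, b))  # False
print(same_namespace(c, d))  # True
```

Nothing in the sketch requires the context to be a dereferenceable base
URI; it only has to distinguish one context from another.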

> A. cases where namespace names are identical but the corresponding
> resources are not
>
> B. cases where namespace names are not identical but the corresponding
> resources are
>
> Now type B cases are relatively harmless and an unavoidable fact of
> life, but type A cases are (to some of us anyway) unacceptable.  The
> Microsoft proposal appears to be getting rid of type A mismatches by
> accepting additional type B mismatches.

Case A is definitely evil and yes, is avoided by our proposal. I don't
see why that would lead to more type B mismatches though. I would expect
it to stay the same.

Henrik Frystyk Nielsen,
mailto:frystyk@microsoft.com

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2.3
[2] http://lists.w3.org/Archives/Public/xml-uri/2000Jun/0504.html
[3] http://www.w3.org/1999/05/WCA-terms/#Resource1
Received on Saturday, 10 June 2000 22:24:29 UTC