Re: A proposed solution from James Clark on 2000-06-15 (xml-uri@w3.org from June 2000)

From: James Clark <jjc@jclark.com>
Date: Thu, 15 Jun 2000 15:08:13 +0700
To: Henrik Frystyk Nielsen <frystyk@microsoft.com>
CC: David Turner <dturner@microsoft.com>, XML-uri@w3.org, Andrew Layman <andrewl@microsoft.com>
Message-ID: <39488EED.50A0D228@jclark.com>

Henrik Frystyk Nielsen wrote:
> 
> > This proposal seems rather vague to me. I don't see anything in this
> > that says precisely when two namespace names are considered equal.
> 
> In fact it does - it says that a namespace identifier is a URI which
> means that at the basic level you compare in an octet-by-octet manner
> taking into account the context in which any relative URIs are defined.
> 
> The URI spec also defines a set of common syntax equivalence rules for
> the hostname and the default port number etc. but I wouldn't bet that
> applications get those consistently right.
> 
> Furthermore it says that a URI scheme may define further normalization
> rules that can have an impact on how URIs are defined. However, as you
> can never expect that a URI parser knows about the specific scheme you
> use, there is no guarantee that those normalization rules are followed.
> 
> So at the basic level, it is octet-by-octet comparison. If you don't
> think these rules are clear enough then we should amend the URI spec -
> not the namespace spec.

I can't see anything in RFC 2396 that defines when two URI references
are equivalent.  Perhaps you could point me to the section of RFC 2396
that does this.  The reason the namespaces spec explicitly states the
equivalence rules for namespace names is because RFC 2396 doesn't.  For
example, do you absolutize before octet-by-octet comparison or not?   If
you have a URI scheme that uses a server-based naming authority (section
3.2.2), and two URLs use hostnames that are octet-for-octet distinct but
resolve to the same IP address, are the URLs equivalent?  I don't see
anything in RFC 2396 that provides answers to questions like these.  I
believe it's up to the namespaces spec to specify what the rules are.

> >> Relative URIs are always defined within a context.
> >
> > I thought they were defined relative to a base URI.  Is a context the
> > same thing as a base URI?
> 
> The reason for using the term "context" instead of "base URI" is to make
> it clear that relative URIs in fact can be used within a constrained
> context without actually knowing or using the base URI.

That's not what RFC 2396 says. From section 5.1:

   The term "relative URI" implies that there exists some absolute "base
   URI" against which the relative reference is applied.  Indeed, the
   base URI is necessary to define the semantics of any relative URI
   reference; without it, a relative reference is meaningless.

> As an analogy, I can evaluate the location of stuff in the room where I
> am sitting relative to the floor, the walls, and the ceiling of the room
> without knowing anything about what floor of the building the room is on
> or what city the building is in.
> 
> > If multiple levels of hierarchy count as the same context, then this
> > proposal does not solve the problem. Suppose I have a document
> > http://www.w3.org/a/b referencing an entity c/d which absolutizes to
> > http://www.w3.org/a/c/d.  If these have the same context, then a
> > namespace URI "foo" in the document will be treated as equal to a
> > namespace URI "foo" in the referenced entity despite the fact that it
> > refers to a different resource after URI absolutization.
> 
> The examples refer to examples of relative URIs - not contexts.

It's an example of two different base URIs within a single hierarchy of
documents.  The document has a base URI of http://www.w3.org/a/b, and
the referenced entity has a base URI of http://www.w3.org/a/c/d.  Does
the document have the same context as the referenced entity or not?
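
To make the example concrete, here is a minimal Python sketch.  It
assumes that urllib.parse.urljoin from the standard library is an
acceptable stand-in for the resolution algorithm in RFC 2396 section
5.2; the variable names are mine and purely illustrative.

```python
from urllib.parse import urljoin

doc_base = "http://www.w3.org/a/b"
# Base URI of the entity referenced as c/d from the document.
entity_base = urljoin(doc_base, "c/d")

# The same relative namespace name "foo" appears in both places.
ns_in_doc = urljoin(doc_base, "foo")
ns_in_entity = urljoin(entity_base, "foo")

print(entity_base)    # base URI of the referenced entity
print(ns_in_doc)      # "foo" resolved against the document
print(ns_in_entity)   # "foo" resolved against the entity
print(ns_in_doc == ns_in_entity)
```

Under that assumption the two occurrences of "foo" absolutize to
different URIs, so treating them as equal on the grounds that they
share a context would be a type A case.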

> > A. cases where namespace names are identical but the corresponding
> > resources are not
> >
> > B. cases where namespace names are not identical but the corresponding
> > resources are
> >
> > Now type B cases are relatively harmless and an unavoidable fact of
> > life, but type A cases are (to some of us anyway) unacceptable.  The
> > Microsoft proposal appears to be getting rid of type A mismatches by
> > accepting additional type B mismatches.
> 
> Case A is definitely evil and yes, is avoided by our proposal. I don't
> see why that would lead to more type B mismatches though. I would expect
> it to stay the same.

Suppose you have namespace names "a" and "./a".  These refer to the same
resource, but are not character-for-character identical.  However, if
you absolutize them relative to the same base URI, then they will
resolve to the same URI.
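
A small Python sketch of this, again assuming urllib.parse.urljoin as
a stand-in for RFC 2396 resolution.  The base URI below is a
hypothetical one chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base URI, not taken from the proposal.
base = "http://www.w3.org/2000/doc.xml"

name1 = "a"
name2 = "./a"

# Character-for-character comparison of the names as written.
literally_equal = name1 == name2

# Comparison after absolutizing both against the same base URI.
resolved1 = urljoin(base, name1)
resolved2 = urljoin(base, name2)

print(literally_equal)
print(resolved1, resolved2, resolved1 == resolved2)
```

The names differ as strings but resolve to one URI, which is the
first kind of type B case.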

The other case arises when you compare namespace names with different
contexts. Suppose you have a namespace name "a" in a document with base
URI "http://www.w3.org/" and a namespace name "../a" in a document with
base URI "http://www.w3.org/2000/".   There needs to be a clear answer
as to whether these are to be treated as identical or not.  If the
proposal is that they are not identical because they have different base
URIs and so different contexts, then this is an additional type B case.
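
The three readings I can see can be put side by side in a short
Python sketch.  The function same_namespace and its rule names are
hypothetical, invented here to pin down the question; they are not
taken from the proposal or from any spec, and urljoin is once more
assumed as a stand-in for RFC 2396 resolution.

```python
from urllib.parse import urljoin

def same_namespace(name1, base1, name2, base2, rule):
    """Compare two namespace names under one candidate rule (hypothetical)."""
    if rule == "literal":
        # Compare the names exactly as written, ignoring base URIs.
        return name1 == name2
    if rule == "absolutized":
        # Resolve each name against its own base URI, then compare.
        return urljoin(base1, name1) == urljoin(base2, name2)
    if rule == "same-context":
        # Equal only if written identically AND the base URIs match.
        return name1 == name2 and base1 == base2
    raise ValueError("unknown rule: " + rule)

args = ("a", "http://www.w3.org/", "../a", "http://www.w3.org/2000/")
results = {rule: same_namespace(*args, rule=rule)
           for rule in ("literal", "absolutized", "same-context")}
print(results)
```

Only the "absolutized" rule treats the two names as identical; under
either of the other two they are distinct names for what is, after
resolution, the same URI.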

James
Received on Thursday, 15 June 2000 04:25:36 UTC