Re: Posted draft of URI comparison finding from Tim Bray on 2002-12-02 (www-tag@w3.org from December 2002)

From: Tim Bray <tbray@textuality.com>
Date: Mon, 02 Dec 2002 07:42:07 -0800
To: Dan Connolly <connolly@w3.org>
Cc: WWW-Tag <www-tag@w3.org>
Message-ID: <3DEB7F4F.2080909@textuality.com>

Dan Connolly wrote:

> |Software is commonly required to compare two URIs to determine
> | whether they identify the same resource.
> 
> Really? What software has to do that?

Every web browser on the planet, when it checks the URI you just typed in
or clicked on against its cache.  If it thinks they are the same
resource (modulo expiry & so on) it doesn't dereference.  Since you
obviously know about this process, I suspect I'm missing your point.
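
A minimal sketch of that cache check, assuming a plain dictionary keyed
by the exact URI string and ignoring expiry. The names `fetch` and
`dereference`, and the example.org URI, are hypothetical and only
illustrate the idea; no real browser is claimed to work this way.

```python
# Hypothetical cache keyed by the URI string exactly as given.
cache = {}

def fetch(uri, dereference):
    """Return the cached body for uri, dereferencing only on a miss."""
    if uri in cache:
        # The cache treats an identical string as the same resource.
        return cache[uri]
    body = dereference(uri)
    cache[uri] = body
    return body

calls = []

def fake_dereference(uri):
    # Stand-in for a network request; records each call.
    calls.append(uri)
    return "body of " + uri

first = fetch("http://example.org/page", fake_dereference)
second = fetch("http://example.org/page", fake_dereference)
```

In this sketch the second call is answered from the cache, so the
comparison of URIs decides whether a dereference happens at all.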

> I find that software almost *never* needs to determine
> whether two URIs determine the same resource.
> 
> In my suggested rewrite of 2.2.1, I wrote:

It seems to me we're saying much the same thing.

> |A resource, in the Web Architecture, is an abstraction
> 
> umm... I'm not sure what you mean by that; some resources
> are quite concrete, no?

We use phrases such as "a time-varying mapping yadda yadda" - yes some
resources are very concrete, but the notion of a resource (anything that
can be identified, per RFC2396) is very abstract.

> |Put another way, it is often possible to determine that two URIs
> |are the same, but it is in general never possible to be sure
> |that they are different.
> 
> it's easy to tell if two URIs are different,

OK, granted, needs editorial work.

> |   1. It is in general not possible to compare relative
> | URI references with any hope of correct results."
> 
> again, you can compare URI references just fine with strcmp()
> It's only when you're interested in what they point to
> that you need to expand them w.r.t. a base.

This whole note is about what to do when you're interested in what they 
point to.  Once again, I'm missing your point?
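
The difference between the two kinds of comparison can be sketched as
follows, assuming Python's `urllib.parse.urljoin` as the resolver. The
helper names and the example.org base URIs are hypothetical.

```python
from urllib.parse import urljoin

def same_reference_text(ref_a, ref_b):
    """strcmp-style comparison of the references as written."""
    return ref_a == ref_b

def same_target(ref_a, base_a, ref_b, base_b):
    """Compare what the references point to, after resolving each
    one against its own base URI."""
    return urljoin(base_a, ref_a) == urljoin(base_b, ref_b)

base_one = "http://example.org/book/chapter1.html"
base_two = "http://example.org/draft/chapter1.html"

# Identical reference text, different bases: different targets.
identical_text = same_reference_text("chapter2.html", "chapter2.html")
identical_target = same_target("chapter2.html", base_one,
                               "chapter2.html", base_two)

# Different reference text that resolves to the same absolute URI.
different_text_same_target = same_target(
    "chapter2.html", base_one, "../book/chapter2.html", base_two)
```

Under these assumptions the string comparison and the resolved
comparison disagree in both directions, which is why the base matters
once the question is what the references point to.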

> |In Unicode terminology, this would be properly referred
> | to as codepoint-for-codepoint comparison.
> 
> Well, it's only codepoint-for-codepoint after you map
> the characters to codepoints; character-for-character
> is just as proper, no?

A character is a whatnot identified by a number (codepoint); you can use 
the number to look up glyphs and semantics and so on.  I don't know how 
to do character-to-character comparison at all if I don't know what the 
codepoints are.
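
As a rough illustration, assuming Python, where a `str` is already a
sequence of Unicode code points and `ord()` returns the number for each
one. The helper names and the example.org strings are hypothetical.

```python
def codepoints(text):
    """Map each character of text to its Unicode code point number."""
    return [ord(ch) for ch in text]

def same_codepoints(a, b):
    """Codepoint-for-codepoint comparison of two strings."""
    return codepoints(a) == codepoints(b)

upper = "http://example.org/Z"
lower = "http://example.org/z"

# The last code points differ: 90 (U+005A) versus 122 (U+007A).
last_upper = codepoints(upper)[-1]
last_lower = codepoints(lower)[-1]
equal = same_codepoints(upper, lower)
```

In this sketch comparing the characters and comparing the numbers are
the same operation, since each character is known only by its number.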

> | since the Namespaces in XML recommendation specifies
> | "character-for-character" comparison, it might be argued that
> | since %7A and %7a must per RFC2396 represent the same character,
> | XML namespaces which differ only in this respect might reasonably
> | be considered equal.
> 
> absolutely not. Let's make this a test case and make
> it absolutely, perfectly clear

As Misha pointed out at some length, your exegesis may be correct, but
the namespaces recommendation is NOT "absolutely clear"; that's one of
the reasons we're spending time on this. -Tim
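
The %7A versus %7a test case can be sketched like this, assuming
Python's `urllib.parse.unquote` for percent-decoding. The namespace
names are hypothetical, and the sketch takes no position on which
reading of the recommendation is correct.

```python
from urllib.parse import unquote

ns_upper = "http://example.org/ns%7A"
ns_lower = "http://example.org/ns%7a"

# Reading 1: compare the namespace names exactly as written.
equal_as_written = ns_upper == ns_lower

# Reading 2: decode the escapes first, then compare.
# Both %7A and %7a decode to the character "z".
equal_after_decoding = unquote(ns_upper) == unquote(ns_lower)
```

The two readings give opposite answers for the same pair of names,
which is the ambiguity at issue.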

Received on Monday, 2 December 2002 10:42:14 UTC