Re: Posted draft of URI comparison finding

On Fri, 2002-11-29 at 02:13, Tim Bray wrote:
> 
> I just posted, at http://www.textuality.com/tag/uri-comp.html, a first 
> cut at some finding language in comparing URIs.

|Software is commonly required to compare two URIs to determine
| whether they identify the same resource.

Really? What software has to do that?
I find that software almost *never* needs to determine
whether two URIs identify the same resource.

In my suggested rewrite of 2.2.1, I wrote:

[[
The problem of
determining whether two different absolute URI references refer to the
same resource or not is, in the general case, arbitrarily hard.

Fortunately, the problem does not need a complete nor ubiquitously
deployed solution in order for the Web to operate usefully. Approaches
to the problem include avoiding the problem, formal approaches,
and heuristic approaches:
]]
 -- http://lists.w3.org/Archives/Public/www-tag/2002Sep/0050.html

and then enumerated a number of ways of avoiding the problem
of determining whether two URIs identify the same resource.


|A resource, in the Web Architecture, is an abstraction

umm... I'm not sure what you mean by that; some resources
are quite concrete, no?

|Put another way, it is often possible to determine that two URIs
|are the same, but it is in general never possible to be sure
|that they are different.

It's easy to tell if two URIs are different, even in the general
case: just use strcmp(). What's hard is to tell whether
they refer to the same resource. Let's be careful about these
use/mention bugs in this finding.
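
A minimal sketch of the distinction, in Python (where == on strings
plays the role of strcmp()). The two URIs and the function name are
made up for illustration; they are not from the draft:

```python
def uris_differ(a: str, b: str) -> bool:
    """Return True when a and b are different URIs, i.e. different strings.

    This answers only the string question. It says nothing about
    whether the two URIs refer to the same resource.
    """
    return a != b


# Hypothetical example URIs that differ only in the case of the host name.
u1 = "http://www.example.org/"
u2 = "http://WWW.EXAMPLE.ORG/"

# Different strings, hence different URIs.
print(uris_differ(u1, u2))

# Whether u1 and u2 refer to the same resource is a separate and harder
# question; nothing in this function attempts to answer it.
print(uris_differ(u1, u1))
```

The comparison is cheap and always gives an answer; the open question
is only ever about what the URIs refer to.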

|   1. It is in general not possible to compare relative
| URI references with any hope of correct results.

Again, you can compare URI references just fine with strcmp().
It's only when you're interested in what they point to
that you need to expand them w.r.t. a base.
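
To illustrate, here is a rough sketch in Python using the standard
urllib.parse.urljoin to do the expansion. The reference and the two
base URIs are hypothetical, chosen only to show that identical
reference strings can point to different things:

```python
from urllib.parse import urljoin

# Two relative URI references; as strings they are identical.
ref_a = "chapter1.html"
ref_b = "chapter1.html"
same_string = ref_a == ref_b

# Hypothetical base URIs for the two documents the references appear in.
base_1 = "http://example/book/"
base_2 = "http://example/draft/"

# Expanding w.r.t. a base is only needed once we care what they point to.
abs_a = urljoin(base_1, ref_a)
abs_b = urljoin(base_2, ref_b)

print(same_string)     # the references compare equal as strings
print(abs_a)
print(abs_b)
print(abs_a == abs_b)  # the expanded URIs are different strings
```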

|In Unicode terminology, this would be properly referred
| to as codepoint-for-codepoint comparison.

Well, it's only codepoint-for-codepoint after you map
the characters to codepoints; character-for-character
is just as proper, no?

| since the Namespaces in XML recommendation specifies
| "character-for-character" comparison, it might be argued that
| since %7A and %7a must per RFC2396 represent the same character,
| XML namespaces which differ only in this respect might reasonably
| be considered equal.

Absolutely not. Let's make this a test case and make
it absolutely, perfectly clear: in the document

	<aDoc xmlns="http://example/%7e"/>

the namespace name is an 18 character string; the last
three characters are %, 7, and e. In the document

	<aDoc xmlns="http://example/%7E"/>

the namespace name is also 18 characters long; the
last three characters are %, 7, and E.
The last characters of these strings are different,
and hence they are different strings.
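
The test case can be written down directly. This is only a sketch in
Python, with variable names of my own choosing, that treats the two
namespace names as plain strings and does no %-unescaping at all:

```python
# The two namespace names from the documents above, taken as plain strings.
ns_lower = "http://example/%7e"
ns_upper = "http://example/%7E"

# Both names are 18 characters long.
print(len(ns_lower), len(ns_upper))

# The last three characters of each name.
print(list(ns_lower[-3:]))
print(list(ns_upper[-3:]))

# Everything but the final character matches...
print(ns_lower[:-1] == ns_upper[:-1])

# ...but the final characters differ, so the strings differ.
print(ns_lower == ns_upper)
```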



>  I'm in Narita running 
> for a plane so this got less proofreading than I usually have time for.
> 
> The subject expands remarkably once you start writing it all down. -Tim

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/

Received on Monday, 2 December 2002 09:42:49 UTC