- From: Dan Connolly <connolly@w3.org>
- Date: 02 Dec 2002 08:42:56 -0600
- To: Tim Bray <tbray@textuality.com>
- Cc: WWW-Tag <www-tag@w3.org>
On Fri, 2002-11-29 at 02:13, Tim Bray wrote:
> 
> I just posted, at http://www.textuality.com/tag/uri-comp.html, a first
> cut at some finding language in comparing URIs.

|Software is commonly required to compare two URIs to determine
| whether they identify the same resource.

Really? What software has to do that? I find that software almost
*never* needs to determine whether two URIs identify the same resource.

In my suggested rewrite of 2.2.1, I wrote:

[[
The problem of determining whether two different absolute URI
references refer to the same resource or not is, in the general case,
arbitrarily hard. Fortunately, the problem does not need a complete
nor ubiquitously deployed solution in order for the Web to operate
usefully.

Approaches to the problem include avoiding the problem, formal
approaches, and heuristic approaches:
]]
-- http://lists.w3.org/Archives/Public/www-tag/2002Sep/0050.html

and then enumerated a number of ways of avoiding the problem
of determining whether two URIs identify the same resource.

|A resource, in the Web Architecture, is an abstraction

umm... I'm not sure what you mean by that; some resources are
quite concrete, no?

|Put another way, it is often possible to determine that two URIs
|are the same, but it is in general never possible to be sure
|that they are different.

it's easy to tell if two URIs are different, even in the general
case: just use strcmp(). What's hard is to tell whether they
refer to the same resource. Let's be careful about these
use/mention bugs in this finding.

| 1. It is in general not possible to compare relative
| URI references with any hope of correct results."

again, you can compare URI references just fine with strcmp().
It's only when you're interested in what they point to that
you need to expand them w.r.t. a base.

|In Unicode terminology, this would be properly referred
| to as codepoint-for-codepoint comparison.
Well, it's only codepoint-for-codepoint after you map the characters
to codepoints; character-for-character is just as proper, no?

| since the Namespaces in XML recommendation specifies
| "character-for-character" comparison, it might be argued that
| since %7A and %7a must per RFC2396 represent the same character,
| XML namespaces which differ only in this respect might reasonably
| be considered equal.

absolutely not. Let's make this a test case and make it
absolutely, perfectly clear: in the document

  <aDoc xmlns="http://example/%7e"/>

the namespace name is an 18 character string; the last three
characters are %, 7, and e. In the document

  <aDoc xmlns="http://example/%7E"/>

the namespace name is also 18 characters long; the last three
characters are %, 7, and E. The last characters of these strings
are different, and hence they are different strings.

> I'm in Narita running
> for a plane so this got less proofreading than I usually have time for.
> 
> The subject expands remarkably once you start writing it all down.
> -Tim

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Monday, 2 December 2002 09:42:49 UTC