
Re: When are two URIs equivalent?

From: Josef Dietl <josef@mozquito.com>
Date: Tue, 23 May 2000 17:00:44 +0200
To: "Tim Berners-Lee" <timbl@w3.org>, "Eric van der Vlist" <vdv@dyomedea.com>
Cc: <xml-uri@w3.org>
Hi Tim!

Thank you very much for the exhaustive reply. I hope your reply has shed
some more light on the framework we work in. While I do agree that it
doesn't matter which resources the compared URIs point to, I do not agree
with the core of your answer:

> Two URIs are the same when they compare character for
> character.

First, let me apologize for coming after you: everybody else would have
gotten pretty much the same response, independent of origin :-/

The response is: This is not when two URIs are equivalent - this is when
you _think_ they are equivalent (see below). The topic appears a little bit
more convoluted to me, for two reasons.

First, there is the question of when to do the character-by-character
comparison: before or after absolutizing. That's the discussion we are
currently having, but there is more to it:
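To make the first point concrete, here is a minimal Python sketch of why the order matters. It assumes `urllib.parse.urljoin` from the standard library as a stand-in for RFC 2396 reference resolution; the function names and the example.org document URIs are hypothetical. The same relative string can absolutize to different URIs, and different strings can absolutize to the same URI.

```python
from urllib.parse import urljoin

def same_as_written(ref_a, ref_b):
    """Character-by-character comparison of the references as written."""
    return ref_a == ref_b

def same_after_absolutizing(base_a, ref_a, base_b, ref_b):
    """Character-by-character comparison after resolving each reference
    against the base URI of the document it appears in."""
    return urljoin(base_a, ref_a) == urljoin(base_b, ref_b)

# Hypothetical documents that use a relative namespace name.
doc_a = "http://example.org/docs/a.xml"
doc_b = "http://example.org/other/b.xml"
doc_c = "http://example.org/c.xml"

# Same string, different absolute URIs.
print(same_as_written("schema", "schema"))                             # True
print(same_after_absolutizing(doc_a, "schema", doc_b, "schema"))       # False

# Different strings, same absolute URI.
print(same_as_written("schema", "docs/schema"))                        # False
print(same_after_absolutizing(doc_a, "schema", doc_c, "docs/schema"))  # True
```

Either choice gives a well-defined string comparison; they simply answer different questions, which is why the "before or after" decision has to be made explicitly.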

Second, the URI spec explicitly states that "Unlike many specifications
that use a BNF-like grammar to define the bytes (octets) allowed by a
protocol, the URI grammar is defined in terms of characters." And as we
have learned from W3C's Internationalization Activity, comparing characters
is not quite that easy. Take the German u-umlaut "ü": even within Unicode,
there are at least two representations of the same "character" (the "real"
precomposed character U+00FC, and "u" followed by the combining diaeresis
U+0308, the equivalent of the overstruck >>"-backspace-u<<).
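A minimal Python sketch of the u-umlaut problem, using the standard `unicodedata` module. Choosing Normalization Form C (NFC) here is an assumption for illustration; the message does not say which normalization, if any, URI comparison should apply. The path segment and the helper name `nfc_equal` are hypothetical.

```python
import unicodedata
from urllib.parse import quote

# Hypothetical path segment containing "ü", written two ways.
precomposed = "m\u00fcller"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "mu\u0308ller"   # "u" + U+0308 COMBINING DIAERESIS

def nfc_equal(a, b):
    """Compare two strings after normalizing both to Unicode NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)           # False: different code points
print(nfc_equal(precomposed, decomposed))  # True: same "character"

# Once %-escaped as UTF-8 octets, the two spellings still differ.
print(quote(precomposed))  # m%C3%BCller
print(quote(decomposed))   # mu%CC%88ller
```

The last two lines show that escaping does not make the problem go away: a character-by-character comparison of the escaped forms inherits whatever difference the unescaped strings had.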

Please excuse the lawyerly, stick-to-the-letter reasoning in the next
paragraphs; trust me, it does provide a nice way out:

The best answer I found to the question of URI equivalence is section 6 of
the URI RFC:

-------from http://www.ietf.org/rfc/rfc2396.txt?number=2396
6. URI Normalization and Equivalence

   In many cases, different URI strings may actually identify the
   identical resource. For example, the host names used in URL are
   actually case insensitive, and the URL <http://www.XEROX.com> is
   equivalent to <http://www.xerox.com>. In general, the rules for
   equivalence and definition of a normal form, if any, are scheme
   dependent. When a scheme uses elements of the common syntax, it will
   also use the common syntax equivalence rules, namely that the scheme
   and hostname are case insensitive and a URL with an explicit ":port",
   where the port is the default for the scheme, is equivalent to one
   where the port is elided.
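The common-syntax rules quoted above can be sketched as a normalization step applied before a plain string comparison. This is a minimal Python sketch under stated assumptions: `normalize_common` and the `DEFAULT_PORTS` table are hypothetical and deliberately incomplete, and only the rules named in the quoted section (case-insensitive scheme and host, elided default port) are applied; everything scheme-specific is left alone.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical, deliberately incomplete table of default ports.
DEFAULT_PORTS = {"http": 80, "ftp": 21}

def normalize_common(uri):
    """Apply only the common-syntax equivalence rules quoted above:
    scheme and host are case-insensitive, and an explicit port equal to
    the scheme's default is equivalent to an elided one. Path, query and
    fragment are left exactly as written. Bracketed IPv6 literals are
    not handled in this sketch."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    netloc = parts.netloc
    if netloc:
        userinfo = netloc.rpartition("@")[0]   # "" when there is no "@"
        host = parts.hostname or ""            # .hostname is lower-cased
        port = parts.port
        if port is not None and port != DEFAULT_PORTS.get(scheme):
            host = f"{host}:{port}"
        netloc = f"{userinfo}@{host}" if userinfo else host
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))

a = "HTTP://www.XEROX.com:80/index.html"
b = "http://www.xerox.com/index.html"
print(a == b)                                      # False
print(normalize_common(a) == normalize_common(b))  # True
```

Note that the path stays case-sensitive: `/Index.html` and `/index.html` remain different after this step, since the quoted rules say nothing about it.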

So, looks to me like the answer to the question in the subject is: We don't
know; it depends on the scheme used. This led me to the thought: How are
we supposed to do "identifiers" if we can't compare them?

If the above statement ("we don't know.") is true, that's a gap we can
close while solving many problems associated with namespaces and relative
URIs. We could declare the current state a necessary fix until the full
solution is available. That way, we can at least get away from stepping on
every single toe and move forward. The idea is: solve the problem at hand
without touching namespaces, DOM, XSLT, XPath or RFC 2396 by stating "we
have a better solution for a small point for all of these" - which would
also lead to more consistency for the future because we have a separate
document we can point to.

Even more: if W3C does it, I know that document has a persistent URI :-)

Despite my lack of documented qualifications, I'm certainly willing to donate
some of my time to this project to move things forward.

Otherwise, hopefully the thought was useful in its own right.


P.S.: You may have realized that I'm really trying to stay in scope here.
I'm aware of the numerous and big issues around namespaces and URIs (some
jointly, some individually :-). Just take one step at a time.

> -----Original Message-----
> From: xml-uri-request@w3.org
> [mailto:xml-uri-request@w3.org] On Behalf Of Tim Berners-Lee
> Sent: Tuesday, May 23, 2000 15:43
> To: Josef Dietl; Eric van der Vlist
> Cc: xml-uri@w3.org
> Subject: Re: When are two URIs equivalent?
> -----Original Message-----
> From: Josef Dietl <josef@mozquito.com>
> To: Eric van der Vlist <vdv@dyomedea.com>
> Cc: Tim Berners-Lee <timbl@w3.org>; xml-uri@w3.org <xml-uri@w3.org>
> Date: Tuesday, May 23, 2000 6:03 AM
> Subject: When are two URIs equivalent?
> >Eric,
> >
> >ok, you caught me - I meant to say "scheme".
> >
> >Still: would somebody mind telling me when two URIs are the same?
> A URI is a (syntax constrained) string used to identify something.
> A resource is that which, being in general abstract, is identified by
> the URI.
> Two URIs are the same when they compare character for character.
> When two URIs are the same they identify the same resource.
> (NB. There are many cases in which the resources identified by two
> different URIs are the same. Software is not required to know all (or,
> for XML well-formedness checking, any) of these cases. They include
> the knowledge that the hex %nn encoding for a non-reserved character is
> an arbitrary choice, and the knowledge that if the scheme is HTTP or
> FTP then the domain name part is not case sensitive. They include
> information obtained from a name server returning a "Found" response
> to an HTTP request. They include metadata gained from a third party.
> There is no definitive list of these. Some of them are a function of
> the URI scheme.)
>                          u1 = u2  =>  R1 = R2
> but the reverse implication does not hold. On the left-hand side, "="
> means string equality; on the right-hand side, "=" stands for
> equivalence for any operation at all.
> For example, u1 may be the absolutized URI from the namespace name of
> a namespace in an XML document, in which case R1 is the namespace.
> u2 may be the absolutized URI from an XSLT style sheet, and R2 the
> namespace to which the style sheet is giving a particular style.
> The name for a string returned in the body of a successful HTTP GET
> request is an HTTP "entity body".
> (Yes, the XML, HTTP and URI communities have to learn a certain amount
> of each other's jargon.)
> Tim BL
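Tim's first example of knowledge that software may, but need not, have (the %nn encoding of a non-reserved character is an arbitrary choice) can be sketched as one more optional normalization step. This is a minimal Python sketch, assuming the unreserved set from RFC 2396 section 2.3 (alphanumerics plus the "mark" characters); the function name `decode_unreserved` and the example.org URIs are hypothetical. It illustrates the direction of the implication that does not hold in general: u1 and u2 differ as strings, yet an application that knows this rule treats them as identifying the same resource.

```python
import re
import string

# Unreserved characters per RFC 2396, section 2.3: alphanum | mark.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

def decode_unreserved(uri):
    """Replace %nn escapes that encode an unreserved character with the
    character itself; escapes for any other octet are left untouched."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri)

u1 = "http://example.org/%7Euser/"
u2 = "http://example.org/~user/"

print(u1 == u2)                                        # False
print(decode_unreserved(u1) == decode_unreserved(u2))  # True
print(decode_unreserved("a%2Fb"))                      # a%2Fb ("/" is reserved)
```

Reserved characters such as "/" are deliberately left escaped, because decoding them could change how the URI is parsed.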
Received on Tuesday, 23 May 2000 10:51:33 UTC
