Re: When are two URIs equivalent? from keshlam@us.ibm.com on 2000-05-23 (xml-uri@w3.org from May 2000)

From: <keshlam@us.ibm.com>
Date: Tue, 23 May 2000 11:28:58 -0400
To: John Cowan <jcowan@reutershealth.com>
cc: Josef Dietl <josef@mozquito.com>, "xml-uri@w3.org" <xml-uri@w3.org>
Message-ID: <852568E8.00550A9E.00@D51MTA03.pok.ibm.com>

> Char-by-char equivalence is too weak for URIs.
> RFC 2396 resolution tells us how to convert relative URIs to absolute,
> which can then be compared char-by-char.

You're confusing URIs and URI References.

Absolutizing converts a URI Reference into a URI.
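
A minimal sketch of that step in Python, using the standard library's
urllib.parse.urljoin to resolve a reference against a base. The base URI
and the helper name absolutize are made up for illustration; they are not
from this thread.

```python
from urllib.parse import urljoin

# Hypothetical base URI, e.g. the location of the containing document.
BASE = "http://example.org/a/b/c.xml"


def absolutize(reference: str, base: str = BASE) -> str:
    """Resolve a URI Reference against a base, giving an absolute URI."""
    return urljoin(base, reference)


print(absolutize("d.xml"))     # http://example.org/a/b/d.xml
print(absolutize("../d.xml"))  # http://example.org/a/d.xml
print(absolutize("#frag"))     # http://example.org/a/b/c.xml#frag
```

Only the resolved results are URIs; the inputs "d.xml", "../d.xml" and
"#frag" are URI References.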

Char-by-char is fine for URIs, _if_ you ignore embedded relative segments
(such as "./" and "../") and character-escaping issues.
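
To illustrate the two caveats, here is a small Python sketch. The function
name same_uri and the example.org URIs are hypothetical. Whether the pairs
below "should" count as equivalent depends on the scheme; the point is only
that a plain string comparison cannot tell.

```python
def same_uri(a: str, b: str) -> bool:
    """Naive equivalence: the two strings must match char for char."""
    return a == b


# Identical spellings compare equal, as expected.
print(same_uri("http://example.org/a/c", "http://example.org/a/c"))        # True

# An embedded relative segment ("b/..") defeats the comparison.
print(same_uri("http://example.org/a/b/../c", "http://example.org/a/c"))   # False

# So does an escaped character: %7E is an escaped "~".
print(same_uri("http://example.org/%7Euser", "http://example.org/~user"))  # False
```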

If you want to deal with those additional points, you need to canonicalize
the URI. This starts to go beyond the definition of URIs: the URI spec
mentions canonicalization but says that the process is specific to each
URI scheme. There is no bound on how many schemes can be invented, so
canonicalization is generally handled on the server side. As far as I know,
there is no way to ask a server how it would canonicalize a URI, even if you
are willing to do a network transaction.
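
As a rough sketch only: for a hierarchical, http-like scheme, a client
could apply a few generic steps (lowercase the scheme and host, unescape
characters that never needed escaping, drop "." and ".." segments). The
rules below are an assumption about what is safe for such a scheme, not
something the server has confirmed, and all function names are
hypothetical. The sketch also assumes the authority part carries no
case-sensitive user information.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

# Characters assumed never to need escaping in this sketch.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def decode_unreserved(text: str) -> str:
    """Unescape %XX only when it encodes an unreserved character."""
    def repl(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        # Other escapes are kept, with uppercase hex digits.
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def remove_dot_segments(path: str) -> str:
    """Collapse "." and ".." segments in a hierarchical path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # never pop the leading empty segment
                out.pop()
            continue
        out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # keep the trailing slash
    return "/".join(out)


def canonicalize(uri: str) -> str:
    """Client-side guess at a canonical form for an http-like URI."""
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        remove_dot_segments(decode_unreserved(parts.path)),
        decode_unreserved(parts.query),
        decode_unreserved(parts.fragment),
    ))


print(canonicalize("HTTP://Example.ORG/a/b/../%7Euser"))
# http://example.org/a/~user
```

After this step the char-by-char comparison can be applied to the results.
A server is still free to treat two URIs that differ after this step as
the same resource, or two that match as different in some scheme-specific
way, which is why the client cannot settle the question on its own.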

The question of how the server maps canonicalized URIs into responses is
yet another layer of interpretation, of course. But that really is beyond
the scope of the URI spec.

______________________________________
Joe Kesselman  / IBM Research

Received on Tuesday, 23 May 2000 11:29:34 UTC