
Re: Helping out with canonicalization of URIs

From: Graham Klyne <GK@ninebynine.org>
Date: Mon, 09 Aug 2004 16:39:14 +0100
Message-Id: <5.1.0.14.2.20040809161019.00b8c930@127.0.0.1>
To: Sam Ruby <rubys@intertwingly.net>, uri@w3.org
Cc: Atom WG <atom-syntax@imc.org>

At 08:54 09/08/04 -0400, Sam Ruby wrote:
>These rules completely cover scheme, path, and partially cover authority.

>Here are some URIs that I can't determine if they are in canonical form 
>based solely on the rules listed in rfc2396-bis:

I'll offer here my opinions based on my understanding gained from 
implementing a parser directly from a slightly earlier version of 
RFC2396bis.  (That is, having read and worked with the specification, but 
without recalling specific assertions in each case.)

>   http://:@example.com/

I'd say that's different from http://example.com/, in that it contains 
empty username/password values, which the latter does not.  For example, 
following the exhortation not to expose passwords, my software would (by 
default) display this as:
   http://:********@example.com/
whereas the other would be displayed unchanged.

(I'm not claiming this is a *useful* distinction, but lacking any text that 
says a null username/password is the same as having no username/password, 
I'd say that it does exist.)
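
To illustrate the display behaviour described above, here is a minimal Python sketch. It is not the software mentioned in this message: the name display_uri, the mask string, and the simple string splitting are all assumptions made for the example. It masks whatever follows the first ":" in the userinfo, even when that password is empty, and leaves a URI with no userinfo untouched.

```python
def display_uri(uri: str, mask: str = "********") -> str:
    """Return uri with any password in the userinfo replaced by mask (hypothetical helper)."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        return uri  # no authority component to inspect

    # The authority ends at the first "/", "?" or "#".
    end = len(rest)
    for ch in "/?#":
        i = rest.find(ch)
        if i != -1:
            end = min(end, i)
    authority, tail = rest[:end], rest[end:]

    userinfo, at, host = authority.rpartition("@")
    if not at:
        return uri  # no userinfo at all: nothing to hide

    user, colon, _password = userinfo.partition(":")
    if not colon:
        return uri  # username only, no password field

    # A password field is present, possibly empty: mask it either way.
    return f"{scheme}://{user}:{mask}@{host}{tail}"


print(display_uri("http://:@example.com/"))
print(display_uri("http://example.com/"))
```

The two inputs produce different output only because the first carries an (empty) userinfo component, which is the distinction being drawn here.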

>   http://example.com:80/

I think this is the same as http://example.com/ according to RFC2396bis, 
but you have to climb some way up the equivalence ladder 
(to protocol-specific equivalence) to recognize this.  Expect that many 
software packages will not recognize this equivalence.
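
A rough Python sketch of that rung of the ladder follows. The regular expression is the component-splitting pattern given in Appendix B of the URI specification; the names strip_default_port and DEFAULT_PORTS, and the one-entry port table, are assumptions made for illustration. Plain string comparison treats the two URIs as different, and only the scheme-specific step makes them match.

```python
import re

# Component-splitting pattern from Appendix B of the URI specification.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Illustrative table only; a real one would cover more schemes.
DEFAULT_PORTS = {"http": 80}


def strip_default_port(uri: str) -> str:
    """Drop an explicit port that equals the scheme's default (hypothetical helper)."""
    m = URI_RE.match(uri)
    scheme, authority = m.group(2), m.group(4)
    if scheme is None or authority is None:
        return uri
    default = DEFAULT_PORTS.get(scheme.lower())
    if default is None:
        return uri  # scheme unknown to this sketch

    userinfo, at, hostport = authority.rpartition("@")
    host, colon, port = hostport.rpartition(":")
    if not (colon and port.isdigit() and int(port) == default):
        return uri

    # Splice the shortened authority back in, leaving everything else untouched.
    start, end = m.span(4)
    return uri[:start] + userinfo + at + host + uri[end:]


a, b = "http://example.com:80/", "http://example.com/"
print(a == b)
print(strip_default_port(a) == strip_default_port(b))
```

Software that stops at simple string comparison sees only the first result, which is why this equivalence cannot be relied upon in general.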

(I'm not a great fan of "the ladder" approach, but I don't have anything 
better to offer...)

>   http://example.com/gateway.cgi?

I'd say this is distinct from http://example.com/gateway.cgi -- an empty 
query is not the same as no query at all.
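
The difference is visible when a URI is split with the regular expression from Appendix B of the URI specification. The Python sketch below is only an illustration (the helper name query_of is an assumption): an absent query comes back as None, while an empty one comes back as the empty string.

```python
import re

# Component-splitting pattern from Appendix B of the URI specification.
# Group 7 is the query, without its leading "?".
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def query_of(uri: str):
    """Return the query component, or None when the URI has no "?" at all."""
    return URI_RE.match(uri).group(7)


print(repr(query_of("http://example.com/gateway.cgi?")))  # ''
print(repr(query_of("http://example.com/gateway.cgi")))   # None
```

A parser that collapses both cases to an empty string cannot round-trip the first URI, so it loses exactly the distinction made here.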

>   http://www.w3.org/2000/01/rdf-schema#

I'd say this is distinct from http://www.w3.org/2000/01/rdf-schema -- an 
empty fragment is not the same as no fragment at all.  (Note that, when 
used as a namespace URI in an RDF document, they certainly would not give 
rise to the same resource identifiers according to the RDF specifications; 
see the RDF syntax spec (10 Feb 2004), section 6.1.2, URI accessor.)
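
As a simplified model of the RDF case: RDF/XML forms an identifier by concatenating the namespace name with the local name. The Python sketch below assumes that bare concatenation is all that happens (the function name expand is hypothetical) and shows the two namespace URIs yielding different identifiers for the same local name.

```python
def expand(namespace_uri: str, local_name: str) -> str:
    """Simplified model: identifier = namespace name + local name."""
    return namespace_uri + local_name


with_fragment = expand("http://www.w3.org/2000/01/rdf-schema#", "Class")
without_fragment = expand("http://www.w3.org/2000/01/rdf-schema", "Class")

print(with_fragment)     # http://www.w3.org/2000/01/rdf-schema#Class
print(without_fragment)  # http://www.w3.org/2000/01/rdf-schemaClass
```

Treating the trailing "#" as insignificant would therefore change which resources an RDF document names.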

>My initial inclination would be to declare all of these as non-canonical, 
>but there is enough common practice of the last example that it probably 
>should be an exception.

As you see, I come to different conclusions in most cases.

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Monday, 9 August 2004 15:53:49 UTC