Re: Helping out with canonicalization of URIs

On Monday, August 9, 2004, at 08:39 AM, Graham Klyne wrote:
> At 08:54 09/08/04 -0400, Sam Ruby wrote:
>> These rules completely cover scheme, path, and partially cover 
>> authority.
>
>> Here are some URIs that I can't determine if they are in canonical 
>> form based solely on the rules listed in rfc2396-bis:
>
> I'll offer here my opinions based on my understanding gained from 
> implementing a parser directly from a slightly earlier version of 
> RFC2396bis.  (That is, having read and worked with the specification, 
> but without recalling specific assertions in each case.)
>
>>   http://:@example.com/
>
> I'd say that's different from http://example.com/, in that it contains 
> empty username/password values, which the latter does not.  For 
> example, following the exhortation not to expose passwords, my 
> software would (by default) display this as:
>   http://:********@example.com/
> whereas the other would be displayed unchanged.
>
> (I'm not claiming this is a *useful* distinction, but lacking any text 
> that says a null username/password is the same as having no 
> username/password, I'd say that it does exist.)

Yes, and it is a useful distinction because it defines how the user
agent should respond to an initial authentication request, whereas
without the colon the user agent is not supposed to try authenticating
on its own.
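
As a rough illustration of the empty-versus-absent userinfo distinction,
here is a minimal Python sketch built on the standard urllib.parse
module. The helper names userinfo() and display_form() are hypothetical,
and the masking merely imitates the display behaviour Graham describes;
it is not taken from any specification.

```python
from urllib.parse import urlsplit

def userinfo(uri):
    """Return (username, password); None means the part is absent."""
    parts = urlsplit(uri)
    return parts.username, parts.password

def display_form(uri):
    """Mask the password for display, keeping an empty one visible as present."""
    parts = urlsplit(uri)
    if parts.password is None:
        # No ":" in the userinfo (or no userinfo at all): nothing to hide.
        return uri
    info, _, hostport = parts.netloc.rpartition("@")
    user = info.partition(":")[0]
    masked = f"{user}:********@{hostport}"
    # The authority is the first place the netloc text can occur,
    # because a scheme cannot contain "@".
    return uri.replace(parts.netloc, masked, 1)

print(userinfo("http://:@example.com/"))
print(userinfo("http://example.com/"))
print(display_form("http://:@example.com/"))
```

An empty string and None are reported separately, so a caller can tell
"empty credentials supplied" apart from "no credentials at all".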

>>   http://example.com:80/
>
> I think this is the same as http://example.com/ according to 
> RFC2396bis, but that you have to climb some way up the equivalence 
> ladder (protocol-specific equivalence) to recognize this.  It must be 
> expected that many software packages would not recognize this 
> equivalence.

It is the same, but that falls under scheme-specific normalization, not
protocol-specific normalization.  It is the "http" scheme that defines
the equivalence between ":80" and no port, not the HTTP protocol.
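
Scheme-based port normalization might look roughly like the Python
sketch below. The DEFAULT_PORTS table and the drop_redundant_port()
name are assumptions made for illustration; a real implementation
would need a default-port entry for every scheme it handles. The
function edits the string directly so that an empty query or fragment
elsewhere in the URI is left untouched.

```python
# Assumption: a small illustrative table of per-scheme default ports.
DEFAULT_PORTS = {"http": 80, "https": 443}

def drop_redundant_port(uri):
    """Remove ":port" when the port is empty or the scheme's default."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        return uri  # no authority component to normalize
    # The authority ends at the first "/", "?" or "#".
    end = len(rest)
    for ch in "/?#":
        i = rest.find(ch)
        if i != -1:
            end = min(end, i)
    authority, remainder = rest[:end], rest[end:]
    head, colon, tail = authority.rpartition(":")
    if not colon:
        return uri
    if tail == "":
        redundant = True  # empty port, e.g. "example.com:"
    elif tail.isascii() and tail.isdigit():
        redundant = int(tail) == DEFAULT_PORTS.get(scheme.lower())
    else:
        redundant = False  # the colon belonged to userinfo or an IPv6 literal
    if not redundant:
        return uri
    return f"{scheme}://{head}{remainder}"

print(drop_redundant_port("http://example.com:80/"))
```

Because the default port is looked up by scheme name, the equivalence
is tied to "http" as a URI scheme rather than to the protocol spoken on
the wire.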

>>   http://example.com/gateway.cgi?
>
> I'd say this is distinct from http://example.com/gateway.cgi -- an 
> empty query is not the same as no query at all.

Yes.

>>   http://www.w3.org/2000/01/rdf-schema#
>
> I'd say this is distinct from http://www.w3.org/2000/01/rdf-schema -- 
> an empty fragment is not the same as no fragment at all (note, when 
> used as a namespace URI in an RDF document, they certainly would not 
> give rise to the same resource identifiers according to the RDF 
> specifications -- see RDF syntax spec (10 Feb 2004), section 6.1.2, 
> URI accessor).

Yes.
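
For both the query and the fragment cases, the empty-versus-absent
distinction can be observed with the component-splitting regular
expression given in Appendix B of the URI specification. The Python
sketch below is only an illustration; the query_and_fragment() name is
hypothetical. A regular expression is used here because
urllib.parse.urlsplit reports "" for both an empty and a missing
query or fragment.

```python
import re

# Component-splitting expression from Appendix B of the URI specification.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def query_and_fragment(uri):
    """Return (query, fragment); None means absent, "" means empty."""
    m = URI_RE.match(uri)
    # Group 7 is the query without "?", group 9 the fragment without "#".
    return m.group(7), m.group(9)

print(query_and_fragment("http://example.com/gateway.cgi?"))
print(query_and_fragment("http://example.com/gateway.cgi"))
print(query_and_fragment("http://www.w3.org/2000/01/rdf-schema#"))
print(query_and_fragment("http://www.w3.org/2000/01/rdf-schema"))
```

A comparison that treats None and "" as different values therefore
keeps each pair of URIs above distinct.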

>> My initial inclination would be to declare all of these as 
>> non-canonical, but there is enough common practice of the last 
>> example that it probably should be an exception.
>
> As you see, I come to different conclusions in most cases.

Right, the only thing it might make sense to add is a bullet explicitly
restating what is already said about an empty port in 6.2.3.  However,
this is not a conformance issue since all normalization is optional.

....Roy

Received on Monday, 9 August 2004 18:26:14 UTC