
Re: Section 6 Normalization and Comparison

From: Roy T. Fielding <fielding@apache.org>
Date: Mon, 28 Apr 2003 03:42:37 -0700
Cc: uri@w3.org
To: "Williams, Stuart" <skw@hplb.hpl.hp.com>
Message-Id: <20B27819-7966-11D7-99B7-000393753936@apache.org>

>> Yes, they are always equivalent.  They won't necessarily be
>> the same for comparison, but they are equivalent (which means
>> applications can replace one with the other if they so desire).
> Oh...! The Namespaces 1.1 CR [1] gives the following example (well yes,
> expressed in IRI rather than URI terms):
> "The IRI references below are also all different for the purposes of
> identifying namespaces:
> ...
>   http://www.example.org/~wilbur
>   http://www.example.org/%7ewilbur
>   http://www.example.org/%7Ewilbur
> "
> Which I read as making these three identifiers *not* equivalent for the
> purpose of naming a namespace.
> [1] http://www.w3.org/TR/xml-names11/#IRIComparison

The Namespaces CR is welcome to choose CDATA comparison over URI
comparison, but it has no choice in regards to URI equivalence.  It cannot
claim that they are different -- it can only claim that they are
inconsistently written.

BTW, there is no reason for the Namespaces specification to include
the quoted text above -- they are over-specifying the protocol.  What
they should say is that identifiers are assumed to be in normal form
and are not normalized for consistency prior to comparison.
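
A minimal Python sketch of the distinction, using the three identifiers
quoted above.  The helper name normalize_escapes and the unreserved
character set (letters, digits, "-", ".", "_", "~") are assumptions made
for illustration; they are not taken from either specification's text.

```python
import re
import string

# Assumed set of unreserved characters that never need escaping.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def normalize_escapes(uri: str) -> str:
    """Hypothetical helper: put percent-escapes into one consistent form."""
    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char                       # %7e -> ~
        return "%" + match.group(1).upper()   # %2f -> %2F
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, uri)

uris = [
    "http://www.example.org/~wilbur",
    "http://www.example.org/%7ewilbur",
    "http://www.example.org/%7Ewilbur",
]

# Character-by-character comparison of the identifiers as written.
distinct_as_written = len(set(uris))

# The same comparison after normalizing the escapes first.
distinct_normalized = len({normalize_escapes(u) for u in uris})

print(distinct_as_written, distinct_normalized)
```

The plain string comparison sees three spellings; the normalized
comparison sees one identifier.  Neither result changes whether the
three are equivalent, only whether the comparison detects it.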

>>> Also, in general it is not clear to me that it is legitimate to
>>> unescape the escape sequence, because in general one doesn't know the
>>> character set of the escaped character - only authority that minted
>>> the URI knows that - looking at a URI you only get to know what octet
>>> was escaped. [I think].
>> That doesn't matter because the octet remains the same
>> whether it is escaped or not.  The escaping merely prevents
>> characters from being misinterpreted as delimiters of
>> components or of the URI itself.
> I agree, it's of no consequence for octet based comparison (as in [2]
> Characters seq->octet seq->Original Character seq).
> *If* the document were to say very clearly that URI comparisons should
> be based on comparing octet sequences, at least for me, that would
> explain your response above - ~, %7e, %7E all contribute the same to an
> octet sequence.

That is mixing normalization with comparison.  The document doesn't say
that because it isn't usually necessary -- URIs are often compared with
the assumption that they are already in normal form.  That's the whole point
of the additions for section 6.
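
The octet point quoted earlier in this message can be checked with a
short Python sketch.  It relies on the standard library's
urllib.parse.unquote_to_bytes; the %E9 example and the two character
sets chosen for it are assumptions made purely for illustration.

```python
from urllib.parse import unquote_to_bytes

# The three spellings of the path from the Namespaces example.
paths = ["/~wilbur", "/%7ewilbur", "/%7Ewilbur"]

# Unescaping to octets: all three spellings give one octet sequence.
octet_sequences = {unquote_to_bytes(p) for p in paths}
print(octet_sequences)

# An escaped non-ASCII octet: the octet itself is known from the URI ...
octet = unquote_to_bytes("%E9")

# ... but the character it stands for depends on a character set that
# the URI does not carry.
as_latin1 = octet.decode("latin-1")
as_cp1251 = octet.decode("cp1251")
print(octet, as_latin1, as_cp1251)
```

The first part shows that escaping does not alter the octet; the second
shows that not knowing the character set affects only how the octet is
displayed as a character, not the octet being identified.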

Received on Monday, 28 April 2003 06:40:09 UTC
