RE: Section 6 Normalization and Comparison from Misha Wolf on 2003-04-28 (uri@w3.org from April 2003)

From: Misha Wolf <Misha.Wolf@reuters.com>
Date: Mon, 28 Apr 2003 11:42:16 +0100
To: "Roy T. Fielding" <fielding@apache.org>, "Williams, Stuart" <skw@hplb.hpl.hp.com>
Cc: uri@w3.org
Message-ID: <T61e07c8aaec407b711b30@dtcseuvig5.dtc.lon.ime.reuters.com>

My point about entities and numeric char references applies 
here too.

Misha


> -----Original Message-----
> From: Roy T. Fielding [mailto:fielding@apache.org] 
> Sent: 28 April 2003 11:43
> To: Williams, Stuart
> Cc: uri@w3.org
> Subject: Re: Section 6 Normalization and Comparison
> 
> 
> 
> >> Yes, they are always equivalent.  They won't necessarily be
> >> the same for comparison, but they are equivalent (which means
> >> applications can replace one with the other if they so desire).
> >
> > Oh...! The Namespaces 1.1 CR [1] gives the following example (well
> > yes, expressed in IRI rather than URI terms):
> >
> > "The IRI references below are also all different for the purposes of
> > identifying namespaces:
> > ...
> >   http://www.example.org/~wilbur
> >   http://www.example.org/%7ewilbur
> >   http://www.example.org/%7Ewilbur
> > "
> >
> > Which I read as making these three identifiers *not* equivalent for
> > the purpose of naming a namespace.
> >
> > [1] http://www.w3.org/TR/xml-names11/#IRIComparison
> 
> The Namespaces CR is welcome to choose CDATA comparison over URI
> comparison, but it has no choice in regards to URI equivalence.  It
> cannot claim they are different -- it can only claim that they are
> inconsistently written.
> 
> BTW, there is no reason for the Namespaces specification to include
> the quoted text above -- they are over-specifying the protocol.  What
> they should say is that identifiers are assumed to be in normal form
> and are not normalized for consistency prior to comparison.
> 
> >>> Also, in general it is not clear to me that it is legitimate to
> >>> unescape the escape sequence, because in general one doesn't know
> >>> the character set of the escaped character - only authority that
> >>> minted the URI knows that - looking at a URI you only get to know
> >>> what octet was escaped. [I think].
> >>
> >> That doesn't matter because the octet remains the same
> >> whether it is escaped or not.  The escaping merely prevents
> >> characters from being misinterpreted as delimiters of
> >> components or of the URI itself.
> >
> > I agree, it's of no consequence for octet based comparison (as in
> > [2] URI Characters seq->octet seq->Original Character seq).
> >
> > *If* the document were to say very clearly that URI comparisons
> > should be based on comparing octet sequences, at least for me, that
> > would explain your response above - ~, %7e, %7E all contribute the
> > same to an octet sequence.
> 
> That is mixing normalization with comparison.  The document doesn't
> say that because it isn't usually necessary -- URIs are often compared
> with the assumption that they are already in normal form.  That's the
> whole point of the additions for section 6.
> 
> ....Roy
> 
> 
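
The distinction drawn in the quoted thread, between comparing identifiers
as written and treating them as equivalent URIs, can be sketched roughly
as follows. This is only an illustration, not text from the draft under
discussion. The function names (normalize_escapes, same_string,
equivalent) are hypothetical, and the set of unreserved characters is
assumed to be the one later published in RFC 3986, which is narrower than
the RFC 2396 set but also contains "~".

```python
import re

# Assumption: unreserved characters as later defined in RFC 3986, section 2.3.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one escape sequence: "%" followed by two hex digits.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(uri: str) -> str:
    """Return uri with escapes in a consistent form.

    Escapes of unreserved characters are decoded (%7e -> ~); all other
    escapes are kept but written with uppercase hex digits (%2f -> %2F),
    so escaped delimiters are never turned back into delimiters.
    """
    def fix(match: "re.Match[str]") -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return _ESCAPE.sub(fix, uri)


def same_string(a: str, b: str) -> bool:
    """Character-by-character comparison with no normalization."""
    return a == b


def equivalent(a: str, b: str) -> bool:
    """Comparison after normalizing escape sequences on both sides."""
    return normalize_escapes(a) == normalize_escapes(b)


WILBUR = [
    "http://www.example.org/~wilbur",
    "http://www.example.org/%7ewilbur",
    "http://www.example.org/%7Ewilbur",
]
```

Under these assumptions, same_string() treats the three WILBUR forms as
pairwise different, which matches the Namespaces 1.1 reading, while
equivalent() treats them as the same identifier. An escaped delimiter such
as %2F is deliberately left escaped, since decoding it would change how
the URI parses.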



Received on Monday, 28 April 2003 06:43:46 UTC