Re: URIEquivalence-15

On 29/04/2002 14:59:38 Norman Walsh wrote:
> As I've said before, I think the only practical URI comparison
> algorithm is "lexicographic identity". But every time I've said that,
> I've had this nagging concern about how to deal with characters that
> might or might not be escaped.
>
> There's an erratum to XML[1] that tries to tackle a related issue, but
> the IRI draft[2] (I haven't read it with great care, so I could be
> overlooking something) seems to provide a more complete algorithm for
> "normalizing" the possibly escaped characters in a URI.
>
> I'm now inclined to say that the right way to compare URIs is to turn
> them into IRIs and test their lexicographic identity.

In general, one can't reliably transform a URI into an IRI, because the
character encoding behind its %-escaped octets is not known.  So I
suspect that your initial inclination was correct.  IRIs do come into
the picture in two respects:

1.  The rule for identity should address IRIs as well as URIs.  Indeed,
    as every URI is by definition an IRI, it need only address IRIs :-)

2.  Because lexicographic identity generally results in an IRI not
    matching the corresponding URI, it is important that specs,
    designers and software follow this rule (given in the IRI
    draft[2]):

| 2.3.1 When to convert from IRIs to URIs
|
|   The mapping from IRIs to URIs SHOULD only be applied when necessary,
|   and as late as possible.

This agrees with the XML spec[3]:

| Since escaping is not always a fully reversible process, it must be
| performed only when absolutely necessary and as late as possible in a
| processing chain. In particular, neither the process of converting a
| relative URI to an absolute one nor the process of passing a URI
| reference to a process or software component responsible for
| dereferencing it should trigger escaping.
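
To make both points concrete, here is a rough Python sketch (not taken
from either spec).  The example.org addresses and the same_identifier()
helper are made up for illustration, and I am assuming UTF-8 as the
encoding used when an IRI is escaped into a URI.  It shows that the same
escaped octets unescape to different characters depending on the
encoding one guesses, and that plain string comparison treats an IRI and
its escaped form as different identifiers.

```python
from urllib.parse import quote, unquote

# Hypothetical URI containing two %-escaped octets (0xC3, 0xA9).
escaped = "http://example.org/caf%C3%A9"

# The URI itself does not say which encoding produced those octets,
# so both of these readings are possible.
as_utf8 = unquote(escaped, encoding="utf-8")         # one character after "caf"
as_latin1 = unquote(escaped, encoding="iso-8859-1")  # two characters after "caf"


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: lexicographic identity, character by character."""
    return a == b


# Going the other way: escape an IRI into a URI (assuming UTF-8).
iri = "http://example.org/caf\u00e9"
uri = quote(iri, safe=":/")

print(as_utf8 == as_latin1)       # the two guesses disagree
print(same_identifier(iri, uri))  # IRI and escaped URI do not match
print(same_identifier(iri, iri))  # unescaped forms compare cleanly
```

If every component in a chain keeps the IRI as it was given and only
escapes at the last moment, the comparison in same_identifier() stays
meaningful; once one component escapes early, it no longer matches.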

> [1] http://www.w3.org/XML/xml-V10-2e-errata#E4
> [2] http://www.ietf.org/internet-drafts/draft-duerst-iri-00.txt

[3] http://www.w3.org/XML/xml-V10-2e-errata#E26

Misha






Received on Monday, 29 April 2002 11:02:13 UTC