- From: <Misha.Wolf@reuters.com>
- Date: Mon, 29 Apr 2002 16:00:13 +0100
- To: Norman Walsh <Norman.Walsh@Sun.COM>
- Cc: www-tag@w3.org
On 29/04/2002 14:59:38 Norman Walsh wrote:

> As I've said before, I think the only practical URI comparison
> algorithm is "lexicographic identity". But every time I've said that,
> I've had this nagging concern about how to deal with characters that
> might or might not be escaped.
>
> There's an erratum to XML[1] that tries to tackle a related issue, but
> the IRI draft[2] (I haven't read it with great care, so I could be
> overlooking something) seems to provide a more complete algorithm for
> "normalizing" the possibly escaped characters in a URI.
>
> I'm now inclined to say that the right way to compare URIs is to turn
> them into IRIs and test their lexicographic identity.

In general, one can't reliably transform a URI into an IRI, as the
character encoding is not known. So I suspect that your initial
inclination was correct.

IRIs do come into the picture in two respects:

1. The rule for identity should address IRIs as well as URIs. Indeed,
   as every URI is by definition an IRI, it need only address IRIs :-)

2. Because lexicographic identity generally results in an IRI not
   matching the corresponding URI, it is important that specs,
   designers and software follow this rule (given in IRI draft):

   | 2.3.1 When to convert from IRIs to URIs
   |
   | The mapping from IRIs to URIs SHOULD only be applied when necessary,
   | and as late as possible.

   This agrees with the XML spec[3]:

   | Since escaping is not always a fully reversible process, it must be
   | performed only when absolutely necessary and as late as possible in a
   | processing chain. In particular, neither the process of converting a
   | relative URI to an absolute one nor the process of passing a URI
   | reference to a process or software component responsible for
   | dereferencing it should trigger escaping.
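
To make both points concrete, here is a rough Python sketch. It is an illustration only, not an algorithm taken from the IRI draft or the XML errata. The example.org identifiers and the helper names same_identifier and to_uri are made up for this purpose, and to_uri assumes the IRI-to-URI mapping amounts to UTF-8 encoding followed by %-escaping.

```python
from urllib.parse import quote, unquote


def same_identifier(a: str, b: str) -> bool:
    """Lexicographic identity: plain character-by-character equality."""
    return a == b


def to_uri(iri: str) -> str:
    """Hypothetical IRI-to-URI mapping: UTF-8 encode, then %-escape."""
    return quote(iri, safe=":/", encoding="utf-8")


# A made-up URI with one escaped octet. Nothing in the URI says which
# character encoding produced that octet.
uri = "http://example.org/caf%E9"

# Guessing ISO-8859-1 gives e-acute; guessing UTF-8 gives an invalid
# sequence, which errors="replace" turns into U+FFFD.
as_latin1 = unquote(uri, encoding="iso-8859-1")
as_utf8 = unquote(uri, encoding="utf-8", errors="replace")

# A made-up IRI and the URI it maps to under the assumption above.
iri = "http://example.org/caf\u00e9"
escaped = to_uri(iri)

print(as_latin1 == as_utf8)            # the two guesses disagree
print(same_identifier(iri, escaped))   # IRI and its URI do not match
print(same_identifier(iri, iri))       # an unescaped IRI matches itself
```

The two guesses for the same escaped octet yield different strings, which is why the URI-to-IRI direction is unreliable. The IRI and the URI derived from it are also not lexicographically identical, which is why escaping should be left as late as possible.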
> [1] http://www.w3.org/XML/xml-V10-2e-errata#E4
> [2] http://www.ietf.org/internet-drafts/draft-duerst-iri-00.txt

[3] http://www.w3.org/XML/xml-V10-2e-errata#E26

Misha

----------------------------------------------------------------
Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of the individual
sender, except where the sender specifically states them to be
the views of Reuters Ltd.
Received on Monday, 29 April 2002 11:02:13 UTC