
Re: URIEquivalence-15 and IRIs

From: Martin Duerst <duerst@w3.org>
Date: Wed, 10 Jul 2002 10:59:36 +0900
Message-Id: <4.2.0.58.J.20020710103437.0555c6e8@localhost>
To: Tim Bray <tbray@textuality.com>, Misha.Wolf@reuters.com
Cc: w3c-i18n-ig@w3.org, www-tag@w3.org

At 11:37 02/07/09 -0700, Tim Bray wrote:
>Misha.Wolf@reuters.com wrote:
>
>>I think the IRI spec [1] should state explicitly that by "character-by-
>>character equivalent" we mean that all of these (taken from a para a bit
>>further on) are different:
>>-  foo://example.com/XML
>>-  foo://example.com/XM%4C
>>-  foo://example.com/XM%4c
>>After all, the Namespaces spec [2] states that:
>>    [Definition:] URI references which identify namespaces are considered
>>    identical when they are exactly the same character-for-character.
>>and there has been discussion of what exactly this means.  Just repeating
>>it won't, IMO, clear up the confusion.
>
>OK, is the option open to us of deciding that the Namespaces spec, by 
>"character-for-character", really meant that the latter of the two above 
>must always be treated as equal, and furthermore equal to the first 
>because in this case the hex-escaped char 'L' is in the "safe" set?

From what I have seen in the discussion up to now, the answer seems to
be 'in theory yes, in practice no'.


>Second question, if we could do this, should we?

I think the 'in practice no' above answers this.


>I.e. do people feel that these really are effectively always the same URI 
>in all possible sets of circumstances?  Seems that way to me.  -Tim

They indeed always have to be resolved to the same resource.
I have not found any spec that would say anything different,
nor have I found any individual who would claim anything different.
I'm not 100% sure that all implementations (e.g. HTTP servers)
respect it, but I haven't seen anything to the contrary yet.

For the IRI draft
(http://www.ietf.org/internet-drafts/draft-duerst-iri-01.txt),
we have explicitly clarified this with the following:

     For actual resolution, differences in escaping (except for the
     escaping of reserved characters) MUST always result in the same
     resource.  For example, foo://example.com/XML,
     foo://example.com/XM%4C, and foo://example.com/XM%4c must resolve to
     the same resource.
     If this kind of equivalence is to be tested, the escaping of both
     IRIs to be compared has to be aligned, for example by converting both
     IRIs to URIs (see Section 3.1) and making sure that the case of the
     hexadecimal characters in the %-escape is always the same.  Such
     conversions MUST only be done on the fly, without changing the
     original IRI.
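A minimal Python sketch of such an on-the-fly comparison follows. It assumes both inputs are already URIs, so the IRI-to-URI conversion of Section 3.1 is not shown. It also assumes the RFC 2396 "unreserved" set decides which escapes can be decoded. The function names are made up for illustration. Escapes of other characters, such as %2F, are kept and only their hex digits are upper-cased. The original strings are never changed; only a comparison key is built.

```python
import re

# Assumption: RFC 2396 unreserved = alphanum | "-" "_" "." "!" "~" "*" "'" "(" ")"
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()"
)
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(uri: str) -> str:
    """Build a comparison key for uri; the caller's string is left as-is."""
    def fix(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            # e.g. %4C or %4c -> "L": escaping an unreserved char is insignificant
            return ch
        # reserved or otherwise special: keep the escape, just align hex case
        return "%" + match.group(1).upper()
    return _ESCAPE.sub(fix, uri)


def same_resource(a: str, b: str) -> bool:
    """Hypothetical helper: compare two URIs after aligning their escaping."""
    return normalize_escapes(a) == normalize_escapes(b)


uris = ["foo://example.com/XML", "foo://example.com/XM%4C", "foo://example.com/XM%4c"]
print(len(set(uris)))                             # 3: all differ character-for-character
print(len({normalize_escapes(u) for u in uris}))  # 1: same resource once aligned
```

Plain `==` still treats the three example URIs as different. That is the character-for-character test the Namespaces spec uses. Only the normalized keys coincide, and they are thrown away after the comparison.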


Just before that, there is also a sentence saying (for the namespace
case):

     It follows from the above that IRIs SHOULD NOT be modified when being
     transported.

This is equivalent to what I proposed in a slightly earlier mail,
namely that when you copy namespace URIs/IRIs, you don't touch them
in any way.
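A hypothetical sketch shows why this matters for namespaces. The function names and the example.org namespace are invented for illustration. An intermediary that only upper-cases %-escapes, which is harmless for resolution, still breaks a character-for-character namespace match. A verbatim copy does not:

```python
import re


def same_namespace(ns_a: str, ns_b: str) -> bool:
    # Namespaces in XML: identical only when exactly the same character-for-character
    return ns_a == ns_b


def relay_that_rewrites(uri: str) -> str:
    # Hypothetical intermediary that "tidies" %-escapes to upper case in transit
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), uri)


def relay_verbatim(uri: str) -> str:
    # Pass the namespace name through untouched
    return uri


declared = "http://example.org/ns/a%2fb"
print(same_namespace(declared, relay_verbatim(declared)))       # True
print(same_namespace(declared, relay_that_rewrites(declared)))  # False
```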

Regards,    Martin.
Received on Thursday, 11 July 2002 20:21:25 GMT