Re: draft-duerst-iri-07.txt: 2 week mailing list last call from Martin Duerst on 2004-05-12 (public-iri@w3.org from May 2004)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 12 May 2004 18:00:51 +0900
To: Graham Klyne <GK@ninebynine.org>, public-iri@w3.org
Message-Id: <4.2.0.58.J.20040512173603.03bdea90@localhost>

Hello Graham,

I have logged this as issue charcompareMUST-31.


At 12:02 04/05/10 +0100, Graham Klyne wrote:


>Section 5.1:
>
>[[
>5.1  Simple String Comparison
>
>    In some scenarios a definite answer to the question of IRI
>    equivalence is needed that is independent of the scheme used and
>    always can be calculated quickly and without accessing a network. An
>    example of such a case is XML Namespaces ([XMLNamespace]). In such
>    cases, two IRIs SHOULD be defined as equivalent if and only if they
>    are character-by-character equivalent. This is the same as being
>    byte-by-byte equivalent if the character encoding for both IRIs is
>    the same. As an example,
>    http://example.org/~user, http://example.org/%7euser, and
>    http://example.org/%7Euser are not equivalent under this definition.
>    In such a case, the comparison function MUST NOT map IRIs to URIs,
>    because such a mapping would create additional spurious equivalences.
>]]
>
>It's not clear to me what the MUST NOT here is saying.  Making normative 
>statements that are conditional on some postulated application scenario 
>seems to be a bit confusing to me.

If you interpreted the statement as conditional on some application
scenario, then it is indeed confusing. It was intended to be conditional
on the comparison function, i.e. if you use character-by-character
comparison, you MUST NOT map IRIs to URIs,
because such a mapping would create additional spurious equivalences.

I have replaced "In such a case" with "When comparing character-by-character".
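To make the distinction concrete, here is a minimal Python sketch. The
function names (iri_to_uri, equivalent_simple, equivalent_via_uri) are
hypothetical, and iri_to_uri is an assumed simplification of the mapping
in section 3.1: it percent-encodes the UTF-8 octets of every non-ASCII
character and leaves ASCII characters, including existing %-escapes,
unchanged. It shows that the three example IRIs from section 5.1 stay
distinct under character-by-character comparison, while mapping to URIs
before comparing makes a literal "é" and its escaped form %C3%A9 compare
equal, which is the kind of spurious equivalence the MUST NOT rules out.

```python
def iri_to_uri(iri: str) -> str:
    """Hypothetical, simplified IRI-to-URI mapping (an assumption, not the
    draft's full algorithm): each non-ASCII character is encoded as UTF-8
    and every octet written as %HH; ASCII characters are left as-is."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def equivalent_simple(a: str, b: str) -> bool:
    """Simple string comparison: equivalent iff character-by-character equal."""
    return a == b


def equivalent_via_uri(a: str, b: str) -> bool:
    """Comparison that maps IRIs to URIs first; under simple string
    comparison this is what the MUST NOT forbids."""
    return iri_to_uri(a) == iri_to_uri(b)


# The three example IRIs from section 5.1 are pairwise distinct.
tilde_forms = ["http://example.org/~user",
               "http://example.org/%7euser",
               "http://example.org/%7Euser"]
for i, x in enumerate(tilde_forms):
    for y in tilde_forms[i + 1:]:
        print(equivalent_simple(x, y))  # False for every pair

# A spurious equivalence introduced by mapping to URIs before comparing.
iri = "http://example.org/r\u00e9sum\u00e9"   # "résumé" as characters
uri = "http://example.org/r%C3%A9sum%C3%A9"   # same name, already escaped
print(equivalent_simple(iri, uri))   # False
print(equivalent_via_uri(iri, uri))  # True
```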


>I think the final sentence maybe should be:
>[[
>The IRI to URI mapping function described above [ref] does not preserve 
>this form of equivalence.
>]]
>
>(Further, the MUST NOT here seems even more perverse in light of the 
>introductory material in section 3.1)

I have checked that material again and did not find any problems.
Note that it is carefully worded in terms of retrieval when it comes
to IRI->URI mapping, not in terms of abstract resource identification.


>I suspect there should be some discouragement of applications depending on 
>this level of equivalence, in view of the spurious distinctions that are 
>lost when IRIs are converted to URIs.   To my mind the string equivalence 
>of the URI-converted form seems like the lowest reasonable level of 
>distinction to be encouraged.

Well, there are some serious arguments against this:
- Some very important applications, in particular XML Namespaces
   and RDF, use this equivalence, so recommending against it
   would cause confusion.
- Needing to convert to URIs for every comparison is inefficient
   (that was the main argument for simple string comparison in
   XML Namespaces).
- Needing to convert to URIs may lead to more URIs (rather than IRIs)
   floating around, because in some cases the converted form
   would leak out.
So that's why we should not go there.
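The efficiency and leakage arguments can be illustrated with a similar
sketch, again under stated assumptions: NamespaceTable and its methods
are hypothetical, and iri_to_uri is the same assumed simplification of
the IRI-to-URI mapping (UTF-8 percent-encoding of non-ASCII characters
only). Lookups use plain string equality, so nothing is converted per
comparison, and the URI form is produced only where a URI-only retrieval
step needs it, so it never flows back into the table.

```python
from typing import Dict, Optional


def iri_to_uri(iri: str) -> str:
    # Assumed simplification of the IRI-to-URI mapping: percent-encode the
    # UTF-8 octets of every non-ASCII character, leave ASCII untouched.
    return "".join(
        ch if ord(ch) < 0x80 else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )


class NamespaceTable:
    """Hypothetical lookup table keyed by namespace IRI.

    Keys are compared character-by-character (ordinary str hashing and
    equality), so no conversion happens on lookup and no URI form is
    stored or handed back to callers.
    """

    def __init__(self) -> None:
        self._prefixes: Dict[str, str] = {}

    def bind(self, prefix: str, namespace_iri: str) -> None:
        self._prefixes[namespace_iri] = prefix

    def prefix_for(self, namespace_iri: str) -> Optional[str]:
        return self._prefixes.get(namespace_iri)

    @staticmethod
    def uri_for_retrieval(namespace_iri: str) -> str:
        # The mapping is applied only at the boundary where a URI-only
        # protocol needs it, e.g. just before dereferencing.
        return iri_to_uri(namespace_iri)


table = NamespaceTable()
table.bind("ex", "http://example.org/r\u00e9sum\u00e9#")
print(table.prefix_for("http://example.org/r\u00e9sum\u00e9#"))   # ex
print(table.prefix_for("http://example.org/r%C3%A9sum%C3%A9#"))   # None
print(NamespaceTable.uri_for_retrieval("http://example.org/r\u00e9sum\u00e9#"))
# http://example.org/r%C3%A9sum%C3%A9#
```

Keeping the conversion at the retrieval boundary means the escaped form
is never used as an identifier, which is one way to avoid the leakage
described in the last point above.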

I hope the above addresses your concerns.

Regards,    Martin.

Received on Wednesday, 12 May 2004 05:27:39 UTC