Re: draft-duerst-iri-07.txt: 2 week mailing list last call from Graham Klyne on 2004-05-12 (public-iri@w3.org from May 2004)

From: Graham Klyne <gk@ninebynine.org>
Date: Wed, 12 May 2004 12:44:03 +0100
To: Martin Duerst <duerst@w3.org>, public-iri@w3.org
Message-Id: <5.1.0.14.2.20040512122534.02bf4b80@127.0.0.1>

At 18:00 12/05/04 +0900, Martin Duerst wrote:
>Hello Graham,
>
>I have made this issue charcompareMUST-31.
>
>
>At 12:02 04/05/10 +0100, Graham Klyne wrote:
>
>
>>Section 5.1:
>>
>>[[
>>5.1  Simple String Comparison
>>
>>    In some scenarios a definite answer to the question of IRI
>>    equivalence is needed that is independent of the scheme used and
>>    always can be calculated quickly and without accessing a network. An
>>    example of such a case is XML Namespaces ([XMLNamespace]). In such
>>    cases, two IRIs SHOULD be defined as equivalent if and only if they
>>    are character-by-character equivalent. This is the same as being
>>    byte-by-byte equivalent if the character encoding for both IRIs is
>>    the same. As an example,
>>    http://example.org/~user, http://example.org/%7euser, and
>>    http://example.org/%7Euser are not equivalent under this definition.
>>    In such a case, the comparison function MUST NOT map IRIs to URIs,
>>    because such a mapping would create additional spurious equivalences.
>>]]
>>
>>It's not clear to me what the MUST NOT here is saying.  Making normative 
>>statements that are conditional on some postulated application scenario 
>>seems to be a bit confusing to me.
>
>If you interpreted the statement as conditional on some application
>scenario, then it is indeed confusing. It was intended to be conditional
>on the comparison function. I.e., if you use character-by-character
>comparison, you MUST NOT map IRIs to URIs,
>because such a mapping would create additional spurious equivalences.

I was taking the choice of comparison function to be part of the 
application scenario.
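
To make the "spurious equivalences" concrete, here is a minimal sketch. The
iri_to_uri function is a simplified stand-in for the draft's mapping (an
assumption: it only percent-encodes the UTF-8 bytes of non-ASCII characters
and skips normalization and host handling), and the "résumé" IRIs are
illustrative examples that do not come from the draft.

```python
def iri_to_uri(iri: str) -> str:
    """Simplified stand-in for the IRI-to-URI mapping (assumption):
    percent-encode the UTF-8 bytes of each non-ASCII character and
    leave ASCII characters, including existing %XX escapes, untouched."""
    return "".join(
        ch if ord(ch) < 128
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )


def simple_equal(a: str, b: str) -> bool:
    """Character-by-character comparison as in section 5.1: no mapping."""
    return a == b


# The three spellings quoted from section 5.1 remain distinct.
tilde_forms = [
    "http://example.org/~user",
    "http://example.org/%7euser",
    "http://example.org/%7Euser",
]
all_distinct = len(set(tilde_forms)) == 3

# Two IRIs that differ as strings but collide once mapped to URIs.
iri_a = "http://example.org/r\u00e9sum\u00e9"
iri_b = "http://example.org/r%C3%A9sum%C3%A9"
equal_as_strings = simple_equal(iri_a, iri_b)
equal_after_mapping = iri_to_uri(iri_a) == iri_to_uri(iri_b)
```

Under these assumptions the two IRIs are unequal as strings but equal after
mapping, which is the extra equivalence that the MUST NOT is meant to rule
out for character-by-character comparison.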

>I have replaced "In such a case" with "When comparing character-by-character".

I think that's better, though it doesn't quite capture my original 
comment.  (Consider:  as this is given as a normative statement, how do you 
propose to find interoperable implementations to demonstrate conformance 
when moving to Draft Standard?)  I still prefer my suggestion (below), 
but now that I've raised the issue I'm happy for you to decide.

>>I think the final sentence maybe should be:
>>[[
>>The IRI to URI mapping function described above [ref] does not preserve 
>>this form of equivalence.
>>]]


>>(Further, the MUST NOT here seems even more perverse in light of the 
>>introductory material in section 3.1)
>
>I have checked that material again, and did not find any problems.
>You may observe that that material is carefully worded in terms of
>retrieval when it comes to IRI->URI mapping, not in terms of
>abstract resource identification.

OK, ignore that last comment.  (I wasn't specifically thinking about 
abstract identification.)

But I note that it's not obvious to me that the start of section 3.1 is subject 
to the mention of "resource retrieval" that appears in section 
3[.0].  Indeed the fact that the material in 3.1 is also said to apply to 
references and fragment identifiers suggests otherwise.

Checking for scheme-specific syntax restrictions does not seem to be 
specifically related to resource retrieval.  (cf. URN syntax checking.)

Looking more closely at point (b) in 3.1, which clearly *is* about resource 
retrieval, I find myself having further qualms:
[[
However, when an IRI is used for resource
retrieval, the resource that the IRI locates is the same as the
one located by the URI obtained after converting the IRI according
to the procedure defined here. This means that there is no need to
define resolution separately on the IRI level.
]]

This seems to preclude the possibility of defining a resolution protocol 
that uses IRIs natively.  Effectively, this is an imposition on any future 
protocol specification that can be used to resolve IRIs, which seems like a 
rather broad sweep.  Maybe this is OK, and really is what was intended, but 
I feel compelled to at least mention the point.   If this is what you 
intend, I think the point would usefully be more prominent in the text, and 
should be made a normative assertion;  e.g. a top-level paragraph along the lines of:
[[
When an IRI is used for resource retrieval, >>it must be by means of a
protocol that can also be used with URIs, and<< the resource that the IRI
locates MUST be the same as the one located by the URI obtained after
converting the IRI according to the procedure defined here.
]]

It might be argued that the text between >> and << is redundant, to the 
extent that any URI is also a valid IRI.  (But, thinking aloud, ... suppose 
I wanted to invent a new IRI scheme and protocol to serve as a kind of 
Chinese WordNet, with definitions retrievable in much the same way as they 
are for WordNet.  (Notwithstanding that this may not be a good idea for 
other reasons.)  In such a scheme, maybe there is a component which, 
according to the IRI scheme specification, must contain Chinese character 
symbol(s), so there are no URIs that are valid IRIs according to this 
scheme.  I don't know where this leads.  My main point is to try and raise 
a vaguely plausible scenario in which existence of a URI form for resource 
retrieval may be undesirable.)


>>I suspect there should be some discouragement of applications depending 
>>on this level of equivalence, in view of the spurious distinctions that 
>>are lost when IRIs are converted to URIs.   To my mind the string 
>>equivalence of the URI-converted form seems like the lowest reasonable 
>>level of distinction to be encouraged.
>
>Well, there are some serious arguments against this:
>- Some very important applications, in particular XML Namespaces
>   and RDF, use this equivalence. So recommendation against this
>   would cause confusion.
>- Needing to convert to URIs for every comparison is inefficient
>   (that was the main argument for namespaces)
>- Needing to convert to URIs may lead to more URIs (rather than IRIs)
>   floating around, because in some cases, the conversion would
>   leak.
>So that's why we should not go there.

But these "important applications" are defined in terms of URIs, not 
IRIs.  I'm not suggesting that one should be required to convert to URIs 
for every comparison, but that it might be discouraged to rely on 
differences between IRIs that are not present on conversion to URIs.
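
As a rough illustration of the kind of check I have in mind (a sketch only,
not a proposal for the draft): group a set of IRIs by their URI-converted
form and report any group holding more than one distinct IRI. The to_uri
helper is a simplified assumption that percent-encodes non-ASCII characters
as UTF-8, and conversion_collisions is a hypothetical name.

```python
from collections import defaultdict


def to_uri(iri: str) -> str:
    # Simplified conversion (assumption): percent-encode the UTF-8 bytes
    # of non-ASCII characters; ASCII characters pass through unchanged.
    return "".join(
        ch if ord(ch) < 128
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )


def conversion_collisions(iris):
    """Return {uri: [iris...]} for IRIs that are distinct as strings
    but become identical after conversion to URI form."""
    groups = defaultdict(set)
    for iri in iris:
        groups[to_uri(iri)].add(iri)
    return {
        uri: sorted(members)
        for uri, members in groups.items()
        if len(members) > 1
    }


sample = [
    "http://example.org/caf\u00e9",
    "http://example.org/caf%C3%A9",
    "http://example.org/tea",
]
collisions = conversion_collisions(sample)
```

An application that treats the two "café" forms as different names is
relying on exactly the distinction that disappears on conversion.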

I note that your document specifically makes reference to conversion to 
URIs being (notionally) used for a number of purposes, so in this respect 
IRIs are not something whose existence is independent of URIs, and to that 
extent I think that glossing over the difficulties that might arise when 
conversion to URIs is performed may leave room for problems.

Please note that the general thrust of my comments is not to request any 
change to the actual (normative) specification, but to clearly signal in 
some way that problems might occur if these issues are not observed.

>I hope the above addresses your concerns.

I regard this as ultimately your call, and I won't raise any formal 
objection if you don't agree with me, but I may continue to debate the 
matter with you to the extent that it's helpful to you.

#g



------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Wednesday, 12 May 2004 09:26:35 UTC