- From: Graham Klyne <gk@ninebynine.org>
- Date: Wed, 12 May 2004 12:44:03 +0100
- To: Martin Duerst <duerst@w3.org>, public-iri@w3.org
At 18:00 12/05/04 +0900, Martin Duerst wrote:
>Hello Graham,
>
>I have made this issue charcompareMUST-31.
>
>At 12:02 04/05/10 +0100, Graham Klyne wrote:
>
>>Section 5.1:
>>
>>[[
>>5.1 Simple String Comparison
>>
>>   In some scenarios a definite answer to the question of IRI
>>   equivalence is needed that is independent of the scheme used and
>>   always can be calculated quickly and without accessing a network. An
>>   example of such a case is XML Namespaces ([XMLNamespace]). In such
>>   cases, two IRIs SHOULD be defined as equivalent if and only if they
>>   are character-by-character equivalent. This is the same as being
>>   byte-by-byte equivalent if the character encoding for both IRIs is
>>   the same. As an example,
>>   http://example.org/~user, http://example.org/%7euser, and
>>   http://example.org/%7Euser are not equivalent under this definition.
>>   In such a case, the comparison function MUST NOT map IRIs to URIs,
>>   because such a mapping would create additional spurious equivalences.
>>]]
>>
>>It's not clear to me what the MUST NOT here is saying.  Making normative
>>statements that are conditional on some postulated application scenario
>>seems to be a bit confusing to me.
>
>If you interpreted the statement as conditional on some application
>scenario, then it is indeed confusing. It was intended conditional
>to the comparison function. I.e. if you use character-by-character
>comparison, you MUST NOT map IRIs to URIs,
>because such a mapping would create additional spurious equivalences.

I was taking the choice of comparison function to be part of the
application scenario.

>I have replaced "In such a case" with "When comparing character-by-character".

I think that's better, though it doesn't quite capture my original
comment.  (Consider: as this is given as a normative statement, how do you
propose to find interoperable implementations to demonstrate conformance
when moving to Draft Standard?
I still prefer my suggestion (below), but now I've raised the issue I'm
happy for you to decide.

>>I think the final sentence maybe should be:
>>[[
>>The IRI to URI mapping function described above [ref] does not preserve
>>this form of equivalence.
>>]]
>>(Further, the MUST NOT here seems even more perverse in light of the
>>introductory material in section 3.1)
>
>I have checked that material again, and did not find any problems.
>You may observe that that material is carefully worded in terms of
>retrieval when it comes to IRI->URI mapping, not in terms of
>abstract resource identification.

OK, ignore that last comment.  (I wasn't specifically thinking about
abstract identification.)

But I note that it's not obvious to me that the start of section 3.1 is
subject to the mention of "resource retrieval" that appears in section
3[.0].  Indeed the fact that the material in 3.1 is also said to apply to
references and fragment identifiers suggests otherwise.  Checking for
scheme-specific syntax restrictions does not seem to be specifically
related to resource retrieval.  (cf. URN syntax checking.)

Looking more closely at point (b) in 3.1, which clearly *is* about resource
retrieval, I find myself having further qualms:
[[
However, when an IRI is used for resource retrieval, the resource that the
IRI locates is the same as the one located by the URI obtained after
converting the IRI according to the procedure defined here. This means
that there is no need to define resolution separately on the IRI level.
]]

This seems to preclude the possibility of defining a resolution protocol
that uses IRIs natively.  Effectively, this is an imposition on any future
protocol specification that can be used to resolve IRIs, which seems like a
rather broad sweep.  Maybe this is OK, and really is what was intended, but
I feel compelled to at least mention the point.
If this is what you intend, I think the point would usefully be more
prominent in the text, and should be made a normative assertion; e.g. a
top-level paragraph à la:
[[
When an IRI is used for resource retrieval, >>it must be by means of a
protocol that can also be used with URIs, and<< the resource that the IRI
locates MUST be the same as the one located by the URI obtained after
converting the IRI according to the procedure defined here.
]]

It might be argued that the text between >> and << is redundant, to the
extent that any URI is also a valid IRI.

(But, thinking aloud, ... suppose I wanted to invent a new IRI scheme and
protocol to serve as a kind of Chinese WordNet, with definitions
retrievable in much the same way as they are for WordNet.  (Notwithstanding
that this may not be a good idea for other reasons.)  In such a scheme,
maybe there is a component which, according to the IRI scheme
specification, must contain Chinese character symbol(s), so there are no
URIs that are valid IRIs according to this scheme.  I don't know where this
leads.  My main point is to try and raise a vaguely plausible scenario in
which existence of a URI form for resource retrieval may be undesirable.)

>>I suspect there should be some discouragement of applications depending
>>on this level of equivalence, in view of the spurious distinctions that
>>are lost when IRIs are converted to URIs.  To my mind the string
>>equivalence of the URI-converted form seems like the lowest reasonable
>>level of distinction to be encouraged.
>
>Well, there are some serious arguments against this:
>- Some very important applications, in particular XML Namespaces
>  and RDF, use this equivalence. So recommendation against this
>  would cause confusion.
>- Needing to convert to URIs for every comparison is inefficient
>  (that was the main argument for namespaces)
>- Needing to convert to URIs may lead to more URIs (rather than IRIs)
>  floating around, because in some cases, the conversion would
>  leak.
>So that's why we should not go there.

But these "important applications" are defined in terms of URIs, not IRIs.

I'm not suggesting that one should be required to convert to URIs for every
comparison, but that it might be discouraged to rely on differences between
IRIs that are not present on conversion to URIs.

I note that your document specifically makes reference to conversion to
URIs being (notionally) used for a number of purposes, so in this respect
IRIs are not something whose existence is independent of URIs, and to that
extent I think to gloss over problems that might arise when conversion to
URIs is performed may leave room for problems.

Please note that the general thrust of my comments is not to request any
change to the actual (normative) specification, but to clearly signal in
some way that problems might occur if these issues are not observed.

>I hope the above addresses your concerns.

I regard this as ultimately your call, and I won't raise any formal
objection if you don't agree with me, but I may continue to debate the
matter with you to the extent that it's helpful to you.

#g

------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
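
Illustration of the comparison question discussed in this message (a rough
sketch, not taken from the draft): the code below contrasts
character-by-character comparison with comparison after an IRI-to-URI
mapping. The function iri_to_uri_sketch and the "résumé" IRIs are
hypothetical; the mapping is deliberately simplified to percent-encoding
each non-ASCII character as its UTF-8 bytes, and it ignores Unicode
normalization and internationalized host names.

```python
def iri_to_uri_sketch(iri: str) -> str:
    """Simplified, assumed IRI-to-URI mapping.

    Each non-ASCII character is replaced by the percent-encoded form of
    its UTF-8 bytes. ASCII characters, including existing %XX escapes,
    are passed through untouched. No normalization, no IDN handling.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def simple_equal(a: str, b: str) -> bool:
    """Character-by-character comparison, as in Section 5.1."""
    return a == b


def equal_after_mapping(a: str, b: str) -> bool:
    """String comparison of the URI-converted forms."""
    return iri_to_uri_sketch(a) == iri_to_uri_sketch(b)


# The three examples quoted from Section 5.1 of the draft.
SECTION_5_1_EXAMPLES = [
    "http://example.org/~user",
    "http://example.org/%7euser",
    "http://example.org/%7Euser",
]

# Hypothetical pair: same resource name, once with native characters and
# once already percent-encoded.
NATIVE = "http://example.org/r\u00e9sum\u00e9"
ESCAPED = "http://example.org/r%C3%A9sum%C3%A9"

if __name__ == "__main__":
    # All three Section 5.1 examples are distinct strings.
    print(len(set(SECTION_5_1_EXAMPLES)))
    # Distinct as IRIs under simple string comparison...
    print(simple_equal(NATIVE, ESCAPED))
    # ...but identical once both are mapped to URIs.
    print(equal_after_mapping(NATIVE, ESCAPED))
```

Under these assumptions the pair NATIVE/ESCAPED is the kind of distinction
that exists between IRIs but is lost on conversion to URIs. The three
Section 5.1 examples stay distinct either way, since this simplified
mapping does not touch ASCII characters or existing escapes.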
Received on Wednesday, 12 May 2004 09:26:35 UTC