- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 25 Mar 2004 15:14:27 -0500
- To: "Williams, Stuart" <skw@hp.com>, "Williams, Stuart" <skw@hp.com>, "Williams, Stuart" <skw@hp.com>, "Ian B. Jacobs" <ij@w3.org>
- Cc: public-iri@w3.org
Hello Stuart, Many thanks for forwarding this discussion to public-iri. I have listed this as an issue at http://www.w3.org/International/iri-edit/Overview.html#IRIURIcharequiv-20. At 16:38 04/03/25 +0000, Williams, Stuart wrote: >[Shifting discussion to public-iri from tag@w3.org] > > > >So to be clear... the IRI draft section 5.1 seems to suggest that it is > > >possible for the IRI->URI mapping to generate superious > > >character-by-character equivalences (between the resulting URI) from > > >distinct IRI. > > > > > >If that can't in fact happen, then maybe the wording in 5.1 that > > >suggest that it could should be changed. > > > > It can happen. The IRI http://www.example.org/rosé is > > not character-by-character equivalent to the URI/IRI > > http://www.example.org/ros%C3%A9, but when you convert the > > IRI http://www.example.org/rosé to an URI, you get the > > URI http://www.example.org/ros%C3%A9. > >So... what other IRI, distinct from http://www.example.org/rosé >yields URI http://www.example.org/ros%C3%A9. > >Oh... yes I see, the IRI http://www.example.org/ros%C3%A9 (because in your >view URI are IRI) maps to the URI http://www.example.org/ros%C3%A9. >aaaaggghhhh. > >This gets us back to the question of whether IRI are intended 1) as a >replacement for URI as 'the' identifier for the Web, or 2) as a distinct set >of a internationalised identifiers (distinct value space, overlapping >lexical space wrt URI) which are mapped to URI for use in protocol elements >that require URI My answer here is: - both, depending on the situation - this might look bothering, but it really isn't much of an issue >- URI then remain 'the' identifier on the Web and IRI serve >as an internationalisation friendly form for inclusion in documents and UI >artifacts (from which URI are generated when necessary). Once most of what the users will see are IRIs, I'm not sure it's appropriate to call them 'UI artifacts'. Users will just see them, and probably talk about them as URLs, not even URIs, because they haven't yet learned that the 'politically correct' name is now URI. >View 2 treats URI and IRI as different types even though there lexical forms >overlap. In this view I'm not sure quite how the IRI >http://www.example.org/ros%C3%A9 would arise. Hopefully not too often. But somebody can just write it down, as it is written down in this email. You are right that it should be very rare, but from a spec point of view, the question is how to disallow it. We can't disallow percent-escaping because that would disallow http://www.example.org/ros%E9, which has to be legal for backwards compatibility purposes (and is different from the above). >I accept that simply looking at the lexical form leaves an ambiguity about >whether one is looking at a URI or an IRI. Yes, of course. And often (e.g. on a napkin or a business card), that's the only thing you have. > > So what the IRI draft says is that if you need character-by- > > character equivalence, for example for XML namespaces (e.g. > > XSLT) or for RDF to match nodes in a graph, don't convert to an URI. > >ie. you do character-by-character comparison of two IRI. Yes. > > >[Also, I'm not entirely comfortable with all "URIs are IRIs" - but I'll > > >dwell on that - this discomfort is at the level of whether the set of > > >real numbers and the set of integers are disjoint or not - ie. they are > > >different types (value-space) although the may share lexical space > > >presentation.] > > > > I'm not totally familiar with this kind of philosophical discussion. > > I tend to think about this from an operational view. Most if > > not all operations that you can do on 'integer 2' you can do > > on 'real 2', with the same results. Same the other way round. > > As far as I know, all the relevant operations you can do with > > the URI http://www.example.org/ros%C3%A9 you can do with the > > IRI http://www.example.org/ros%C3%A9, and vice versa, with > > the same result. So trying to distinguish them will only > > complicate the spec, without adding anything other than > > confusing most readers. > >But you are demonstrating a situation where for two IRI x and y: > > (not(x == y)) && (iriToUri(x) == iriToUri(y)) > >operationally this is also confusing. Looks like it may create problems. But up to now, I haven't heard about any. This situation is very similar to a situation purely on URIs. It is very easy to construct two URIs that are guaranteed to always resolve to the same thing but that don't compare character-by-character equivalent. For example http://www.example.org/ (1) and HTTP://www.example.org (2). The URI spec says that these are equivalent, but they identify two different namespaces, and they identify two different nodes in an RDF graph. So we could indeed create two different types, let's call them aURI (the ones that are only equivalent if they are character-by-character equivalent) and bURI (they are equivalent in some more cases, as far as the URI spec allows). (1) and (2) are the same bURI, but a different aURI. And you can't even know whether they are aURIs or bURIs by just looking at them. So which ones are 'the' identifier for the Web? aURIs or bURIs? So in summary: - This issue is not a problem because very similar issues also exist with URIs themselves. - Given that IRIs allow (many) more characters than URIs, but we want IRIs that are also URIs (or if you prefer: that also look like URIs) to map to themselves, it is impossible to preserve the same equivalence classes for character-by-character on IRIs and on URIs. So there does not seem to be a solution (if you know about one, please tell us). - There is already a clear warning in the draft about this, which seems to have been clear enough to catch your attention. I therefore am tentatively closing this issue with 'no action needed'. If you think that some change to the document is needed, please say so (ideally with some actual proposed text). Regards, Martin.
Received on Thursday, 25 March 2004 15:17:47 UTC