- From: Martin Duerst <duerst@w3.org>
- Date: Sun, 28 Mar 2004 11:04:59 -0500
- To: "Williams, Stuart" <skw@hp.com>
- Cc: public-iri@w3.org
At 23:29 04/03/25 +0000, Williams, Stuart wrote:
>Hello Martin,
>
> > I therefore am tentatively closing this issue with 'no action needed'.
> > If you think that some change to the document is needed,
> > please say so (ideally with some actual proposed text).
>
>The short answer is yes... I agree.

Okay, great, thanks! I'll move the issue to closed, then.

>I chewed on this for quite a time. I even came up with an idea to address
>your challenge (which I'll mention below), but I think it amounts to the
>same thing as the MUST NOT in section 5.1:
>
>   As an example,
>   http://example.org/~user, http://example.org/%7euser, and
>   http://example.org/%7Euser are not equivalent under this definition.
>   In such a case, the comparison function MUST NOT map IRIs to URIs,
>   because such a mapping would create additional spurious equivalences.
>
>It was the "spurious equivalences" phrase that triggered my concern.

If you have any idea on how to phrase this in a better way, please
tell me. I agree that "spurious equivalences" can be a bit misleading.

>My idea to address your challenge, which I'm not overly committed to... and
>which I think amounts to the same as the MUST NOT above, is this:
>
>Regard the lexical spaces of URI and IRI as disjoint, i.e. if the identifier
>string is valid under the URI grammar... it's a URI and *not* an IRI. IRIs
>are then the remaining strings that match the IRI grammar. Effectively this
>forces IRIs to be only those identifier strings that actually contain
>characters that are not 'legal' in URIs. Character-by-character IRI/IRI and
>URI/URI comparisons work as normal. Character-by-character IRI/URI
>comparison always yields not-equal (because URI and IRI are disjoint).

This alone wouldn't do it.
Just take a more complicated example:

http://www.example.org/rosé/ros%C3%A9
and
http://www.example.org/rosé/rosé

Both of them are IRIs under your definition (because they contain at
least one non-ASCII character, é), and both of them map to the URI

http://www.example.org/ros%C3%A9/ros%C3%A9

There are probably ways to wiggle out of that problem, but that would
mean changing the IRI->URI function, making it much more complicated,
for a doubtful gain in practice.

Regards,    Martin.
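The rosé example above can be checked with a small sketch. It assumes a deliberately simplified IRI->URI mapping that only percent-encodes non-ASCII characters as their UTF-8 bytes; the function name iri_to_uri is hypothetical, and the sketch ignores details such as normalization and special handling of host names.

```python
def iri_to_uri(iri: str) -> str:
    """Simplified IRI->URI mapping (an assumption for illustration):
    percent-encode each non-ASCII character as its UTF-8 bytes and
    leave every ASCII character, including existing %XX escapes, alone."""
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


# The two IRIs from the example; \u00e9 is the precomposed character é.
iri_a = "http://www.example.org/ros\u00e9/ros%C3%A9"
iri_b = "http://www.example.org/ros\u00e9/ros\u00e9"

# Both contain a non-ASCII character, so both fall on the "IRI" side of the
# proposed disjoint split, and they differ character by character.
different_as_iris = iri_a != iri_b

# Mapping them to URIs makes them identical.
same_as_uris = iri_to_uri(iri_a) == iri_to_uri(iri_b)
```

Under these assumptions the two strings are unequal as IRIs but equal once mapped, so the equivalence created by the mapping appears between two IRIs, a case the disjoint IRI/URI split does not cover.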
Received on Sunday, 28 March 2004 11:05:20 UTC