RE: Character-by-Character Distinct IRI -> Char-by-char equivalent URI (issue #IRIURIcharequiv-20) from Martin Duerst on 2004-03-28 (public-iri@w3.org from March 2004)

From: Martin Duerst <duerst@w3.org>
Date: Sun, 28 Mar 2004 11:04:59 -0500
To: "Williams, Stuart" <skw@hp.com>
Cc: public-iri@w3.org
Message-Id: <4.2.0.58.J.20040328105823.05b107c8@localhost>

At 23:29 04/03/25 +0000, Williams, Stuart wrote:

>Hello Martin,
>
> > I therefore am tentatively closing this issue with 'no action needed'.
> > If you think that some change to the document is needed,
> > please say so (ideally with some actual proposed text).
>
>The short answer is yes... I agree.

Okay, great, thanks! I'll move the issue to closed, then.


>I chewed on this for quite a time. I even came up with an idea to address your
>challenge (which I'll mention below), but I think it amounts to the same thing
>as the MUST NOT in section 5.1:
>
>    As an example,
>    http://example.org/~user, http://example.org/%7euser, and
>    http://example.org/%7Euser are not equivalent under this definition.
>    In such a case, the comparison function MUST NOT map IRIs to URIs,
>    because such a mapping would create additional spurious equivalences.
>
>It was the "spurious equivalences" phrase that triggered my concern.

If you have any idea on how to phrase this in a better way,
please tell me. I agree that "spurious equivalences" can be
a bit misleading.


>My idea to address your challenge, which I'm not overly committed to... and
>which I think amounts to the same as the MUST NOT above, is this:
>
>Regard the lexical spaces of URI and IRI as disjoint, i.e. if the identifier
>string is valid under the URI grammar... it's a URI and *not* an IRI. IRIs
>are then the remaining strings that match the IRI grammar. Effectively this
>forces IRIs to be only those identifier strings that actually contain
>characters that are not 'legal' in URIs. Character-by-character IRI/IRI and
>URI/URI comparisons work as normal. Character-by-character IRI/URI
>comparison always yields not-equal (because URI and IRI are disjoint).

This alone wouldn't do it. Just take a more complicated example:

http://www.example.org/rosé/ros%C3%A9 and
http://www.example.org/rosé/rosé

Both of them are IRIs under your definition (because they contain at
least one non-ASCII character, é), and both of them map to
the URI

http://www.example.org/ros%C3%A9/ros%C3%A9

There are probably ways to wiggle out of that problem, but that
would mean changing the IRI->URI function, making it much more
complicated, for a doubtful gain in practice.
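
To make the collision concrete, here is a minimal sketch in Python. It
assumes a simplified IRI->URI mapping that only percent-encodes non-ASCII
characters as UTF-8 octets and leaves every ASCII character (including
existing %XX escapes) untouched; it ignores host name handling and
normalization. The names iri_to_uri and is_iri_under_disjoint_rule are
hypothetical helpers invented for this illustration, not anything defined
in the draft.

```python
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Simplified mapping (an assumption for illustration): percent-encode
    each non-ASCII character as its UTF-8 octets, copy ASCII unchanged."""
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="")
        for ch in iri
    )


def is_iri_under_disjoint_rule(identifier: str) -> bool:
    """Rough stand-in for the proposed disjoint definition: treat a string
    as an IRI only if it contains at least one non-ASCII character."""
    return any(ord(ch) >= 128 for ch in identifier)


iri_a = "http://www.example.org/ros\u00e9/ros%C3%A9"
iri_b = "http://www.example.org/ros\u00e9/ros\u00e9"

# Both strings count as IRIs under the disjoint rule.
assert is_iri_under_disjoint_rule(iri_a)
assert is_iri_under_disjoint_rule(iri_b)

# Character by character, they are distinct IRIs...
assert iri_a != iri_b

# ...yet the mapping sends both to the same URI.
assert iri_to_uri(iri_a) == iri_to_uri(iri_b)
assert iri_to_uri(iri_a) == "http://www.example.org/ros%C3%A9/ros%C3%A9"

# The section 5.1 examples are pure ASCII, so the mapping leaves them
# alone and they stay distinct under simple string comparison.
assert iri_to_uri("http://example.org/~user") != iri_to_uri("http://example.org/%7euser")
```

So making the two lexical spaces disjoint does not prevent two distinct
IRIs from collapsing into one URI once they are mapped, at least under
this simplified mapping.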


Regards,     Martin.

Received on Sunday, 28 March 2004 11:05:20 UTC