RE: iDNR, an alternative name resolution protocol from Leslie Daigle on 1998-09-01 (uri@w3.org from September 1998)

From: Leslie Daigle <leslie@Bunyip.Com>
Date: Tue, 1 Sep 1998 10:15:23 -0400 (EDT)
To: Larry Masinter <masinter@parc.xerox.com>
cc: URI distribution list <uri@Bunyip.Com>
Message-ID: <Pine.SUN.3.95.980901100647.12597B-100000@mocha.bunyip.com>

Howdy,

On Mon, 31 Aug 1998, Larry Masinter wrote:
> I submitted a revision to Internet-Drafts, but there've been several
> revisions since; I suggest you fetch it from:

This version seems considerably "tighter" than the version I previously
commented on, but I still have trouble with the interpretation section:

   3.5 Interpretation of URIs

   Software that interprets URIs as the names of local resources SHOULD
   accept multiple renditions of the URIs in the case where those
   resources names might have non-ASCII representations; this includes
   accepting both the URI syntax of section 2.1 and the 8URI form in
   section 2.2.

   Just as allowing case-insensitive file names makes URIs more robust
   (because the person viewing the URI might type the case differently
   than it is displayed), similarly, URI-interpreting software should be
   generous in allowing all of the possible representations that might
   result from the recommendations in section 3.1. In addition, it is
   useful if unaccented characters are accepted, when possible, as
   aliases for accented characters, and that other equivalences are made.

This is so fuzzy as to effectively randomize possible outcomes of
trying to resolve a URI.  Without further guidance, clients cannot
know what possible set of equivalences (or distinctions) a given server might 
apply, and servers cannot know what possible set of equivalences (or 
distinctions) a client might expect.

In particular, it isn't clear to me what "it is useful if unaccented
characters are accepted, when possible, as aliases for accented
characters" means.  Consider:

        in French, "é" is "e with an acute accent"
        in Swedish, "ö" is a completely different letter than "o", to
          the extent that it appears in a completely different place
          in alphabetic ordering.

While this is primarily an issue of (de)composition, it also means
that a French person/client/server software might more readily expect
a match between the accented/unaccented "e", whereas a Swedish
person/client/server would not conceive of such a thing.
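
To make the ambiguity concrete, here is a minimal sketch in Python of one
locale-blind equivalence a server might plausibly apply: decompose each
character (Unicode NFD) and drop the combining marks.  This is only an
illustration, not something the draft specifies, and the function name
strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: fold accented letters to their base letters.

    Decomposes the text (NFD) so that an accented letter becomes a base
    letter followed by combining marks, then drops the combining marks.
    No locale is consulted anywhere.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# French reader: folding "é" to "e" may look like a reasonable alias.
print(strip_accents("caf\u00e9"))

# Swedish reader: the same rule folds "ö" to "o", a different letter.
print(strip_accents("\u00f6"))
```

The same rule treats both cases identically, so whether the result is a
helpful alias or a wrong match depends on who is reading the URI, which
neither the client nor the server can determine from the URI itself.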

If there is not a well-known, algorithmically-applicable set of
rules to achieve this set of "multiple renditions", this should not
be suggested as a "SHOULD".

If there _is_ a well-known, algorithmically-applicable set of rules
to achieve this set of "multiple renditions", those rules should be cited
right here.

Leslie.

----------------------------------------------------------------------------

    If cats had bumper stickers:                  Leslie Daigle

      "I wake for food."                          Bunyip Information Systems
                -- ThinkingCat                    (514) 875-8611
                                                  leslie@bunyip.com
----------------------------------------------------------------------------

Received on Tuesday, 1 September 1998 10:44:01 UTC