- From: Leslie Daigle <leslie@Bunyip.Com>
- Date: Tue, 1 Sep 1998 10:15:23 -0400 (EDT)
- To: Larry Masinter <masinter@parc.xerox.com>
- cc: URI distribution list <uri@Bunyip.Com>
Howdy,

On Mon, 31 Aug 1998, Larry Masinter wrote:
> I submitted a revision to Internet-Drafts, but there've been several
> revisions since; I suggest you fetch it from:

This version seems considerably "tighter" than the version I previously commented on, but I still have trouble with the interpretation section:

   3.5 Interpretation of URIs

   Software that interprets URIs as the names of local resources SHOULD accept multiple renditions of the URIs in the case where those resources names might have non-ASCII representations; this includes accepting both the URI syntax of section 2.1 and the 8URI form in section 2.2. Just as allowing case-insensitive file names makes URIs more robust (because the person viewing the URI might type the case differently than it is displayed), similarly, URI-interpreting software should be generous in allowing all of the possible representations that might result from the recommendations in section 3.1. In addition, it is useful if unaccented characters are accepted, when possible, as aliases for accented characters, and that other equivalences are made.

This is so fuzzy as to effectively randomize the possible outcomes of trying to resolve a URI. Without further guidance, clients cannot know what set of equivalences (or distinctions) a given server might apply, and servers cannot know what set of equivalences (or distinctions) a client might expect.

In particular, it isn't clear to me what "it is useful if unaccented characters are accepted, when possible, as aliases for accented characters" means in practice. Consider: in French, "é" is "e with an acute accent"; in Swedish, "ö" is a completely different letter from "o", to the extent that it appears in a completely different place in alphabetical ordering.
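To make the ambiguity concrete, here is a minimal Python sketch of one possible, locale-blind reading of the "unaccented alias" suggestion: decompose to Unicode NFD and drop the combining marks. The helper `fold_accents` is hypothetical; nothing in the draft specifies it. The percent-encodings shown assume UTF-8, which is what Python's `urllib.parse.quote` uses by default.

```python
import unicodedata
from urllib.parse import quote

def fold_accents(s: str) -> str:
    # Hypothetical "unaccented alias" rule: decompose to NFD, then drop
    # every combining mark. It knows nothing about the user's language.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# One French letter, two Unicode renditions, two different %-encodings
# (assuming UTF-8 is the encoding underneath the URI).
e_nfc = "\u00e9"    # precomposed LATIN SMALL LETTER E WITH ACUTE
e_nfd = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
print(quote(e_nfc), quote(e_nfd))   # %C3%A9 e%CC%81

# The folding rule gives the match a French reader might expect...
print(fold_accents(e_nfc), fold_accents(e_nfd))   # e e
# ...but it also merges a Swedish letter with a different letter.
print(fold_accents("\u00f6"))   # o
```

The same rule that treats "é" and "e" as aliases also collapses Swedish "ö" into "o", and the two renditions of "é" still percent-encode differently unless client and server first agree on a normalization step.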
While this deals primarily with issues of (de)composition, it also means that a French person (or client or server software) might more readily expect a match between the accented and unaccented "e", whereas a Swedish person, client, or server would not conceive of such a thing.

If there is not a well-known, algorithmically applicable set of rules for producing this set of "multiple renditions", this should not be recommended with a "SHOULD". If there _is_ a well-known, algorithmically applicable set of rules for producing this set of "multiple renditions", those rules should be cited right here.

Leslie.

----------------------------------------------------------------------------
If cats had bumper stickers:              Leslie Daigle
   "I wake for food."                     Bunyip Information Systems
        -- ThinkingCat                    (514) 875-8611
                                          leslie@bunyip.com
----------------------------------------------------------------------------
Received on Tuesday, 1 September 1998 10:44:01 UTC