- From: Leslie Daigle <leslie@Bunyip.Com>
- Date: Tue, 1 Sep 1998 10:15:23 -0400 (EDT)
- To: Larry Masinter <masinter@parc.xerox.com>
- cc: URI distribution list <uri@Bunyip.Com>
Howdy,
On Mon, 31 Aug 1998, Larry Masinter wrote:
> I submitted a revision to Internet-Drafts, but there've been several
> revisions since; I suggest you fetch it from:
This version seems considerably "tighter" than the version I previously
commented on, but I still have trouble with the interpretation section:
3.5 Interpretation of URIs
Software that interprets URIs as the names of local resources SHOULD
accept multiple renditions of the URIs in the case where those
resources names might have non-ASCII representations; this includes
accepting both the URI syntax of section 2.1 and the 8URI form in
section 2.2.
Just as allowing case-insensitive file names makes URIs more robust
(because the person viewing the URI might type the case differently
than it is displayed), similarly, URI-interpreting software should be
generous in allowing all of the possible representations that might
result from the recommendations in section 3.1. In addition, it is
useful if unaccented characters are accepted, when possible, as
aliases for accented characters, and that other equivalences are made.
This is so fuzzy as to effectively randomize possible outcomes of
trying to resolve a URI. Without further guidance, clients cannot
know what possible set of equivalences (or distinctions) a given server might
apply, and servers cannot know what possible set of equivalences (or
distinctions) a client might expect.
In particular, it isn't clear to me what "it is useful if unaccented
characters are accepted, when possible, as aliases for accented
characters" means. Consider:
in French, "é" is "e with an acute accent"
in Swedish, "ö" is a completely different letter than "o", to
the extent that it appears in a completely different place
in alphabetic ordering.
While this deals primarily in issues of (de)composition, it also
means that a French person/client/server software might more readily expect
a match between the accented/unaccented "e", whereas a Swedish
person/client/server would not conceive of such a thing.
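To make the ambiguity concrete, here is a minimal sketch, assuming a purely mechanical rule (Unicode NFD decomposition followed by dropping combining marks). This rule is an illustration only, not something the draft specifies, and the names strip_accents, resolve, and fold_accents are hypothetical. The rule treats "é" and "ö" identically, so it cannot express the French/Swedish distinction, and two servers that differ only in whether they apply it resolve the same request differently.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Locale-blind folding: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def resolve(requested: str, stored_names: list, fold_accents: bool) -> list:
    """Return the stored names a (hypothetical) server would match."""
    if not fold_accents:
        # Exact comparison: accented and unaccented forms stay distinct.
        return [name for name in stored_names if name == requested]
    # Folding comparison: both sides are reduced to unaccented form first.
    key = strip_accents(requested)
    return [name for name in stored_names if strip_accents(name) == key]


stored = ["été", "ete", "öl", "ol"]

# The mechanical rule folds both letters the same way.
print(strip_accents("é"), strip_accents("ö"))

# A server that folds accents returns two candidates for one request...
print(resolve("ete", stored, fold_accents=True))

# ...while a server that does not fold returns exactly one.
print(resolve("ol", stored, fold_accents=False))
```

A client has no way to tell from the URI alone which of these two behaviours a given server implements.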
If there is not a well-known, algorithmically-applicable set of
rules to achieve this set of "multiple renditions", this should not
be suggested as a "SHOULD".
If there _is_ a well-known, algorithmically-applicable set of rules
to achieve this set of "multiple renditions", those rules should be cited
right here.
Leslie.
----------------------------------------------------------------------------
If cats had bumper stickers: Leslie Daigle
"I wake for food." Bunyip Information Systems
-- ThinkingCat (514) 875-8611
leslie@bunyip.com
----------------------------------------------------------------------------
Received on Tuesday, 1 September 1998 10:44:01 UTC