Re: "Difficult Characters" draft

Larry Masinter (masinter@parc.xerox.com)
Tue, 6 May 1997 08:49:22 PDT


Message-ID: <336F5302.64F7@parc.xerox.com>
Date: Tue, 6 May 1997 08:49:22 PDT
From: Larry Masinter <masinter@parc.xerox.com>
To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
CC: "Alain LaBont/e'/" <alb@riq.qc.ca>, URI mailing list <uri@bunyip.com>
Subject: Re: "Difficult Characters" draft

Martin,

Perhaps your draft could mention that the use of
identifiers with characters outside of ASCII is actually
problematic. Some applications that use canonical
identifiers and exact match for symbol lookup when
restricted to ASCII-only symbols might find that such a
design ill-serves users of languages other than English.
In some applications, a careful language-sensitive
equivalence lookup (instead of exact match) would let the
software actually accommodate the needs and practices of
such users.

The mail over the past week has been full of good examples
of places where canonicalization is either ill-specified
or context-sensitive, and where "equivalence matching"
would be far more practical.

Fortunately, it's possible that equivalence-based matching
could be deployed for URLs; other kinds of exact-match
names will require a separate analysis. Both DNS and HTTP servers
(if not FTP servers) could be coaxed into doing equivalence matching
instead of exact matching for reference lookup; if they also respond
with the server's view of the "canonical" name, then we
won't be asking clients to do what seems to be nearly
impossible.
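
To make the idea concrete, here is a rough sketch of a server-side
lookup that matches by equivalence and answers with its own
"canonical" name. Everything in it is an assumption for
illustration: the class and function names are made up, and the
equivalence used (Unicode NFKC normalization plus case folding) is
only a language-neutral stand-in, not the language-sensitive
matching suggested above.

```python
import unicodedata


def equivalence_key(name: str) -> str:
    """Fold a name to a comparison key (assumed rule: NFKC + case folding)."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Normalize again, since case folding can produce unnormalized text.
    return unicodedata.normalize("NFKC", folded)


class NameServer:
    """Hypothetical server that stores canonical names, matches by equivalence."""

    def __init__(self, canonical_names):
        # Map each equivalence key to the server's own spelling of the name.
        self._by_key = {equivalence_key(n): n for n in canonical_names}

    def lookup(self, requested: str):
        """Return the server's canonical name for the request, or None."""
        return self._by_key.get(equivalence_key(requested))


server = NameServer(["/docs/R\u00e9sum\u00e9.html"])

# The client sends upper case with combining accents; the server still
# finds the resource and reports the spelling it considers canonical.
canonical = server.lookup("/DOCS/RE\u0301SUME\u0301.HTML")
```

The client never has to canonicalize anything: it sends whatever
the user typed, and learns the canonical form from the reply.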

Larry
--
http://www.parc.xerox.com/masinter