RE: iDNR, an alternative name resolution protocol from Martin J. Duerst on 1998-09-02 (uri@w3.org from September 1998)

From: Martin J. Duerst <duerst@w3.org>
Date: Wed, 02 Sep 1998 12:54:04 +0900
To: Leslie Daigle <leslie@Bunyip.Com>
Cc: Larry Masinter <masinter@parc.xerox.com>, URI distribution list <uri@Bunyip.Com>
Message-Id: <199809020838.RAA23675@sh.w3.mag.keio.ac.jp>

Hello Leslie,

At 10:15 98/09/01 -0400, Leslie Daigle wrote:

> In particular, it isn't clear to me what "it is useful if unaccented
> characters are accepted, when possible, as aliases for accented
> characters".    Consider,
> 
>         in French, "é" is "e with an acute accent"
>         in Swedish, "ö" is a completely different letter than "o", to
>           the extent that it appears in a completely different place
>           in alphabetic ordering.

Alain has given some very good explanations here. The answer is:
It depends. Actually, the answer is already "it depends" for
the current URIs, with respect to case. I don't think it's realistic
to expect us to improve on what hasn't been done better up to now
in the very limited ASCII range.

That said, I am of course very concerned to get things working as well
as possible. URIs are not the only place we get into such problems.
In W3C, several working groups have made requests for guidance in this
area to the W3C I18N WG, and this WG has already published a working
draft for requirements for some of the things that you mention above
(and some others). Please have a look at http://www.w3.org/TR/WD-charreq;
comments are very welcome.

Getting back to URIs specifically, I see at least three levels that
we have to address:

- A minimum that should be achieved by normalization at the origin;
  this is mainly to eliminate pure encoding duplicates such as those
  that arise with precomposed/decomposed forms. At W3C, we are
  coordinating this work with Unicode; they have already issued a
  draft on this issue (http://www.unicode.org/unicode/reports/tr15/),
  on which comments are also welcome.

- Some larger equivalences that may be offered as "quality of service"
  (e.g. for the directory/file component and case-insensitivity for
  many HTTP servers) or may be part of the protocol/scheme/scheme
  component,... (e.g. case folding for domain names).

- An even larger class of equivalences that would be used e.g. for
  tools that check for spoofing attempts. This may include things
  such as wrongly interpreted encodings (e.g. something that is
  actually Latin-1 instead of UTF-8,...) and almost everything that
  didn't go into the last item for a particular case.
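
As a rough illustration of the three levels above, here is a minimal
Python sketch. It is only a sketch under assumptions: the function names
are hypothetical, the choice of NFC as the normalization form is an
assumption (the draft referenced above was still open for comment), and
simple case folding stands in for whatever equivalences a particular
server or scheme might actually offer.

```python
import unicodedata


def normalize_at_origin(s: str) -> str:
    """Level 1: collapse precomposed/decomposed duplicates.

    NFC is assumed here purely for illustration.
    """
    return unicodedata.normalize("NFC", s)


def service_equivalent(a: str, b: str) -> bool:
    """Level 2: a looser, optional match a server might offer.

    Hypothetical policy: normalize, then compare case-insensitively.
    """
    return normalize_at_origin(a).casefold() == normalize_at_origin(b).casefold()


def misread_as_latin1(s: str) -> str:
    """Level 3: what the UTF-8 bytes of s look like when they are
    wrongly interpreted as Latin-1, as a spoof checker might test."""
    return s.encode("utf-8").decode("latin-1")


precomposed = "caf\u00e9"   # e with acute as a single code point
decomposed = "cafe\u0301"   # plain e followed by a combining acute accent

# The two spellings differ as code point sequences...
assert precomposed != decomposed
# ...but are identical once normalized at the origin.
assert normalize_at_origin(precomposed) == normalize_at_origin(decomposed)
```

Note that none of these functions treats an unaccented character as an
alias for an accented one; whether that is appropriate depends on the
language, as the French and Swedish cases quoted above show.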

Regards,   Martin.

Received on Wednesday, 2 September 1998 04:43:59 UTC