Re: "Difficult Characters" draft
Martin J. Duerst (mduerst@ifi.unizh.ch)
Mon, 5 May 1997 21:09:50 +0200 (MET DST)
Date: Mon, 5 May 1997 21:09:50 +0200 (MET DST)
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: "Alain LaBont/e'/" <alb@sct.gouv.qc.ca>
cc: Leslie Daigle <leslie@bunyip.com>, URI mailing list <uri@bunyip.com>
Subject: Re: "Difficult Characters" draft
In-Reply-To: <3.0.1.16.19970421134857.29b7b16a@riq.qc.ca>
Message-ID: <Pine.SUN.3.96.970505205750.245G-100000@enoshima>
On Mon, 21 Apr 1997, Alain LaBont/e'/ wrote:
> At 12:40 97-05-05 -0400, Leslie Daigle wrote:
> >
> >For example, "o" and "ö" are unrelated characters in Swedish, so it
> >would be erroneous to say that they are equivalent in an accent-insensitive
> >search. Lexicographically, "ö" is the last character in the alphabet
> >in Swedish.
> >
> >So, "accent-insensitive" matching is pretty well language-dependent.
>
> [Alain] :
> Of course! Same for ñ which is simply an accented n in French cañon and a
> letter on its own in Spanish cañon... In other words, in Spanish, searching
> on "canon" shall never retrieve "cañon"; in French it could, for unprecise
> searches, as well as the word "canon"...
- What an imprecise search retrieves, and what it does not, may
depend on many things. It is quite possible that "canon" retrieves
"cañon" in a Spanish spelling checker, since it is only a
one-letter substitution.
- We are dealing with identifiers, and assuming precise matching, up
to the precision that a human reader familiar with the script
is able to handle. In this respect, discussions about
imprecise searches are irrelevant.
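
To make the distinction between the two kinds of matching concrete, here
is a rough sketch in Python. Everything in it is an assumption made for
illustration: the DISTINCT_LETTERS table is hypothetical and far from
complete, the function names are invented, and treating "precise
matching" as comparison after Unicode NFC normalization is only one
possible reading of "the precision a human reader is able to handle".

```python
import unicodedata

# Hypothetical, incomplete table: letters that count as letters in their
# own right in a given language and so must not be folded to a base letter.
DISTINCT_LETTERS = {
    "sv": set("åäö"),  # Swedish: "ö" is unrelated to "o"
    "es": set("ñ"),    # Spanish: "ñ" is a letter on its own
    "fr": set(),       # French: "ñ" in "cañon" is just an accented "n"
}


def fold_accents(text, lang):
    """Strip combining marks, except on letters distinct in `lang`."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch.lower() in DISTINCT_LETTERS.get(lang, set()):
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)


def loose_match(a, b, lang):
    """Imprecise, language-dependent comparison (search-style)."""
    return fold_accents(a, lang) == fold_accents(b, lang)


def exact_match(a, b):
    """Precise comparison (identifier-style): only encoding variants of
    the same visible text compare equal; no language knowledge needed."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

With this sketch, loose_match("canon", "cañon", "fr") holds while
loose_match("canon", "cañon", "es") does not, and exact_match never
equates the two words in any language. That is the property wanted for
identifiers.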
Regards, Martin.