Date: Mon, 5 May 1997 21:09:50 +0200 (MET DST)
From: "Martin J. Duerst" <email@example.com>
To: "Alain LaBont/e'/" <firstname.lastname@example.org>
cc: Leslie Daigle <email@example.com>, URI mailing list <firstname.lastname@example.org>
Subject: Re: "Difficult Characters" draft
In-Reply-To: <email@example.com>
Message-ID: <Pine.SUN.3.96.970505205750.245G-100000@enoshima>

On Mon, 21 Apr 1997, Alain LaBont/e'/ wrote:

> At 12:40 97-05-05 -0400, Leslie Daigle wrote:
> >
> >For example, "o" and "ö" are unrelated characters in Swedish, so it
> >would be erroneous to say that they are equivalent in an accent-insensitive
> >search. Lexicographically, "ö" is the last character in the alphabet
> >in Swedish.
> >
> >So, "accent-insensitive" matching is pretty well language-dependent.
>
> [Alain] :
> Of course! Same for ñ which is simply an accented n in French cañon and a
> letter on its own in Spanish cañon... In other words, in Spanish, searching
> on "canon" shall never retrieve "cañon"; in French it could, for imprecise
> searches, as well as the word "canon"...

- What is and is not retrieved by imprecise searches may depend
  on many things. It is quite possible that "canon" retrieves
  "cañon" in a Spanish spelling checker; it is only a one-letter
  substitution.

- We are dealing with identifiers, and assuming precise matching
  up to the precision a human reader familiar with the script is
  able to handle. In this respect, discussions about imprecise
  searches are irrelevant.

Regards, Martin.
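
The distinction drawn in this thread, between language-dependent imprecise matching and precise matching of identifiers, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not something proposed in the thread: the `DISTINCT_LETTERS` table, the function names `fold_accents`, `loose_match` and `exact_match`, and the use of Unicode NFC normalization as a stand-in for "precise matching".

```python
import unicodedata

# Hypothetical per-language table of letters that are letters in their own
# right and must survive accent folding. Illustrative, not exhaustive.
DISTINCT_LETTERS = {
    "sv": {"å", "ä", "ö"},  # Swedish: "ö" is unrelated to "o"
    "es": {"ñ"},            # Spanish: "ñ" is a letter on its own
    "fr": set(),            # French: accents treated as foldable here
}


def fold_accents(text, lang):
    """Strip combining marks, except on letters the language keeps distinct."""
    keep = DISTINCT_LETTERS.get(lang, set())
    out = []
    for ch in unicodedata.normalize("NFC", text.lower()):
        if ch in keep:
            out.append(ch)
        else:
            # Decompose the character and drop its combining marks.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)


def loose_match(a, b, lang):
    """Imprecise, accent-insensitive comparison; the result depends on lang."""
    return fold_accents(a, lang) == fold_accents(b, lang)


def exact_match(a, b):
    """Precise comparison for identifiers (assumption: compare NFC forms)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Under these assumptions, `loose_match("canon", "cañon", "fr")` holds while `loose_match("canon", "cañon", "es")` does not, and `exact_match("canon", "cañon")` is false regardless of language, which is the behaviour the second point above asks of identifiers.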