Re: "Difficult Characters" draft

Alain LaBonté (alb@sct.gouv.qc.ca)
Tue, 22 Apr 1997 14:17:30


Message-Id: <3.0.1.16.19970422141730.2b8f756c@riq.qc.ca>
Date: Tue, 22 Apr 1997 14:17:30
To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
From: "Alain LaBonté" <alb@sct.gouv.qc.ca>
Subject: Re: "Difficult Characters" draft
Cc: URI mailing list <uri@bunyip.com>
In-Reply-To: <Pine.SUN.3.96.970506111326.245L-100000@enoshima>

At 11:56 97-05-06 +0200, Martin J. Duerst wrote:
[Alain] :
>> I may be wrong, but it might also be that bad habits formed
>> expectations about imprecise searches. Do you mean that here we really mean
>> precise searches in which even case shall be used as is?

[Martin] :
>Definitely. That's what happens today with URLs. The intent of
>the document is not to define equivalences for search, but to
>define normalization at the source so that we can use the binary
>comparison of existing software.

[Alain] :
>> That would really
>> be misleading for French-speaking users (I speak from experience, having done
>> such tests by accident with an international audience).
>
>ASCII web users have learned that they have to take care about case
>in URLs. ASCII URL creators have learned that they, too, have to take
>care about case in URLs, in order to make it easy for the users.
>Beyond-ASCII users and URL creators will have to learn similar things
>with respect to case and with respect to other stuff, such as accents.
>
>French URL users may have to learn that on uppercase URLs, they should
>not drop accents that they see. French URL creators may have to learn
>that they better not create uppercase accented characters in their
>URLs in order to not disturb their users. One of these things, or
>both, may end up in the current draft. What would you suggest?

From a *real user*'s point of view, what you say is disconcerting. In fact,
it does not correspond to the reality I experience every day. For example, my
insurance agent gave me his personal URL last week. It contained uppercase
letters that were transformed into lowercase when Netscape displayed the
actual URL, and a lookup with either form works fine...

Hence, in this concrete example,

http://www.LaMutuelle.com/agent/home.htm?aid=S200569 and
http://www.lamutuelle.com/agent/home.htm?aid=S200569

are totally equivalent. Changing those habits would not be desirable.
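
A minimal sketch, in Python, of one reason these two forms compare equal:
they differ only in the host name, which is matched without regard to case.
The function name normalize_host_case is hypothetical, and the sketch
assumes that only the scheme and host may be folded safely; whether a path
or query is case-sensitive depends on the server.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_host_case(url: str) -> str:
    """Lowercase the scheme and host only; keep path, query, fragment as is.

    Hypothetical helper for illustration. It lowercases the whole netloc,
    which is good enough for URLs without user:password@ parts.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        parts.query,
        parts.fragment,
    ))


a = "http://www.LaMutuelle.com/agent/home.htm?aid=S200569"
b = "http://www.lamutuelle.com/agent/home.htm?aid=S200569"

# After folding the host, a plain string comparison is enough.
same = normalize_host_case(a) == normalize_host_case(b)
```

With this folding, the two URLs above give the same string, while two URLs
that differ in the case of the path would still compare as different.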

In French at least, case does not in general have the importance that it has
in German, for example. As for accented and unaccented data, at a minimum a
lowercase accented letter should of course be equivalent to its uppercase
counterpart. But even in lowercase, it is desirable for searching purposes
that an unaccented letter be equivalent to its accented counterpart (as an
actual precedent, DOS on the PC has processed them this way since 1981).
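
The kind of equivalence I have in mind can be sketched in Python as
follows. This is only an illustration under stated assumptions, not what
DOS or any standard actually does: the name fold_for_search is
hypothetical, and the technique (decompose, drop combining marks, fold
case) is a crude stand-in for a real comparison table.

```python
import unicodedata


def fold_for_search(text: str) -> str:
    """Return a search key that ignores both accents and case.

    Hypothetical helper: decomposes each letter (NFD), drops the combining
    marks that carry the accents, then folds case.
    """
    decomposed = unicodedata.normalize("NFD", text)
    unaccented = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return unaccented.casefold()


# Accented/unaccented and upper/lower forms all yield the same key.
key_accented = fold_for_search("Québec")
key_upper = fold_for_search("QUÉBEC")
key_plain = fold_for_search("quebec")
```

All three keys come out identical, so a search for any one form finds the
others.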

What I suggest is that searching be done in the same spirit as
ISO/IEC CD 14651, which deals with such equivalences. At the limit (this
has no influence on the URLs themselves, but it should be considered),
expectations in searching URLs could be built on LOCALEs. That is what I
suggest.

For example, as was explained, o and ö are not equivalent in Swedish (while
they are in German), n and ñ are not equivalent in Spanish (while they are
in French), and so on. That has no impact per se on the making of URLs, but
it has one on their use. That was the only consideration I was trying to
raise.
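
A rough Python sketch of such locale-dependent equivalence follows. The
table DISTINCT_LETTERS and the names fold_for_locale and equivalent are
hypothetical; the table holds only the two cases mentioned above and is in
no way taken from ISO/IEC CD 14651 or from any real locale definition.

```python
import unicodedata

# Hypothetical table: accented letters that a locale treats as distinct
# from their base letter. Only the examples from this message are listed.
DISTINCT_LETTERS = {
    "sv": {"ö"},   # Swedish: o and ö are not equivalent
    "es": {"ñ"},   # Spanish: n and ñ are not equivalent
    "de": set(),   # German: o and ö are equivalent for searching
    "fr": set(),   # French: n and ñ are equivalent for searching
}


def fold_for_locale(text: str, locale: str) -> str:
    """Fold case and accents, except for letters the locale keeps distinct."""
    keep = DISTINCT_LETTERS.get(locale, set())
    out = []
    # Fold case first, then compose so each accented letter is one character.
    for ch in unicodedata.normalize("NFC", text.casefold()):
        if ch in keep:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(
                c for c in decomposed if not unicodedata.combining(c)
            ))
    return "".join(out)


def equivalent(a: str, b: str, locale: str) -> bool:
    """True if a and b are the same for searching purposes in this locale."""
    return fold_for_locale(a, locale) == fold_for_locale(b, locale)
```

For instance, equivalent("o", "ö", "de") holds while
equivalent("o", "ö", "sv") does not, and likewise for n and ñ in French
versus Spanish.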

Alain LaBonté
Québec