Re: "Difficult Characters" draft

Alain LaBont/e'/ (alb@sct.gouv.qc.ca)
Mon, 21 Apr 1997 16:10:22


Message-Id: <3.0.1.16.19970421161022.08f7bb0c@riq.qc.ca>
Date: Mon, 21 Apr 1997 16:10:22
To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
From: "Alain LaBont/e'/" <alb@sct.gouv.qc.ca>
Subject: Re: "Difficult Characters" draft
Cc: URI mailing list <uri@bunyip.com>
In-Reply-To: <Pine.SUN.3.96.970506210330.245U-100000@enoshima>

At 21:20 97-05-06 +0200, Martin J. Duerst wrote:
>On Tue, 22 Apr 1997, Alain LaBont/e'/ wrote:
>
>> From a *real user*'s point of view what you say is disconcerting. In fact
>> it does not correspond to a reality I experience every day. My insurance
>> agent gave me his personal URL last week, for example, URL in which there
>> were uppercase letters that were transformed into lower case when Netscape
>> displayed the actual URL, and searching with either form works all right...
>
>Well, I just tried the URL, and my Netscape didn't do any lowercasing.

Something in my environment does (I'm not sure it is Netscape)... but I use
French versions of Netscape 2 under Win 3.1 and Netscape 3 Gold under
Win95...

>But that's a detail.
>
>> Hence in this actual concrete example,
>> 
>> http://www.LaMutuelle.com/agent/home.htm?aid=S200569 and
>> http://www.lamutuelle.com/agent/home.htm?aid=S200569
>> 
>> are totally equivalent. Changing those habits would not be desirable.
>
>These are indeed totally equivalent. But try to write
>
>> http://www.LaMutuelle.com/Agent/home.htm?aid=S200569 or
>> http://www.lamutuelle.com/agent/Home.htm?aid=S200569
>
>and you will get a nasty error (all in English, with a pointer
>to http://www.themutualgroup.com/). Some exceptions and surprises to the
>contrary notwithstanding, an uninformed user has to be taught to copy an
>URL as is, including case. A more informed user may know about parts
>of an URL that can be changed in capitalization. Actually, you can
>write
>
>> http://wWw.lAmUtUeLlE.CoM/agent/home.htm?aid=S200569
>
>and it will still work. But please leave the part after the first
>single slash alone.

All right... That is not very user friendly. Totally inconsistent... from a
user perspective, undesirable...
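
To make the rule Martin describes concrete, here is a minimal sketch in
Python. It assumes that only the scheme and host may be case-folded, and
that everything after the first single slash must be compared as is. The
function names normalize_host_case and same_url are hypothetical, chosen
only for this illustration; whether a given server really distinguishes
case in the path is up to that server.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_host_case(url: str) -> str:
    """Lowercase only the scheme and host; leave path and query untouched."""
    parts = urlsplit(url)
    # Assumption: the URL carries no userinfo ("user:password@"), so the
    # whole network-location part can be lowercased safely.
    folded = parts._replace(
        scheme=parts.scheme.lower(),
        netloc=parts.netloc.lower(),
    )
    return urlunsplit(folded)


def same_url(a: str, b: str) -> bool:
    """Compare two URLs, ignoring case in the host part only."""
    return normalize_host_case(a) == normalize_host_case(b)


base = "http://www.lamutuelle.com/agent/home.htm?aid=S200569"
print(same_url("http://www.LaMutuelle.com/agent/home.htm?aid=S200569", base))
print(same_url("http://wWw.lAmUtUeLlE.CoM/agent/home.htm?aid=S200569", base))
print(same_url("http://www.LaMutuelle.com/Agent/home.htm?aid=S200569", base))
```

Under these assumptions the first two comparisons succeed and the third
fails, which is exactly the inconsistency a user sees.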

>> In French at least, case does not in general have the importance that it
>> has in German, for example. For accented and unaccented data, of course,
>> a lower case accented letter should at minimum be equivalent to its upper
>> case counterpart, but even in lower case, it is desirable that an
>> unaccented letter be equivalent to its accented counterpart for searching
>> purposes (in practice, DOS on a PC has processed it like this since 1981).
>
>If a lowercase accented letter appears in the later part of an URL,
>it won't be equivalent to the corresponding uppercase letter because
>there is also no equivalence for nonaccented letters.

If I understand correctly, there are no equivalences at all, even for case.
But what about the first part? What about user expectations in the face of
inconsistent behaviour?

>In case there is indeed equivalence, as we currently have it in domain
>names, it will be the task of domain name internationalization to
>decide what to do about it, whether to make the usual domain names
>case sensitive or whether to introduce case equivalences for characters
>outside ASCII or whatever. There is no problem with any kind of
>URL scheme or mechanism to introduce additional equivalences where
>they see fit, but we can't introduce them for all URLs.

I'm puzzled that the notion of consistency is neglected... I learned
something.

>> What I suggest is that searching be done according to the same spirit as
>> ISO/IEC CD 14651 which deals with such equivalences. At the limit (this
>> does not have an influence on URLs but it should be considered) in
>> searching URLs, expectations could be built on LOCALEs... that is what I
>> suggest.
>
>I fully agree for searching. However, what is done usually with URLs
>is not searching. It is binary matching. Only things that are absolutely
>binary equivalent (after the last step in your sorting standard) match.
>The normalization procedures in the draft only increase the level a tiny
>bit, to avoid those cases where the binary representation is different,
>but the user has absolutely no chance to make a difference.
>
>
>> For example as was explained, o and ö are not equivalent in Swedish (while
>> they are in German),
>
>They are definitely not! Otherwise, we wouldn't need the ö :-).
>It's only that we don't consider ö a letter of its own,
>but that doesn't mean a German wouldn't be able to know where
>to put an o and where to put an ö in an URL (with the exception
>of those cases where both possibilities make sense and where it
>is all the more important to make the difference :-).
>
>
>> n and ñ are not equivalent in Spanish while they are
>> in French and so on. That has no impact per se on the making of URLs, but
>> it has one on their use, that was the only consideration I was trying to
>> suggest.
>
>I agree that it should have an impact on the use in searching and such.
>But that's not the main function of URLs.

Not the main one, but if it is a function at all, it becomes problematic. I
do not want to be a troublemaker, just to signal problems from a user's
point of view.
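
The difference between searching and binary matching discussed above can
be sketched as follows. This is only an illustration under stated
assumptions: the helper names are hypothetical, Unicode NFC stands in for
the kind of normalization the draft talks about (the draft's actual
procedure may differ), and the accent-stripping search key is a crude
locale-independent approximation, not an implementation of
ISO/IEC CD 14651.

```python
import unicodedata


def binary_match(a: str, b: str) -> bool:
    """Match only if the encoded byte sequences are identical."""
    return a.encode("utf-8") == b.encode("utf-8")


def normalized_match(a: str, b: str) -> bool:
    """Binary match after NFC, so precomposed and decomposed forms agree."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def search_key(text: str) -> str:
    """Loose key for searching: drop combining accents, then fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


precomposed = "Qu\u00e9bec"   # e-acute as one code point
decomposed = "Que\u0301bec"   # e followed by a combining acute accent

print(binary_match(precomposed, decomposed))
print(normalized_match(precomposed, decomposed))
print(search_key(precomposed) == search_key("QUEBEC"))
```

Note that search_key also treats n and ñ, or o and ö, as equal, which
would be wrong for Spanish or Swedish users; a real search facility would
need locale-specific tailoring for that.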

Alain LaBonté
Québec