Date: Fri, 25 Apr 1997 19:20:07 +0200 (MET DST)
From: "Martin J. Duerst" <firstname.lastname@example.org>
To: "Karen R. Sollins" <email@example.com>
Cc: firstname.lastname@example.org, email@example.com
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <199704241535.LAA10690@lysithea.lcs.mit.edu>
Message-Id: <Pine.SUN.3.96.970425190044.245w-100000@enoshima>

On Thu, 24 Apr 1997, Karen R. Sollins wrote:

> I have tried VERY hard to stay out of this discussion, but I know have
> to ask a question as suggested by the extraction above. Must one
> conclude from a position of supporting encoding of character sets in
> UTF-8 that the server at the site of the resource MUST be of a certain
> flavor supporting that character set, and furthermore that perhaps the
> general practice will be that each server will only support one or a
> small number? With no general solution implemented globally, those
> with less popular character sets (this often goes hand in hand with
> less technology and less economic strength) are much more likely to be
> left out in the cold. So much for general internationalization,
> unless this means only internationalization for the larger, richer
> communities.

Karen,

Your concerns are very understandable, but I think they are unfounded.
Most current servers don't support any character sets at all; they just
handle octets transparently. The easiest way to set up UTF-8 URLs is to
use an editor or terminal emulator that understands UTF-8. Understanding
of character sets and transcoding is only necessary if you want your
server to accept URLs in two (or more) different encodings, for example
in UTF-8 and in some legacy encoding for backwards compatibility.

Also, it is important to note that full support of Unicode definitely
needs some memory for fonts and lots of other things. But in any given
context, the main interest would be in the local script and maybe the
Latin script, so the resources needed can be reduced dramatically.
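
To illustrate the two-encoding case mentioned above, here is a minimal sketch in Python. It is only one possible approach, not a description of any existing server. The function name decode_url_path, the parameter legacy_encoding, and the choice of ISO-8859-1 as the legacy encoding are assumptions made for the example. The idea is to treat the URL as octets, try UTF-8 first, and fall back to the legacy encoding only when the octets are not valid UTF-8.

```python
from urllib.parse import unquote_to_bytes


def decode_url_path(raw_path: str, legacy_encoding: str = "iso-8859-1") -> str:
    """Decode a %-escaped URL path, preferring UTF-8 over a legacy encoding.

    Hypothetical helper for illustration; the legacy encoding is an assumption.
    """
    # Undo the %XX escaping to recover the raw octets the client sent.
    octets = unquote_to_bytes(raw_path)
    try:
        # UTF-8 has a strict structure, so most legacy-encoded non-ASCII
        # text fails to decode and falls through to the except branch.
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        # Backwards-compatibility path: interpret the octets as legacy text.
        return octets.decode(legacy_encoding)
```

This heuristic is not foolproof: a legacy-encoded octet sequence that happens to be valid UTF-8 would be read as UTF-8. A server that only passes octets through to the file system needs none of this.
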
People in less wealthy places also don't mind having to use a script or
two if that does the job.

Also, for many scripts Unicode is the main source of reference, and
stands out clearly above a multitude of ad-hoc character encodings and
font layouts. In some cases, small places with big neighbours also need
to represent more than just their native script. As an example, take
Georgian. There are about 20 encodings currently in use, and Georgians
would like to use Georgian, Cyrillic, and Latin in the same text, which
can only be done in 8 bits with great sacrifices. If they use some
native editor, writing a program to convert to and from UTF-8 is done in
a day or less.

Another important aspect is communication with the community abroad.
A Georgian in the US might not want to set up some old OS with (for him)
archaic tools just to communicate in his mother tongue. His main chance
of getting Georgian support in his usual software is through Unicode.

Regards,    Martin.