Re: 8 bit characters in DNS names (and URNs?)

Peter Paul Sint (sint@oeaw.ac.at)
Sat, 9 Mar 1996 02:34:10 +0100


Message-Id: <v02130505ad668ac95a8d@[193.170.88.66]>
Date: Sat, 9 Mar 1996 02:34:10 +0100
To: masinter@parc.xerox.com (Larry Masinter)
From: sint@oeaw.ac.at (Peter Paul Sint)
Subject: Re: 8 bit characters in DNS names (and URNs?)
Cc: keld@dkuug.dk, martin@terena.nl, wg-i18n@terena.nl, uri@bunyip.com

At 9:44 08.03.1996, Masataka Ohta wrote:
>> JIS might
>> have separate codes for single and double-wide codes yet want to treat
>> them equivalent for matching.
>JIS does not.
>> While uppercase mapping is culturally sensitive, can we not make a
>> culturally independent 'character matching' algorithm that is good
>> enough for directory services.
>
>Theoretically, it is a union of all the matching rules of all
>the culture. But, in practice, it is hard especially because
>the expected degree of matching differs service by service.
>                                               Masataka Ohta

German has a lower-case letter ß (it looks like a Greek beta;
Swiss German doesn't use it).
It is equivalent to ss, and its capital form is SS (*two* letters).
Also, the canonical conversion of the
umlauts (vowel + two dots above):
ä   is ae
ö   is oe
ü   is ue
capitalised AE OE UE
(historically, the two dots were an e written above the vowel).
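
A minimal sketch in Python of the forward conversion described above. The
names EXPANSIONS, expand and expand_upper are made up for illustration and
are not taken from any existing library.

```python
# Forward conversion only: umlauts and sharp s to their two-letter spellings.
# The table and function names are hypothetical, chosen for this sketch.
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def expand(text: str) -> str:
    """Return the lower-case spelling with umlauts and sharp s expanded."""
    # lower() turns Ä/Ö/Ü into ä/ö/ü and leaves ß as it is,
    # so a single lower-case table covers every case.
    return "".join(EXPANSIONS.get(ch, ch) for ch in text.lower())


def expand_upper(text: str) -> str:
    """Return the capitalised spelling (AE, OE, UE, SS)."""
    return expand(text).upper()


print(expand("Müller"))        # mueller
print(expand_upper("Straße"))  # STRASSE
```

Lower-casing first keeps the table small; an umlaut is never reduced to the
bare vowel.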

You would never write umlaut A as a plain A (only aliens do so, and software).

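The back transformation is not unique! An "ss" may stand for the sharp s
or for a genuine double s: Masse (mass) and Maße (measures) both
capitalise to MASSE.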

German matching software handles this (as far as possible).
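
One way such matching might be sketched in Python: since the spelling cannot
be converted back reliably, both strings are folded forward to the same
ASCII-like key and the keys are compared. This is an assumption about a
workable approach, not a description of any particular product; FOLD,
match_key and same_name are hypothetical names.

```python
import unicodedata

# Hypothetical folding table: each special letter maps to its two-letter form.
FOLD = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def match_key(name: str) -> str:
    """Fold a name to a comparison key (lower case, umlauts expanded)."""
    # NFC joins "u" + combining diaeresis into a single "ü" before folding.
    composed = unicodedata.normalize("NFC", name)
    return composed.lower().translate(FOLD)


def same_name(a: str, b: str) -> bool:
    """Treat two spellings as equal when their folded keys are equal."""
    return match_key(a) == match_key(b)


print(same_name("Müller", "MUELLER"))  # True
print(same_name("Müller", "Muller"))   # False: the bare vowel is not accepted
print(same_name("Maße", "Masse"))      # True, although these are different words
```

The last line shows the price of this approach: distinct words can collide,
which is why matching only works "as far as possible".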





Peter Paul Sint    (sint@oeaw.ac.at, http://www.soe.oeaw.ac.at/~sint/)
Research Unit for Socio-Economics, Austrian Academy of Sciences
Kegelgasse 27, A-1030 Wien (=Vienna), Austria.
Phone:(+431) 712 21 40 - 36   Fax: (+431) 712 21 40 - 34