
Re: A character is in the eye of the beholder

From: Martin J Duerst <mduerst@ifi.unizh.ch>
Date: Thu, 24 Oct 1996 10:20:19 +0100 (MET)
To: keld@dkuug.dk (Keld Jørn Simonsen)
Cc: mtbryan@sgml.u-net.com, rosenne@NetVision.net.il, J.Larmouth@iti.salford.ac.uk, www-international@w3.org
Message-ID: <"josef.ifi..970:"@ifi.unizh.ch>
Keld Simonsen wrote:

>Martin Bryan writes:
>> This doesn't work either as some languages require accented characters to be
>> placed at the end of the list. CEN TC304 are working on a set of sorting
>> rules for ISO 10646, which i18n should adopt as soon as ready for European
>> languages, but the sorting problems of CJK will need to be met by other
>> means as the same glyph can mean different things in different
>> contexts/languages.
>Which languages require accented characters placed at the end of the list?

I guess Martin meant languages like Danish and Swedish, which put some
accented characters (which they might not call accented characters)
at the end of their alphabet.
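
To make this concrete, here is a minimal sketch in Python of sorting
with an alphabet that places æ, ø and å after z, as Danish does. The
names DANISH_ALPHABET and danish_key are made up for this illustration,
and the sketch ignores everything else a real Danish collation needs
(e.g. "aa" treated as "å", case and accent tie-breaking); it is not
the CEN TC304 or WG20 ordering.

```python
import unicodedata

# Assumption for illustration: the Danish alphabet is a-z followed by ae, oe, aa.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}


def danish_key(word):
    """Map each letter to its position in the Danish alphabet.

    Characters outside the alphabet sort after all known letters.
    """
    normalized = unicodedata.normalize("NFC", word.lower())
    return [RANK.get(ch, len(RANK)) for ch in normalized]


words = ["ål", "zebra", "æble", "abe", "øl"]

by_code_point = sorted(words)               # plain code point order
by_alphabet = sorted(words, key=danish_key)  # æ, ø, å at the end, in that order

print(by_code_point)
print(by_alphabet)
```

Plain code point order puts "ål" before "æble", while the alphabet-based
key puts it last, which is where a Danish reader would look for it.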

>For CJK there are a number of ways to sort 10646, and WG20 will specify 
>one. There may be more specified by national standardization bodies.
>Will this not be adequate for a number of purposes?
>SC2/WG2 will have sorting information available for all CJK characters.

Sorting ideographs as such, e.g. by some of their graphical properties,
is something that you may do if you don't have any other information.
And it's fairly easy, as the ideographs are already sorted that way
currently in ISO 10646, with two exceptions: (1) The ordering is based on
some traditional dictionaries; things that many people nowadays would
sort differently are not considered. (2) With the addition of ideographic
supplement(s), the interleaving of two or more collections has to be

However, for many if not most purposes, it is customary to sort
ideographs phonetically. Because, as Martin has mentioned, the pronunciation
of an ideograph depends on language and context, and the different
languages have different phonetic sorting orders, it's impossible
to say that ideograph A comes before ideograph B in all cases.
What you need, e.g. for correct sorting in an index, is to
annotate the words and expressions you want to sort with phonetic
information, and to use this phonetic information for sorting.
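
As a rough sketch of that approach in Python: each index entry carries
its written form plus a phonetic annotation, and only the annotation is
used as the sort key. The Entry type, the sort_by_reading function and
the sample data are my own illustration. Sorting the readings by code
point is a simplification; a real index would apply the phonetic
ordering rules of the language in question.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    written: str  # the form shown in the index
    reading: str  # phonetic annotation used only for sorting


def sort_by_reading(entries):
    """Order entries by their phonetic annotation, not by their ideographs."""
    return sorted(entries, key=lambda e: e.reading)


# Illustrative data: the same four place names, annotated once with
# Japanese kana readings and once with toneless Mandarin pinyin.
japanese = [
    Entry("東京", "とうきょう"),
    Entry("大阪", "おおさか"),
    Entry("京都", "きょうと"),
    Entry("神戸", "こうべ"),
]
mandarin = [
    Entry("東京", "dongjing"),
    Entry("大阪", "daban"),
    Entry("京都", "jingdu"),
    Entry("神戸", "shenhu"),
]

japanese_order = [e.written for e in sort_by_reading(japanese)]
mandarin_order = [e.written for e in sort_by_reading(mandarin)]

print(japanese_order)
print(mandarin_order)
```

The same written forms come out in a different order depending on which
language's readings are attached: 東京 sorts last with the Japanese
readings but second with the Mandarin ones.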

Regards,	Martin.
Received on Thursday, 24 October 1996 04:20:44 UTC
