- From: Martin J Duerst <mduerst@ifi.unizh.ch>
- Date: Thu, 24 Oct 1996 10:20:19 +0100 (MET)
- To: keld@dkuug.dk (Keld Jørn Simonsen)
- Cc: mtbryan@sgml.u-net.com, rosenne@NetVision.net.il, J.Larmouth@iti.salford.ac.uk, www-international@w3.org
Keld Simonsen wrote:

>Martin Bryan writes:
>
>> This doesn't work either as some languages require accented characters to be
>> placed at the end of the list. CEN TC304 are working on a set of sorting
>> rules for ISO 10646, which i18n should adopt as soon as ready for European
>> languages, but the sorting problems of CJK will need to be met by other
>> means as the same glyph can mean different things in different
>> contexts/languages.
>
>Which languages require accented characters placed at the end of the list?

I guess Martin meant languages like Danish and Swedish, which put some
accented characters (which they might not call accented characters) at
the end of their alphabet.

>For CJK there are a number of ways to sort 10646, and WG20 will specify
>one. There may be more specified by national standardization bodies.
>Will this not be adequate for a number of purposes?
>SC2/WG2 will have sorting information available for all CJK characters.

Sorting ideographs as such, e.g. by some of their graphical properties,
is something that you may do if you don't have any other information.
It is also fairly easy, as the ideographs are already sorted that way
in ISO 10646, with two exceptions:

(1) The ordering is based on some traditional dictionaries; cases that
    many people nowadays would sort differently are not considered.
(2) With the addition of ideographic supplement(s), the interleaving
    of two or more collections has to be defined.

However, for many if not most purposes, it is customary to sort
ideographs phonetically. Because, as Martin has mentioned, the
pronunciation of an ideograph depends on language and context, and
the different languages have different phonetic sorting orders, it is
impossible to say that ideograph A comes before ideograph B in all
cases. What you need, e.g. for correct sorting in an index, is to
annotate the words and expressions you want to sort with phonetic
information, and to use this phonetic information for sorting.

Regards,
Martin.
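To illustrate the annotation approach described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the four place names, their hand-supplied Japanese kana and Mandarin pinyin readings, and the names `entries` and `sort_by_reading` are hypothetical and not taken from any standard. Plain code point comparison of the readings stands in for a real phonetic collation, which would be more involved.

```python
# Hypothetical index entries. Each written form carries phonetic
# annotations supplied by hand, one per language (an assumption made
# for this sketch; a real index would store whatever readings apply).
entries = [
    {"written": "東京", "ja": "とうきょう", "zh": "dongjing"},
    {"written": "大阪", "ja": "おおさか", "zh": "daban"},
    {"written": "京都", "ja": "きょうと", "zh": "jingdu"},
    {"written": "奈良", "ja": "なら", "zh": "nailiang"},
]


def sort_by_reading(items, language):
    """Return the written forms ordered by their reading in `language`.

    The readings are compared by code point, which only approximates a
    real phonetic order (kana order for "ja", alphabetical for "zh").
    """
    ordered = sorted(items, key=lambda item: item[language])
    return [item["written"] for item in ordered]


# Sorting the ideographs themselves, with no phonetic information.
by_codepoint = sorted(item["written"] for item in entries)

by_japanese = sort_by_reading(entries, "ja")
by_mandarin = sort_by_reading(entries, "zh")

print(by_codepoint)
print(by_japanese)
print(by_mandarin)
```

With these annotations the Japanese and Mandarin orders disagree about whether 東京 or 京都 comes first, which is the sense in which no single order of the ideographs can be correct for every language.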
Received on Thursday, 24 October 1996 04:20:44 UTC