- From: Martin Bryan <mtbryan@sgml.u-net.com>
- Date: Thu, 24 Oct 1996 10:40:41 +0100
- To: keld@dkuug.dk (Keld Jørn Simonsen), Jonathan Rosenne <rosenne@netvision.net.il>, J.Larmouth@iti.salford.ac.uk, www-international@w3.org
Keld

>> This doesn't work either as some languages require accented characters to be
>> placed at the end of the list. CEN TC304 are working on a set of sorting
>> rules for ISO 10646, which i18n should adopt as soon as ready for European
>> languages, but the sorting problems of CJK will need to be met by other
>> means as the same glyph can mean different things in different
>> contexts/languages.
>
>I am not sure why it does not work to follow the international
>standards in this area. I am talking also of SC22/WG20 who is working
>on sorting on the whole of 10646. I gave you a reference earlier.
>
>I would like some more information on the problems you mention:
>

When SC18/WG8 looked into sorting for ISO/IEC 10179, we came across a number of problems that prevented us from adopting a common algorithm; the POSIX approach ended up being the best we could suggest. It seems that the sorting order for characters such as O umlaut (Ö) is language dependent: some languages sort words starting with this character as if it were O, while others insist on placing them after Z. Different rules can also apply when the character occurs within a word rather than at its start. Similarly, the sorting order for a CJK glyph depends on the language the glyph is being used with: it will appear in one order in a Japanese dictionary and in another in a Taiwanese one. If SC22/WG20 can come up with an ordering that all dictionary producers can accept as an internationally agreed standard, I can assure you that SC18/WG8 will be only too glad to adopt it, but at present our community, the publishing world, cannot agree on a standardized ordering of accented characters.

>Which languages require accented characters placed at the end of the list?

I cannot remember whether it's Swedish or Norwegian, but I know that the two do not agree on the rules for dictionary ordering.

>
>For CJK there are a number of ways to sort 10646, and WG20 will specify
>one. There may be more specified by national standardization bodies.
>Will this not be adequate for a number of purposes?

Yes, for a number of them, but not for all. What is needed is a methodology for users to identify any alterations they require to a default/starter set. This is what we sought to provide in ISO/IEC 10179.

>SC2/WG2 will have sorting information available for all CJK characters.

When they can get agreement from their user communities!

----
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK
Phone/Fax: +44 1452 714029
WWW home page: http://www.u-net.com/~sgml/
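The language-dependent ordering of O umlaut discussed in the message above, together with the idea of users recording only their alterations to a default/starter set, can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the mechanism of ISO/IEC 10179, POSIX LC_COLLATE or any WG20 draft: the alphabet is assumed to be the letters a to z plus ö, the weights are invented for illustration, and the `AFTER_Z` tailoring that moves Ö after Z is hypothetical rather than the exact rule of any particular language.

```python
"""Toy model: a default collation table plus per-language alterations.

Hypothetical illustration only; not the ISO/IEC 10179, ISO/IEC 14651 or
POSIX LC_COLLATE mechanism. Assumes words contain only a-z and ö.
"""

import string

# Default ("starter") table: each letter gets a (primary, secondary) weight.
# Ö is treated as a variant of O: same primary weight as O, and a secondary
# weight of 1 so that "ol" still sorts before "öl".
DEFAULT_WEIGHTS = {ch: (i, 0) for i, ch in enumerate(string.ascii_lowercase)}
DEFAULT_WEIGHTS["ö"] = (DEFAULT_WEIGHTS["o"][0], 1)


def tailor(base, alterations):
    """Return a copy of the base table with the user's alterations applied."""
    table = dict(base)
    table.update(alterations)
    return table


# Hypothetical tailoring: Ö becomes a separate letter placed after Z.
AFTER_Z = tailor(DEFAULT_WEIGHTS, {"ö": (len(string.ascii_lowercase), 0)})


def sort_key(word, table):
    """Compare all primary weights first, then the secondary weights."""
    weights = [table[ch] for ch in word.lower()]
    return ([p for p, _ in weights], [s for _, s in weights])


words = ["zebra", "öl", "ost", "ol"]
print(sorted(words, key=lambda w: sort_key(w, DEFAULT_WEIGHTS)))
# ['ol', 'öl', 'ost', 'zebra']
print(sorted(words, key=lambda w: sort_key(w, AFTER_Z)))
# ['ol', 'ost', 'zebra', 'öl']
```

Keeping the tailoring as a small dictionary of overrides, rather than a complete second table, mirrors the idea of users stating only the alterations they need. A real collation scheme would also need further weight levels (case, for example), multi-character contractions and the position-dependent rules mentioned above, all of which this sketch omits.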
Received on Thursday, 24 October 1996 05:43:40 UTC