- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Thu, 24 Oct 1996 19:05:36 +0200
- To: Martin Bryan <mtbryan@sgml.u-net.com>, Jonathan Rosenne <rosenne@netvision.net.il>, J.Larmouth@iti.salford.ac.uk, www-international@w3.org
Martin Bryan writes:

> When SC18/WG8 looked into sorting for ISO/IEC 10179 we came across a number
> of problems that prevented us from adopting a common algorithm: the POSIX
> approach ended up being the best we could suggest. It seems that there is a
> difference in sorting order for things like O umlaut that is language
> dependent. Some languages sort words starting with this character as if they
> were similar to O and others insist on them being placed after Z. Different
> rules can apply when the character occurs within the word from when it
> starts a word. Similarly the sorting order for a CJK glyph depends on which
> language you are using the glyph with. It will appear in one order in a
> Japanese dictionary and another in a Taiwanese one.

Yes, sorting is dependent on culture. That is why CEN in ENV 12005, and
hopefully also ISO, will have a registry of cultural conventions, including
sorting. The ISO POSIX working group has collected a number of sorting
specifications in POSIX syntax in the directory
ftp://dkuug.dk/i18n/WG15-collection/locales/

> If SC22/WG20 can come up with an ordering that can be accepted by all
> dictionary producers as an internationally agreed standard I can assure you
> that SC18/WG8 will be only too glad to adopt it, but at present our
> community, the publishing world, cannot agree on a standardized ordering of
> accented characters

Yes, it is agreed that this is dependent on culture.

> >For CJK there are a number of ways to sort 10646, and WG20 will specify
> >one. There may be more specified by national standardization bodies.
> >Will this not be adequate for a number of purposes?
>
> Yes, for a number of them, but not for all. What is needed is a methodology
> for users to identify any alterations they require to a default/starter set.
> This is what we sought to provide in 10179.

This is exactly what we are defining in 14652: a number of ways to alter a
default specification.
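To make the culture-dependence of O umlaut concrete, here is a minimal Python sketch of the "default order plus user alterations" idea discussed above. It is only a toy: it is not the POSIX LC_COLLATE syntax or the ISO/IEC 14652 notation, it models a single comparison level plus a plain-string tie-break, and the names DEFAULT_ORDER, tailor_after, make_weights and sort_key are hypothetical. The Swedish tailoring is simplified to moving ö after z; the real Swedish alphabet also places å and ä there.

```python
from typing import Dict, List, Tuple

# Hypothetical "starter" collating sequence: the plain Latin letters a..z.
# (Toy assumption; not POSIX or ISO/IEC 14652 syntax.)
DEFAULT_ORDER: List[str] = list("abcdefghijklmnopqrstuvwxyz")


def tailor_after(order: List[str], ch: str, anchor: str) -> List[str]:
    """Return a copy of `order` with `ch` moved to just after `anchor`."""
    result = [c for c in order if c != ch]
    result.insert(result.index(anchor) + 1, ch)
    return result


def make_weights(order: List[str], equivalents: Dict[str, str]) -> Dict[str, int]:
    """Give each character a primary weight from its position in `order`.

    `equivalents` maps a character to another whose weight it shares,
    e.g. {"ö": "o"} makes O umlaut sort like plain o at the first level.
    """
    weights = {c: i for i, c in enumerate(order)}
    for ch, base in equivalents.items():
        weights[ch] = weights[base]
    return weights


def sort_key(word: str, weights: Dict[str, int]) -> Tuple[List[int], str]:
    """Compare by primary weights first, then by the raw string to break ties."""
    unknown = len(weights)  # larger than any assigned weight
    primary = [weights.get(c, unknown) for c in word.lower()]
    return primary, word


words = ["zebra", "öl", "ost", "oma"]

# German-dictionary style (simplified): ö is interfiled with o.
german = make_weights(DEFAULT_ORDER, {"ö": "o"})
# Swedish style (simplified): ö is a separate letter placed after z.
swedish = make_weights(tailor_after(DEFAULT_ORDER, "ö", "z"), {})

print(sorted(words, key=lambda w: sort_key(w, german)))   # ['öl', 'oma', 'ost', 'zebra']
print(sorted(words, key=lambda w: sort_key(w, swedish)))  # ['oma', 'ost', 'zebra', 'öl']
```

The same word list sorts differently depending only on which alteration is applied to the shared starter order, which is the point made in the thread about needing a registry of cultural conventions rather than one universal ordering.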
> >SC2/WG2 will have sorting information available for all CJK characters.
>
> When they can get agreement from their user communities!

I think that agreement is already there for the addition of CJK characters
currently being proposed in SC2/WG2; I have not got around to looking at the
data, though.

Keld
Received on Thursday, 24 October 1996 13:07:35 UTC