Re: A character is in the eye of the beholder

From: Martin Bryan <mtbryan@sgml.u-net.com>
Date: Thu, 24 Oct 1996 10:40:41 +0100
To: keld@dkuug.dk (Keld Jørn Simonsen), Jonathan Rosenne <rosenne@netvision.net.il>, J.Larmouth@iti.salford.ac.uk, www-international@w3.org
>> This doesn't work either as some languages require accented characters to be
>> placed at the end of the list. CEN TC304 are working on a set of sorting
>> rules for ISO 10646, which i18n should adopt as soon as ready for European
>> languages, but the sorting problems of CJK will need to be met by other
>> means as the same glyph can mean different things in different
>> contexts/languages.
>I am not sure why it does not work to follow the international
>standards in this area. I am talking also of SC22/WG20 who is working
>on sorting on the whole of 10646. I gave you a reference earlier.
>I would like some more information on the problems you mention:
When SC18/WG8 looked into sorting for ISO/IEC 10179 we came across a number
of problems that prevented us from adopting a common algorithm: the POSIX
approach ended up being the best we could suggest. It seems that the sorting
order for characters such as O umlaut is language dependent. Some languages
sort words starting with this character as if it were a variant of O; others
insist that they be placed after Z. The rules that apply when the character
occurs within a word can also differ from those that apply when it starts a
word. Similarly, the sorting order for a CJK glyph depends on the language
you are using the glyph with: it will appear in one position in a Japanese
dictionary and in another in a Taiwanese one.
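
To make the O umlaut point concrete, here is a minimal sketch in Python. It is
only an illustration, not the POSIX mechanism and not anything defined in
ISO/IEC 10179: the helper name make_key, the 26-letter base alphabet, and the
sample words are all assumptions made for the example. It sorts the same word
list under two hypothetical conventions, one that treats "ö" as a variant of
"o" and one that treats it as a separate letter after "z".

```python
# Hypothetical illustration: one word list, two language-specific orderings.
BASE = "abcdefghijklmnopqrstuvwxyz"  # assumed base alphabet for the sketch


def make_key(alphabet, folds=None):
    """Build a sort key from an ordered alphabet and an optional folding map.

    `folds` maps a character to the letter it should sort as, e.g. "ö" -> "o".
    Characters missing from `alphabet` sort after everything else.
    """
    folds = folds or {}
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        w = word.lower()
        primary = tuple(rank.get(folds.get(ch, ch), len(alphabet)) for ch in w)
        # The unfolded word breaks ties between words that fold identically.
        return (primary, w)

    return key


# Convention A (assumed): "ö" sorts as if it were "o".
fold_key = make_key(BASE, folds={"ö": "o"})
# Convention B (assumed): "ö" is a letter of its own, placed after "z".
after_z_key = make_key(BASE + "ö")

words = ["zebra", "öl", "ost", "oval"]
folded = sorted(words, key=fold_key)
after_z = sorted(words, key=after_z_key)

print(folded)   # ['öl', 'ost', 'oval', 'zebra']
print(after_z)  # ['ost', 'oval', 'zebra', 'öl']
```

Neither result is wrong; each is correct for a different language community,
which is why a single fixed ordering could not be adopted.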

If SC22/WG20 can come up with an ordering that can be accepted by all
dictionary producers as an internationally agreed standard I can assure you
that SC18/WG8 will be only too glad to adopt it, but at present our
community, the publishing world, cannot agree on a standardized ordering of
accented characters.

>Which languages require accented characters placed at the end of the list?

I cannot remember whether it's Swedish or Norwegian, but I know that the two
do not agree on the rules for dictionary ordering.
>For CJK there are a number of ways to sort 10646, and WG20 will specify 
>one. There may be more specified by national standardization bodies.
>Will this not be adequate for a number of purposes?

Yes, for a number of them, but not for all. What is needed is a methodology
for users to identify any alterations they require to a default/starter set.
This is what we sought to provide in 10179.
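
The idea of a default set plus user-declared alterations can be sketched as
follows. This is a hedged illustration in Python, not the syntax or mechanism
of ISO/IEC 10179: the functions tailor and collation_key, and the form of the
alteration list (pairs of "character, place it right after this one"), are
assumptions invented for the example.

```python
# Hypothetical sketch: a starter ordering that users alter for their language.
DEFAULT_ORDER = list("abcdefghijklmnopqrstuvwxyz")  # assumed starter set


def tailor(default_order, alterations):
    """Return a new ordering with each (char, anchor) alteration applied.

    Each alteration places `char` immediately after `anchor`, moving `char`
    if it is already present in the ordering.
    """
    order = list(default_order)
    for char, anchor in alterations:
        if char in order:
            order.remove(char)
        order.insert(order.index(anchor) + 1, char)
    return order


def collation_key(order):
    """Turn an ordering into a sort key; unknown characters sort last."""
    rank = {ch: i for i, ch in enumerate(order)}
    return lambda word: [rank.get(ch, len(order)) for ch in word.lower()]


# A user states only the differences from the default: å, ä, ö follow z.
user_order = tailor(DEFAULT_ORDER, [("å", "z"), ("ä", "å"), ("ö", "ä")])
tailored_words = sorted(["öl", "zebra", "ärt", "ål"], key=collation_key(user_order))

print(tailored_words)  # ['zebra', 'ål', 'ärt', 'öl']
```

The point of the design is that the default stays untouched and each user
community records only its own deviations from it.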

>SC2/WG2 will have sorting information available for all CJK characters.

When they can get agreement from their user communities!

