Re: A character is in the eye of the beholder



Martin Bryan writes:

> When SC18/WG8 looked into sorting for ISO/IEC 10179 we came across a number
> of problems that prevented us from adopting a common algorithm: the POSIX
> approach ended up being the best we could suggest. It seems that there is a
> difference in sorting order for things like O umlaut that is language
> dependent. Some languages sort words starting with this character as if they
> were similar to O and others insist on them being placed after Z. Different
> rules can apply when the character occurs within the word from when it
> starts a word. Similarly the sorting order for a CJK glyph depends on which
> language you are using the glyph with. It will appear in one order in a
> Japanese dictionary and another in a Taiwanese one.

Yes, sorting depends on culture. That is why CEN, in ENV 12005, and
hopefully ISO as well, will have a registry of cultural conventions,
including sorting. The ISO POSIX working group has collected a number
of sorting specifications in POSIX syntax in the directory
ftp://dkuug.dk/i18n/WG15-collection/locales/
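
To make the O umlaut case concrete, here is a minimal Python sketch of the
two conventions Martin describes. It is only an illustration: the function
names and the word list are made up for this example, and it does not
reproduce any registered locale or the POSIX syntax used in the directory
above.

```python
import unicodedata

O_UMLAUT = "\u00f6"  # precomposed o with diaeresis


def key_as_base_letter(word):
    """Convention 1 (assumed for illustration): o-umlaut sorts like plain o."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Drop combining marks so the accented letter compares as its base letter.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Fall back to the original spelling so ties are broken deterministically.
    return (base, word)


def key_after_z(word):
    """Convention 2 (assumed for illustration): o-umlaut is a letter after z."""
    alphabet = "abcdefghijklmnopqrstuvwxyz" + O_UMLAUT
    rank = {ch: i for i, ch in enumerate(alphabet)}
    composed = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet sort after everything else.
    return [rank.get(ch, len(alphabet)) for ch in composed]


words = ["zebra", O_UMLAUT + "l", "ost", "oval"]

like_o = sorted(words, key=key_as_base_letter)
after_z = sorted(words, key=key_after_z)
```

With the first key the word starting with O umlaut lands among the O words;
with the second it lands after "zebra". The same input list gives two
different, equally valid orders, which is the point being made about
culture.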

> If SC22/WG20 can come up with an ordering that can be accepted by all
> dictionary producers as an internationally agreed standard I can assure you
> that SC18/WG8 will be only too glad to adopt it, but at present our
> community, the publishing world, cannot agree on a standardized ordering of
> accented characters

Yes, it is agreed that this is dependent on culture.

> >For CJK there are a number of ways to sort 10646, and WG20 will specify 
> >one. There may be more specified by national standardization bodies.
> >Will this not be adequate for a number of purposes?
> 
> Yes, for a number of them, but not for all. What is needed is a methodology
> for users to identify any alterations they require to a default/starter set.
> This is what we sought to provide in 10179.

That is what we are defining in 14652: a number of ways
to alter a default specification.
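
The idea of altering a default specification can be sketched roughly as
follows. This is a hypothetical Python illustration only: `tailor`,
`make_key` and `DEFAULT_ORDER` are names invented here, the default order is
an arbitrary choice, and none of this is the syntax of 14652 or 10179.

```python
def tailor(default_order, moves):
    """Return a copy of default_order with each (char, after) move applied.

    Each move says: place `char` immediately after `after`.
    """
    order = list(default_order)
    for ch, after in moves:
        if ch in order:
            order.remove(ch)
        order.insert(order.index(after) + 1, ch)
    return order


def make_key(order):
    """Build a sort key function from an ordered list of characters."""
    rank = {ch: i for i, ch in enumerate(order)}
    unknown = len(order)  # characters not listed sort last

    def key(word):
        return [rank.get(ch, unknown) for ch in word.lower()]

    return key


# Assumed default for this sketch: o-umlaut is a separate letter right after o.
DEFAULT_ORDER = list("abcdefghijklmno\u00f6pqrstuvwxyz")

# A user-supplied alteration: move o-umlaut to the end, after z.
tailored_order = tailor(DEFAULT_ORDER, [("\u00f6", "z")])

words = ["zebra", "\u00f6l", "ost", "oval"]
by_default = sorted(words, key=make_key(DEFAULT_ORDER))
by_tailored = sorted(words, key=make_key(tailored_order))
```

The default order stays untouched and the user only states the difference,
which is the kind of mechanism meant by a default/starter set plus
alterations.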

> >SC2/WG2 will have sorting information available for all CJK characters.
> 
> When they can get agreement from their user communities!

I think it is already there for the addition of CJK characters
currently being proposed in SC2/WG2, though I have not got around
to looking at the data.

Keld