
Re: A character is in the eye of the beholder




>> The simplest transformation is to decompose all composites and sort all
>> combining characters that attach to a single base character in binary
>> order. This guarantees a unique and permanent canonical representation.
>> An alternative is to replace all those combinations that are defined
>> with the composite. This transformation is dependent on the version of
>> the standard, since new composite characters are being discovered from
>> time to time, but is still satisfactory.
>
>The problem with this is that the standard sorting specifications are
>done on the whole characters, not the "decomposed" composite
>sequences. Also for that reason it would be advantageous to code
>the information in the 10646 characters so you have support for
>sorting.  Building on the 10646 standard allows you to draw on
>all other ISO standardized work building on the standard, and thus
>to have an aligned set of standard conforming specifications.

This doesn't work either, as some languages require accented characters to be
placed at the end of the list. CEN TC304 are working on a set of sorting
rules for ISO 10646, which i18n should adopt for European languages as soon
as they are ready, but the sorting problems of CJK will need to be met by
other means, as the same glyph can mean different things in different
contexts/languages.
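
To make the two points above concrete, here is a rough sketch in Python. It
assumes the standard unicodedata module as a stand-in for the transformations
discussed in the quoted text: its NFD form decomposes composites and orders
combining marks by combining class (not by plain binary order, as the quote
proposes), and its NFC form recomposes them. The accented_last_key function is
a hypothetical toy rule, not the CEN TC304 rules or any language's real
ordering.

```python
import unicodedata


def decompose(text):
    # Canonical decomposition: split composites into base + combining marks.
    # Marks on one base are reordered by canonical combining class.
    return unicodedata.normalize("NFD", text)


def compose(text):
    # Replace every sequence that has a defined composite with that composite.
    # The result depends on the composites known to the library's data tables.
    return unicodedata.normalize("NFC", text)


def accented_last_key(letter):
    # Hypothetical toy rule: letters carrying a combining mark sort after
    # all unaccented letters; ties fall back to the decomposed code points.
    decomposed = decompose(letter)
    has_mark = any(unicodedata.combining(ch) for ch in decomposed)
    return (has_mark, decomposed)


composite = "\u00e9"   # e with acute as one composite character
sequence = "e\u0301"   # base letter e followed by combining acute

same_after_decomposition = decompose(composite) == decompose(sequence)
letters = sorted(["\u00e4", "z", "a", "b"], key=accented_last_key)
```

Both spellings of the accented e reduce to the same sequence, which is what
makes comparison possible, but the last line shows why that alone does not
settle sorting: where the accented letter lands is a separate, per-language
decision.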
----
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK 
Phone/Fax: +44 1452 714029   WWW home page: http://www.u-net.com/~sgml/