Re: XML Blueberry (non-ASCII name characters in Japan)

In Unicode 3.1, special function characters were added to allow new
characters to be composed positionally from parts.  These are intended for
very rare or new characters only.

There have been several thousand years of research into what the primitive
components of Han ideographs are.  It is only now that we have computers and
large databases of characters that it is feasible to try out different
alternatives.  At Academia Sinica, for example, my friend Prof. C.C. Hsieh
devised a system with about 600 components and, I think, 16 composition
functions (side-by-side, for example) which can represent about 98% of the
Hanyu lexicon.

Unicode went with a simpler set of functions, but at the cost of some
ambiguity: there may be more than one way to represent the same character.
This may be fine for text, but it is not good for names, where
normalization and comparison are their destiny.
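
To make the ambiguity concrete, here is a rough sketch in Python.  The two
description sequences below are my own illustrative guesses at how someone
might describe the ideograph U+6E58 (water radical plus U+76F8, or water
radical plus the two halves of U+76F8); they are not taken from any
published table, and the function name is hypothetical.  The point is only
that none of the four Unicode normalization forms makes the two spellings
compare equal, whereas an ordinary composed/decomposed pair does.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Assumption: two plausible description sequences for the same ideograph.
# U+2FF0 = left-to-right, U+2FF2 = left-to-middle-and-right.
IDS_TWO_PART = "\u2FF0\u6C35\u76F8"          # water radical + U+76F8
IDS_THREE_PART = "\u2FF2\u6C35\u6728\u76EE"  # water radical + tree + eye


def same_after_normalization(a, b):
    """Return True if a and b become identical under any normalization form."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )


if __name__ == "__main__":
    # A precomposed e-acute and e + combining acute are reconciled by NFC.
    print(same_after_normalization("\u00e9", "e\u0301"))
    # The two description sequences stay distinct under every form.
    print(same_after_normalization(IDS_TWO_PART, IDS_THREE_PART))
```

So a name comparison based on normalization alone would treat the two
sequences as different names, even if a reader sees one character.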

(I don't think these function characters are suitable for use in names,
b.t.w.)

Cheers
Rick Jelliffe



From: "Joel Rees" <rees@server.mediafusion.co.jp>

Oh. I thought of another way around this issue. It is not presently a very
satisfying solution, but may be the ultimate solution, if it would work: Are
ideographic sequences allowed in markup (tags and attributes)? I mean
sequences of existing characters with the ideographic description characters
mixed in to show how they are supposed to combine. If so, some truly
sophisticated editor of the future would be able to build virtually any
character that can be built from the current set of radicals, and we would
be able to do with Japanese the equivalent of using "mellyfluus" (the
misspelling) in an attribute.
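
As a rough way to test the question, the sketch below checks a string
against XML name rules.  Note the assumption: the ranges are transcribed
from the NameStartChar and NameChar productions of a later edition of
XML 1.0 (the Fifth) than the one current in this thread, so this shows
how the question comes out under those rules, not what a 2001 parser
would do.  The function names are made up for the example.

```python
# Assumption: ranges as given in the XML 1.0 Fifth Edition productions.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional characters allowed after the first position.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_xml_name(text):
    """Return True if text is a well-formed XML Name under the ranges above."""
    if not text or not is_name_start_char(text[0]):
        return False
    return all(is_name_char(ch) for ch in text[1:])


if __name__ == "__main__":
    for candidate in ("mellyfluus", "\u6C35\u76F8", "\u2FF0\u6C35\u76F8"):
        print(repr(candidate), is_xml_name(candidate))
```

Under these ranges the ideographs themselves are name characters, but the
description characters U+2FF0 to U+2FFB fall in the gap between U+2FEF and
U+3001, so a sequence containing one is rejected as a name.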

Received on Tuesday, 10 July 2001 06:11:24 UTC