- From: Borka Jerman-Blazic <jerman-blazic@ijs.si>
- Date: Mon, 25 Oct 1993 12:09:18 +0100
- To: ietf-charsets <ietf-charsets@INNOSOFT.COM>
- Cc: ietf-wnils <ietf-wnils@UCDAVIS.edu>
=====================================================================

>>It seems to me that English and Greek characters need separate code points
>>because their visual appearance is significantly different, not because
>>they are from different languages.

>Actually, Dan, a lot of other issues aside, you have hit on one of the
>critical issues here. Ohta-san has responded on this, but let me try a
>bit of a generalization.

I would say that they have different code points because they belong to different scripts! The same applies to Cyrillic. You cannot mix both scripts in some text-related operations (I have in mind ordering), because they have set elements with exactly the same "shape", i.e. A, K, P, C etc., but with different names (meaning different interpretation, different pronunciation), because they belong to different scripts. You can order all Latin characters from ISO 10 646 in one collation string for many different languages (it is difficult, because the ordering rules differ from language to language, but some work has already been done), but you cannot mix them with Greek or Cyrillic.

I am not an expert in ideograms, but I guess that the problem with different "shapes" of the same ideogram (which may be related to the glyph-versus-character problem as we understand it in the West) is that they belong to the same script, called ideographic. Of course, that does not mean that they have to be coded as they are now in 10 646, but somehow they belong together.

>There are two issues that might usefully be thought of as separate:
>(1) "visual appearance is significantly different" is largely in the eye
>of the beholder. Is the Latin lower-case "a" the same, or
>"significantly different" from Greek lower-case alpha? Be careful about
>the answer, because it may be different in different fonts, and
>typography is supposed to not be an issue here.

I agree completely. Glyphs and typography are not the issue here.
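The claim that visually identical capitals such as A belong to different scripts, and therefore receive different code points, can be checked directly. The sketch below is a minimal illustration assuming Python 3 and its standard `unicodedata` module, which reflects a much later version of Unicode/ISO 10646 than the one under discussion here. The helper names `describe` and `script_of` are hypothetical, and `script_of` only approximates the script from the first word of the character name; it is not the Unicode Script property.

```python
import unicodedata

# Three capitals that share the shape "A" in most fonts but belong to
# three different scripts, and therefore have three different code points.
LOOKALIKES = ["A", "\u0391", "\u0410"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def script_of(ch: str) -> str:
    """Rough script guess: the first word of the character name.

    Hypothetical helper: NOT the Unicode Script property, just a
    heuristic that happens to work for these letters.
    """
    return unicodedata.name(ch).split()[0]


if __name__ == "__main__":
    for ch in LOOKALIKES:
        print(describe(ch), "->", script_of(ch))
    # Same shape, yet three distinct characters.
    print(len(set(LOOKALIKES)))
```

Run as a script, this lists LATIN CAPITAL LETTER A (U+0041), GREEK CAPITAL LETTER ALPHA (U+0391) and CYRILLIC CAPITAL LETTER A (U+0410), each tagged with its script, and then prints 3.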
Characters in one coded character set are supposed to be unique, i.e. each character is coded only once in a given character set table.

>(2) To the degree that there are *any* letter-symbols that we can agree
>are not "significantly different" in Greek and Latin character sets
>(let's stick with alpha and look at its upper-case form as Ohta-san
>did), one then can make a choice between--starting from a traditionally
>ASCII-based world-- "ASCII characters with Greek supplement" and
>"separate contiguous code points for basic Latin and Greek characters".
>The former creates a smaller number of total codes because, e.g., Greek
>upper-case alpha does not get assigned a code point separate from Latin
>upper-case A. The latter preserves some collating integrity, some
>useful relationships between, e.g. upper case and lower case character
>sets, and maybe has some cultural merit (which moves dangerously close
>to "because they are different languages"). But the latter yields much
>larger total character sets, because similar symbols are assigned to
>separate code points under some set of rules.

That issue has been discussed for many years, and today it will be difficult to change the general principle adopted by many bodies, i.e. sets of characters are coded and the members of each set are supposed to be unique within that set. The problem could perhaps be better addressed if we speak about scripts rather than languages.

>The "ASCII with supplemental Greek characters" approach is known in the
>character set community as "unification". One of the several
>objections to IS 10646 and UNICODE in the Asian character set community
>is that North American and European-dominated committees and design
>teams were a lot more willing to "unify" characters deriving from
>Chinese ("Han") characters than they were to unify characters deriving
>from, e.g., Greek or North Semitic.

This issue has already been discussed.
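The two coding choices contrasted in the quoted text can be illustrated in a few lines. This is a minimal sketch, again assuming Python 3's `unicodedata` module (a later Unicode version than the one discussed here); the names `CASE_PAIRS`, `case_offset` and `HAN_EXAMPLE` are hypothetical. With separate contiguous runs per script, the basic capitals and small letters of Latin, Greek and Cyrillic each keep a fixed distance of 0x20. Under Han unification, a character such as U+76F4 (直), commonly used to illustrate regional glyph differences, has a single code point shared by Chinese, Japanese and Korean text.

```python
import unicodedata

# Separate contiguous code points per script: for the basic letters of
# Latin, Greek and Cyrillic, the small letter sits 0x20 above its capital,
# so simple case mapping is a fixed offset within each script's run.
CASE_PAIRS = [("A", "a"), ("\u0391", "\u03B1"), ("\u0410", "\u0430")]


def case_offset(upper: str, lower: str) -> int:
    """Distance in code points from a capital to its small letter."""
    return ord(lower) - ord(upper)


# Unification: one code point serves Chinese, Japanese and Korean text,
# whatever the regional glyph looks like (hypothetical example choice).
HAN_EXAMPLE = "\u76F4"

if __name__ == "__main__":
    for upper, lower in CASE_PAIRS:
        offset = case_offset(upper, lower)
        print(f"{unicodedata.name(upper)}: +0x{offset:X}, "
              f"lower() agrees: {upper.lower() == lower}")
    print(f"U+{ord(HAN_EXAMPLE):04X} {unicodedata.name(HAN_EXAMPLE)}")
```

Each script reports an offset of +0x20, and the Han character is named CJK UNIFIED IDEOGRAPH-76F4: one code point, with no language or regional variant recorded in the character itself.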
Why Chinese Han was chosen for unification I do not know, but at the SC2 meeting in Rennes it was presented as a consensus of the three national bodies, i.e. China, Japan and Korea. However, I know that a new proposal will soon be discussed at the Washington meeting of SC2 WG2 (next week) which will allow allocation of additional blocks in some part of the BMP (i.e. use of the reserved allocations as a sort of announcement, followed by invocation of the blocks from the second plane). Please do not discuss this further, because it is not official and I do not have the original document!

Borka

p.s. But we did agree on what the problems to be solved over the Internet are, didn't we??
Received on Monday, 25 October 1993 05:11:31 UTC