- From: Rick Jelliffe <ricko@allette.com.au>
- Date: Thu, 21 Nov 1996 18:54:47 +1100
- To: "David G. Durand" <dgd@cs.bu.edu>
- CC: W3C SGML Working Group <w3c-sgml-wg@w3.org>
David G. Durand wrote:

> We are talking about how a system should represent a character that is not
> in Unicode for transmission on the wire. Mathematical symbols are probably
> the best non-scholarly example raised so far in the discussion.

Another good example might be Japanese and Taiwanese place and personal names: these often cannot be written using the characters in computer character sets, and must be spelled out in some alternative form. Imagine not even being able to give your name using a computer!

For CJK characters it has been proposed (Prof. Eiji Matsuoka) that the standard national character book (e.g. in Japan the Daiwa Kanten) be used: these are collections with all known characters (or, at least, glyphs) and national variants. So the character is identified by a string, either in the comment string or in the SDATA entity text value, giving the source book and an index into it, e.g. "Daiwa Kanten, character 40000", or "Daiwa Kanten, 3rd ed., p24, char 2" I guess.

Other suggested methods are the telephone method of naming a well-known character that the character looks like and then giving the differences, or building the character up from radicals. (And there is also the encoded bitmap idea, to actually send a rough sample of the glyph.) These let the sucker at the receiving end reconstruct the character.

(As I have said before, I think a preferable method is that whoever makes a document undertakes to put some kind of usable glyph on the web too, when some kind of WWW font service mechanism is established.)

I am not objecting to nickname identifiers, especially for specialists sending material to each other. That is fine for documents with occasional strange characters, and for logos and symbols. But for document series that have tens of thousands of non-Unicode characters (e.g. the EBTI projects going on in CJK and Thailand), either the identifier or the SDATA text value has to have something that can be directly used to index large character tables: numbers are directly useful for this. Mapping the ISO entity sets to system values is tedious enough, let alone document-specific character entity sets.

Anyway, I think you ask whether (unnumbered) nicknames are ever not preferable to identifiers with numbers. I think there are many such cases, certainly in documents outside the Western hemisphere.

--
Regards
Rick Jelliffe                   email: ricko@allette.com.au
_______________________________________________________________
Allette Systems (Australia)     email: info@allette.com.au
Level 10, 91 York Street        www: http://www.allette.com.au
Sydney 2000 NSW Australia       phone: +61 2 262 4777
                                fax: +61 2 262 4774
_______________________________________________________________
Received on Thursday, 21 November 1996 02:51:40 UTC