
Re: ERB discussions and decisions

From: Rick Jelliffe <ricko@allette.com.au>
Date: Thu, 21 Nov 1996 18:54:47 +1100
Message-ID: <32940AC7.2769@allette.com.au>
To: "David G. Durand" <dgd@cs.bu.edu>
CC: W3C SGML Working Group <w3c-sgml-wg@w3.org>

David G. Durand wrote:

> We are talking about how a system should represent a character that is not
> in Unicode for transmission on the wire. Mathematical symbols are probably
> the best non-scholarly example raised so far in the discussion.

Another good example might be Japanese and Taiwanese place names and
personal names: these often cannot be written using the characters in
computer character sets, and must be spelled out in some alternative
form. Imagine not even being able to give your name using a computer!

For CJK characters it has been proposed (Prof. Eiji Matsuoka) that
the standard national character book (e.g. in Japan the Daiwa Kanten)
be used: these are collections with all known characters (or, at
least, glyphs) and national variants.

So the character is identified by a string, either in the comment string
or in the SDATA entity text value, giving the source book and an index
into it, e.g. "Daiwa Kanten, character 40000", or "Daiwa Kanten, 3rd ed.,
p24, char 2", I guess.  Other suggested methods are the telephone method
of naming a well-known character that the character looks like and then
giving the differences, or building the character up from radicals. (And
there is also the encoded bitmap idea, to actually send a rough sample of
the glyph.)  These let the sucker at the receiving end reconstruct the
character. (As I have said before, I think a preferable method is that
whoever makes a document undertakes to put some kind of usable glyph on
the web too, when some kind of WWW font service mechanism is established.)
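
A minimal sketch of how a receiving system might pull apart the two
reference strings quoted above. The function name parse_book_ref, the
BookCharRef fields, and the accepted grammar are all assumptions made for
illustration; nothing in this thread fixes a syntax for such strings.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BookCharRef:
    """Hypothetical record for a 'source book + index' character reference."""
    book: str
    edition: Optional[int]
    page: Optional[int]
    index: int


# Assumed grammar: "<book>, [<n>rd ed., ][p<n>, ](character|char) <n>"
_PATTERN = re.compile(
    r"""^\s*(?P<book>[^,]+?)\s*,\s*
        (?:(?P<edition>\d+)(?:st|nd|rd|th)\s+ed\.\s*,\s*)?
        (?:p\.?\s*(?P<page>\d+)\s*,\s*)?
        (?:character|char)\s+(?P<index>\d+)\s*$""",
    re.VERBOSE | re.IGNORECASE,
)


def parse_book_ref(text: str) -> BookCharRef:
    """Parse a reference string, raising ValueError if it does not fit."""
    m = _PATTERN.match(text)
    if m is None:
        raise ValueError(f"not a recognised book reference: {text!r}")
    edition = int(m.group("edition")) if m.group("edition") else None
    page = int(m.group("page")) if m.group("page") else None
    return BookCharRef(m.group("book"), edition, page, int(m.group("index")))


ref = parse_book_ref("Daiwa Kanten, 3rd ed., p24, char 2")
print(ref.book, ref.edition, ref.page, ref.index)
```

Note that in the second form the index is only meaningful together with
the edition and page, so a receiver would need the same edition to hand.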

I am not objecting to nickname identifiers, especially for
specialists sending material to each other. That is fine for documents
with occasional strange characters, and for logos and symbols.
But for document series that have tens of thousands of non-Unicode
characters (e.g. the EBTI projects going on in CJK and Thailand), either
the identifier or the SDATA text value has to have something that can be
directly used to index large character tables: numbers are directly
useful for this. Mapping the ISO entity sets to system values is tedious
enough, let alone document-specific character entity sets.
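
To illustrate the difference, here is a rough sketch, assuming a
hypothetical naming convention in which a numbered entity name is a short
alphabetic prefix followed by the table index (e.g. "dk40000"). The
function table_index and the nickname map are inventions for this example
only.

```python
import re
from typing import Dict

# Assumed convention: letters, then the numeric index into the table.
_NUMBERED = re.compile(r"^(?P<prefix>[A-Za-z]+)(?P<number>\d+)$")


def table_index(entity_name: str, nickname_map: Dict[str, int]) -> int:
    """Return the character-table index for an entity name.

    A numbered name yields its index directly. A bare nickname has to be
    looked up in a hand-maintained map and raises KeyError if it is missing.
    """
    m = _NUMBERED.match(entity_name)
    if m:
        return int(m.group("number"))
    return nickname_map[entity_name]


# A numbered identifier needs no per-document mapping at all.
print(table_index("dk40000", {}))

# A nickname works only if someone has written the mapping entry.
print(table_index("starburst", {"starburst": 7}))
```

With tens of thousands of characters, the nickname map is exactly the
tedious mapping work described above; the numbered form avoids it.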

Anyway, I think you ask whether (unnumbered) nicknames are ever not
preferable to identifiers with numbers.  I think there are many such
cases, certainly in documents from outside the Western hemisphere.


-- 
Regards

Rick Jelliffe               email:  ricko@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email:  info@allette.com.au 
Level 10, 91 York Street    www:    http://www.allette.com.au
Sydney 2000 NSW Australia   phone:  +61 2 262 4777
                            fax:    +61 2 262 4774
_______________________________________________________________
Received on Thursday, 21 November 1996 02:51:40 EST
