Re: ERB discussions and decisions

From: Rick Jelliffe <ricko@allette.com.au>
Date: Mon, 18 Nov 1996 01:38:52 +1100
Message-ID: <328F237C.3EB3@allette.com.au>
To: "David G. Durand" <dgd@cs.bu.edu>
CC: W3C SGML Working Group <w3c-sgml-wg@w3.org>
David G. Durand wrote:
>     I have yet to hear anyone offer an argument as to why a character
> string describing missing data (an unknown glyph) is inferior to a number
> describing missing data, especially when the web infrastructure does not
> provide a convenient way to make the private arrangements needed for
> private-use to work (funny property of a publishing medium, isn't it?).
OK, let's decree that all names should be written in Chinese, since they
are the most numerous people and characters. They have a phonetic system,
so we can spell out any English words in bopomofo: we in the West will only
have to learn a few dozen characters, which is trivial, and we can figure out
what a name means by looking in some (online) list with representative
glyphs :-)

Less flippantly, even simple English names are not clear. It took me a 
long time to realise Americans seem to mean what we call "hash" by
So I don't have any confidence that English names are much use to most
people in the world. That is the first reason why names are inferior.

The second reason is that there are so many characters that giving them
identifiers that also describe them means some of those identifiers must
get very complicated, unless you adopt a Hungarian notation (semi-acronym)
like Microsoft advocates for C code, in which case you don't have a name
anyway, since you need to know the contraction.  The Omega thread
shows how deceptive it can be to use meaningful-looking identifiers.
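
To make the trade-off concrete, here is a minimal sketch that prints a
numeric reference next to a descriptive name for a few characters. It is
an illustration only: it assumes the modern Unicode character names that
ship with Python's unicodedata module and the hexadecimal "&#x...;"
reference form, neither of which is taken from this thread, and the
helper name describe() is hypothetical.

```python
import unicodedata


def describe(ch):
    """Return (numeric reference, descriptive name) for one character.

    Assumption: the "&#x...;" hexadecimal reference form and the Unicode
    character names are stand-ins for the number-vs-name choice discussed
    above; they are not the notation proposed in this thread.
    """
    numeric = "&#x{:X};".format(ord(ch))
    name = unicodedata.name(ch)
    return numeric, name


if __name__ == "__main__":
    # '#', two omega-shaped characters, and one bopomofo letter.
    for ch in ["#", "\u03a9", "\u2126", "\u3105"]:
        numeric, name = describe(ch)
        print(numeric.ljust(10), name)
```

The two omega-shaped characters come out with quite different names,
which is one way a meaningful-looking identifier can mislead; the
numeric form is opaque but at least does not pretend to explain anything.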

The third reason is that the way people seem to like to handle lots of
characters is to use a "Keycaps" utility applet, like Windows and Macs
(and FrameMaker on UNIX) provide. In this case, the user doesn't care
what the identifier is. So the main concern is for machine efficiency
rather than readability, IMHO, since users increasingly won't edit in
dumb text editors. So both our points may be moot (but my point is 
less moot than yours :-). 

You mention 'unknown glyphs', and I think you really mean arbitrary
glyphs not specifiable by ISO10646+markup+stylesheet/DSSSL+locale. I
have been assuming that XML was targeted at 'resolved' (closed-system) data,
not (open-system) system-independent data.  In a closed-system,
of arbitrary foreign glyphs must be a transparent function of the
not an XML parser function that requires any user's intervention. 
Which sounds to me like something better handled in-place in element
rather than entity markup, given the rarity of the glyphs, and in the 
spirit of element-set-less simplicity from HTML (or is that spectre?)


Rick Jelliffe               email:  ricko@allette.com.au
Allette Systems (Australia) email:  info@allette.com.au 
Level 10, 91 York Street    www:    http://www.allette.com.au
Sydney 2000 NSW Australia   phone:  +61 2 262 4777
                            fax:    +61 2 262 4774
Received on Sunday, 17 November 1996 09:35:35 UTC
