Re: ERB discussions and decisions

At 1:38 AM 11/18/96, Rick Jelliffe wrote:
>OK, lets decree that all names should be written in Chinese, since they
>the most numerous people and characters. They have a phonetic system,
>so we can spell out any English words in bopomofo: in the West will only
>have to
>learn a few dozen characters, which is trivial, and we can figure out
>what a name means by looking in some (online) list with representative
>glyphs :-)

Interesting, but not relevant to the problem that I am raising, which is
not to do with the _ordinary_ problems of internationalization, but the
ordinary problem of "nonstandard characters".

>Less flippantly, even simple English names are not clear. It took me a
>long time to realise Americans seem to mean what we call "hash" by
>So I don't have any confidence that English names are much use to most
>people in the world. That is the first reason why names are inferior.

Inferior to "Private-use character #C023"?
That is even worse than "[Pound sign]", which at least gives me a 50%
chance of finding the right character. "[Sumerian Cuneiform GUL.2]" might
bemore  meaningful, too. Even "[Brantaku Lhatala]" is no worse than
"Private use character #C023"

>The second reason is that there are so many characters that giving them
>identifiers that also describe them means some of those identifiers must
>get very complicated, unless you adopt a Polish notation (semi-acronym
>like Microsoft advocates for C code. In which case you don't have a name
>anyway, since you need to know the contraction.  The Omega thread
>shows how deceptive it can be to use meaningful-looking identifiers.

Yes, but a table of even long identifiers at the top of a document is not a
cause for abbreviation (we are discussing entity definitions, after all).
And the point is not that they will _always work_ but that they are _never
worse_ than just a number.

>The third reason is that the way people seem to like to handle lots of
>characters is to use a "Keycaps" utility applet, like Windows and Macs
>(and FrameMaker on UNIX) provide. In this case, the user doesn't care
>what the identifier is. So the main concern is for machine efficiency
>rather than readability, IMHO, since users increasingly won't edit in
>dumb text editors. So both our points may be moot (but my point is
>less moot than yours :-).

We are talking about how a system should represent a character that is not
in Unicode for transmission on the wire. Mathematical symbols are proabably
the best non-scholarly example raised so far in the discussion.

>You mention 'unknown glyphs', and I think you really mean arbitrary
>glyphs not specifiable by ISO10646+markup+stylesheet/DSSSL+locale. I
>been assuming that XML was targetted at 'resolved' (closed-system) data,
>not (open-system) system-independant data.
    Well, then you haven't been reading the many arguments on this topic
very closely.

> In a closed-system,
>of arbitrary foreign glyphs must be a transparent function of the
>not an XML parser function that requires any user's intervention.
>Which sounds to me like something better handled in-place in element
>rather than entity markup, given the rarity of the glyphs, and in the
>spirit of element-set-less simplicity from HTML (or is that spectre?)

Yes, but that only works for a closed vocabulary of glyphs. I don't believe
that that is a closed set, ever has been a closed set in the history of
humanity, or ever will be a closed set at some future date.

   -- David

I am not a number. I am an undefined character.
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________