- From: David G. Durand <dgd@cs.bu.edu>
- Date: Sun, 17 Nov 1996 14:11:18 -0500
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
At 1:38 AM 11/18/96, Rick Jelliffe wrote: > >OK, lets decree that all names should be written in Chinese, since they >are >the most numerous people and characters. They have a phonetic system, >so we can spell out any English words in bopomofo: in the West will only >have to >learn a few dozen characters, which is trivial, and we can figure out >what a name means by looking in some (online) list with representative >glyphs :-) Interesting, but not relevant to the problem that I am raising, which is not to do with the _ordinary_ problems of internationalization, but the ordinary problem of "nonstandard characters". >Less flippantly, even simple English names are not clear. It took me a >long time to realise Americans seem to mean what we call "hash" by >"pound". >So I don't have any confidence that English names are much use to most >people in the world. That is the first reason why names are inferior. Inferior to "Private-use character #C023"? That is even worse than "[Pound sign]", which at least gives me a 50% chance of finding the right character. "[Sumerian Cuneiform GUL.2]" might bemore meaningful, too. Even "[Brantaku Lhatala]" is no worse than "Private use character #C023" >The second reason is that there are so many characters that giving them >identifiers that also describe them means some of those identifiers must >get very complicated, unless you adopt a Polish notation (semi-acronym >contractions) >like Microsoft advocates for C code. In which case you don't have a name >anyway, since you need to know the contraction. The Omega thread >earlier >shows how deceptive it can be to use meaningful-looking identifiers. Yes, but a table of even long identifiers at the top of a document is not a cause for abbreviation (we are discussing entity definitions, after all). And the point is not that they will _always work_ but that they are _never worse_ than just a number. >The third reason is that the way people seem to like to handle lots of >characters is to use a "Keycaps" utility applet, like Windows and Macs >(and FrameMaker on UNIX) provide. In this case, the user doesn't care >what the identifier is. So the main concern is for machine efficiency >rather than readability, IMHO, since users increasingly won't edit in >dumb text editors. So both our points may be moot (but my point is >less moot than yours :-). We are talking about how a system should represent a character that is not in Unicode for transmission on the wire. Mathematical symbols are proabably the best non-scholarly example raised so far in the discussion. >You mention 'unknown glyphs', and I think you really mean arbitrary >glyphs not specifiable by ISO10646+markup+stylesheet/DSSSL+locale. I >have >been assuming that XML was targetted at 'resolved' (closed-system) data, >not (open-system) system-independant data. Well, then you haven't been reading the many arguments on this topic very closely. > In a closed-system, >resolution >of arbitrary foreign glyphs must be a transparent function of the >application, >not an XML parser function that requires any user's intervention. >Which sounds to me like something better handled in-place in element >markup >rather than entity markup, given the rarity of the glyphs, and in the >spirit of element-set-less simplicity from HTML (or is that spectre?) Yes, but that only works for a closed vocabulary of glyphs. I don't believe that that is a closed set, ever has been a closed set in the history of humanity, or ever will be a closed set at some future date. -- David I am not a number. I am an undefined character. _________________________________________ David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com Boston University Computer Science \ Sr. Analyst http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams --------------------------------------------\ http://dynamicDiagrams.com/ MAPA: mapping for the WWW \__________________________
Received on Sunday, 17 November 1996 14:05:53 UTC