Re: those predeclared entity refs from Jon Bosak on 1997-03-26 (w3c-sgml-wg@w3.org from March 1997)

From: Jon Bosak <bosak@atlantic-83.Eng.Sun.COM>
Date: Tue, 25 Mar 1997 22:29:08 -0800
To: w3c-sgml-wg@w3.org
Message-Id: <199703260629.WAA00205@boethius.eng.sun.com>

[Peter Flynn:]

| Remaining question: do we hardwire ISOLat1, ISOLat2, ISOPub and
| ISOTech as being the four most widely-used sets of non-ASCII symbols
| in general?

This is part of the problem.  We went down this road in November and
withdrew in horror when we saw where it led.  (The person who finally
blew the whistle was Anders Berglund, who defined all these ISO sets
in the first place.)  To put it briefly, there is a lot of overlap
among these entity sets, they don't map unambiguously to the Unicode
character set, and they don't map cleanly to existing fonts (in
particular the very widely deployed Adobe-based "symbol" set), in
addition to being 100.00 percent Eurocentric.  This approach leads
into a morass.

So where do you draw the line?  Including all of them is a nightmare,
and including less than that is arbitrary.  Some have said that people
are too used to &lt; to accept a numeric reference.  Well, I have bad
news: a whole lot of people are used to &eacute; and &mdash;, too.

Any case for including predefined entities because it's what people
are used to is a case for including all of them.  And you can't
include all of them because you end up with a stinking mess.

We have decided to go with Unicode precisely in order to get beyond
all this.  Let's get it over with.

Jon

Received on Wednesday, 26 March 1997 01:29:11 UTC