Re: Non-Unicode characters, SDATA, etc. from Jon Bosak on 1996-10-23 (w3c-sgml-wg@w3.org from October 1996)

From: Jon Bosak <bosak@atlantic-83.Eng.Sun.COM>
Date: Tue, 22 Oct 1996 21:54:09 -0700
To: w3c-sgml-wg@w3.org
CC: bosak@atlantic-83.Eng.Sun.COM
Message-Id: <199610230454.VAA00513@boethius.eng.sun.com>

[Tim Bray:]

| Anders surprised some of us by pointing out that there are a large
| number of ISO entities that are not in ISO 10646 at all.  So I'd like
| to request input from the WG on this.
| 
| In my personal experience, all the applications I've built and
| delivered, based on SGML, HTML, and what would have been XML if it'd
| been defined, could have lived perfectly happily using just the
| repertoire offered by 10646; the number of non-standard characters was
| so small that doing some extra work to package them up would have been
| a very minor irritation indeed.
| 
| On the other hand, Anders' posting makes it clear that [particularly
| in the area of mathematics] there are routinely a substantial number
| of non-10646 characters available [in theory at least] to technical
| publishers; who have been a mainstay of SGML support over the years.

With some embarrassment (because, like Tim, I have never run into this
problem myself, and therefore argued the 80-20 angle when this was
before the ERB), I must report that Tim's question has suddenly made
an obviously very weak synapse finally fire and retrieved the memory
of some correspondence on this very subject that I had mentally
misfiled in the garage among the old copies of Harper's and the
cabinet full of Canadian barley statistics.  I refer specifically to
an interchange that I had with Nico Poppelier of Elsevier, the
well-known scientific publishers.  I had ignorantly said something to
the effect that the math and technical symbols in Unicode appeared to
be about all that were needed (which it seems I'm still ignorantly
saying; at least I'm consistent on this), to which he replied:

| I disagree: the set definitely needs extending.  My reference is The
| Unicode Standard (Version 1.0, Volume 1), published by the Unicode
| Consortium. If I look in the range 2100-26FF I see a lot of symbols
| for scientific publishing, but that set is insufficient. A lot of
| symbols are missing. Our working group will compile a more or less
| comprehensive list from the AMS font set, the Elsevier font set, and
| the Mathematica font set, and will compare that against the Unicode
| offering. The Elsevier font set is described, including pictures, in
| the document
| 
|   ftp://ftp.elsevier.nl/pub/sgml/artdoc.ps.gz
| 
| The working group has agreed that scientific and technical publishing
| requires
| 
| 1. all openface lowercase letters, uppercase letters and numerals
|    (Unicode offers only a few commonly used ones)
| 2. all fraktur lowercase letters and uppercase letters
|    (Unicode offers only a few commonly used ones)
| 3. all calligraphic lowercase letters and uppercase letters
|    (Unicode offers only a few commonly used ones)
| 4. a large set of symbols as described above.

I haven't checked with Mr. Poppelier, but I'm sure that the gentleman
won't mind being quoted in a forum where this information might do
some good.

Jon

Received on Wednesday, 23 October 1996 00:55:53 UTC