Non-Unicode characters, SDATA, etc. from Tim Bray on 1996-10-22 (w3c-sgml-wg@w3.org from October 1996)

From: Tim Bray <tbray@textuality.com>
Date: Tue, 22 Oct 1996 14:50:03 -0700
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <3.0b33.32.19961022143452.0073b8d8@pop.intergate.bc.ca>

It has been argued convincingly on this group that the use of SDATA entities
provides a high-value way of passing extra information to a receiving
application: a signal that the character is more than just a
private-use-area Unicode number.

My position, and I'm not speaking for the ERB but I suspect that they would
agree, has been that XML should be happy to do a good job dealing with what 
Unicode already defines, and accept that particular 80-20 tradeoff, and 
perhaps fail to be the ideal tool for those who deal with particularly
exotic species of scripture.

Anders surprised some of us by pointing out that there are a large number of
ISO entities that are not in ISO 10646 at all.  So I'd like to request input
from the WG on this.  

In my personal experience, all the applications I've built and delivered, 
based on SGML, HTML, and what would have been XML if it'd been defined, could 
have lived perfectly happily using just the repertoire offered by 10646; the 
number of non-standard characters was so small that doing some extra work to 
package them up would have been a very minor irritation indeed.

On the other hand, Anders' posting makes it clear that [particularly in
the area of mathematics] there are routinely a substantial number of 
non-10646 characters available [in theory at least] to technical publishers,
who have been a mainstay of SGML support over the years.

So: what applications are going to become severely XML-impaired
if we lack the nice out-of-band SDATA signaling mechanism?  Concrete 
examples would be the most valuable input.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

Received on Tuesday, 22 October 1996 17:50:46 UTC