- From: Anders Berglund <alb@ebt.com>
- Date: Mon, 21 Oct 96 12:01:24 EDT
- To: w3c-sgml-wg@w3.org
- Cc: alb@ebt-inc.ebt.com
>C. M. Sperberg-McQueen writes:
>
>If I am reading the postings from David Durand, Lee Quin, and John
>Lavagnino correctly, it seems that my earlier note embodied a large
>misconception. I had asked for references to 8879 to explain the
>behavior they seem to be associating with SDATA entities; they
>seem, if I understand correctly, to be unanimous in saying that
>8879 does not in fact define the desired behavior at all, but that
>it's widely implemented and well understood (and, I assume, should
>be incorporated into XML 1.0).
>
>So what seems to be at issue is *not* whether XML 1.0 should retain
>the SGML 'SDATA' keyword, but whether it should retain that keyword
>and prescribe some particular behavior for processors encountering
>it.
>
>Here's a confession that if this behavior is widely implemented, I
>haven't seen it in the software I use most frequently. Perhaps I
>just don't read the manuals carefully enough, but the only program
>that comes to mind as describing any behavior like this is Panorama,
>and its functionality (rendering correctly all the characters in the
>ISO entity sets for which the local system has font resources) can
>be done with character references to ISO 10646, since all the
>characters in the ISO entity sets are in 10646 (some glyphs
>representing ligatures are missing).

This is unfortunately a totally incorrect statement. The ISO entity
sets contain a LARGE number of entities for which there are NO
CHARACTERS in 10646. The ISO entity sets for maths and science
(current versions in 9573-13 published in '92) have several hundred
entities that are not in 10646. The current working draft of 9573-15
(entities for non-latin alphabets - which includes a number of sets
for scholarly usage) has an entity repertoire where maybe half are
in 10646.

Revision of 10646 is progressing very slowly - ISO/IEC JTC1/SC18
requested the addition of the characters for maths and science from
9573-13 on both DIS 1 and 2 (4 years ago) and so far there is no sign
of any addition. Some standards bodies have suggested creating a
"competitor" standard in ECMA and then "fast-tracking" that as
additions to 10646.

Thus I believe that it is highly important that XML includes SDATA
entities and in a way that is decoupled from 10646. Use of "private
use" areas in 10646 I believe will lead to the same mess as eg IBM
ended up with EBCDIC. At the recent SGML Asia Pacific conference it
was very interesting to hear the complaints from "CJK"
representatives on 10646.

>As to the other proposition, that it is widely understood, I can
>only say that if it is widely understood, then surely someone can be
>found to describe the behavior on this list explicitly enough that
>the rest of us can also understand it. I for one do not understand
>the behavior, and would really like a description, preferably with
>reference to some documentation.

In DSSSL the following approach was taken, based on the observation
that most applications/implementations use common NAMES for the SDATA
entities (mainly those in the ISO entity sets) and either totally
ignore the SDATA replacement text (eg FrameMaker SGML) or have some
system specific syntax in the replacement text (eg DynaText) in the
implementation specific files that use the common NAMES:

- a DSSSL character is named and you have an infinitely large name
  space. A character is decoupled from the encoding BUT some
  specification shortcuts are possible for those characters that
  are in 10646 (especially useful for the character properties)

- a coded input character (eg ISO 8859-x, ISO/IEC 10646, EBCDIC) is
  mapped to a DSSSL character

- an SDATA entity is mapped to a DSSSL character, primarily based on
  the entity name (or on the replacement text for systems where the
  entity name is not available)

- a DSSSL character is mapped to a glyph identifier

This approach has the following advantages:

- decoupling from the input character code - you may mix and match
  input encodings

- decoupling from existing coding standards that are ALL inadequate

- infinite character repertoire - very important for publishing and
  scholarly applications

One should note that the DSSSL approach is "equivalent" to mapping an
input character in a particular encoding to a particular glyph in a
font.

Anders Berglund
Editor ISO/IEC 9573 Techniques for Using SGML
  where the (revised) ISO entity sets are included
Member of the ISO/IEC JTC1/SC18/WG8 RG on DSSSL
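
Illustrative note: the mapping chain described in the message (coded
input character or SDATA entity, then named character, then glyph
identifier) can be sketched roughly as below. This is only a sketch
under stated assumptions, not DSSSL syntax or any implementation's
actual tables: every table, function name, character name, and glyph
identifier is hypothetical, and the "xsym" entity is invented purely
to stand for a symbol with no 10646 code point.

```python
from typing import Optional

# All tables below are hypothetical and exist only for illustration.

# Step 1: coded input characters map to named characters, one table per encoding.
CODED_TO_CHARACTER = {
    "iso-8859-1": {0xE9: "latin-small-letter-e-with-acute"},
    "iso-10646": {
        0x00E9: "latin-small-letter-e-with-acute",
        0x03B1: "greek-small-letter-alpha",
    },
}

# Step 2a: SDATA entities map to named characters, primarily by entity name.
ENTITY_NAME_TO_CHARACTER = {
    "eacute": "latin-small-letter-e-with-acute",
    "alpha": "greek-small-letter-alpha",
    "xsym": "example-symbol-without-code-point",  # invented entity name
}

# Step 2b: fallback keyed on replacement text, for systems where the
# entity name is not available (the "[eacute]" form is an assumption).
REPLACEMENT_TEXT_TO_CHARACTER = {
    "[eacute]": "latin-small-letter-e-with-acute",
}

# Step 3: named characters map to glyph identifiers.
CHARACTER_TO_GLYPH = {
    "latin-small-letter-e-with-acute": "glyph/eacute",
    "greek-small-letter-alpha": "glyph/alpha",
    "example-symbol-without-code-point": "glyph/xsym",
}


def character_from_code(encoding: str, code: int) -> str:
    """Map a coded input character to a named character."""
    return CODED_TO_CHARACTER[encoding][code]


def character_from_sdata(name: Optional[str], replacement_text: str) -> str:
    """Map an SDATA entity to a named character, preferring the entity name."""
    if name is not None and name in ENTITY_NAME_TO_CHARACTER:
        return ENTITY_NAME_TO_CHARACTER[name]
    return REPLACEMENT_TEXT_TO_CHARACTER[replacement_text]


def glyph_for(character: str) -> str:
    """Map a named character to a glyph identifier."""
    return CHARACTER_TO_GLYPH[character]
```

In this sketch the ISO 8859-1 byte 0xE9, the 10646 code point 0x00E9,
and the "eacute" entity all resolve to the same named character and so
to the same glyph, while "xsym" reaches a glyph without ever needing a
code point in any coded character set.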
Received on Monday, 21 October 1996 12:03:57 UTC