W3C home > Mailing lists > Public > w3c-sgml-wg@w3.org > January 1997

Off topic (Re: questions on XML sgml decl's charsets)

From: Rick Jelliffe <ricko@allette.com.au>
Date: Sun, 26 Jan 1997 18:41:36 +1100
Message-ID: <32EB0AB0.2CBB@allette.com.au>
To: Dave Peterson <davep@acm.org>
CC: W3C SGML Working Group <w3c-sgml-wg@www10.w3.org>
Dave Peterson wrote:

> The interesting question is:  How do you deal with a legal SGML character
> that your system has no internal representation for?  I haven't thought
> that one through.  In some cases, I suspect your entity manager could
> convert the character as stored in the storage representation into a
> numeric character reference.  But that can break down if the character
> is used in markup.  Hmmm....  ;-)

That was the question that first drove the CJK DOCP's ERCS in 1995. 
Our solution:

* have a public entity set containing all the characters from all 
registered national character sets (helpfully, this was already 
collected in ISO 10646) to allow full interconvertability of 
characters and entity references (i.e. the SPREAD public entity set), 
and therefore document-character-set  -independence;

* make recommendation about which characters should be used in 
native-language markup (i.e. the ERCS "greatest common denominator"
that markup characters only use the intersection set of the characters
sets of all the systems the document will pass through: in Japan,
this would be limited to the characters in Shift JIS, for example:
JIS X 208 but not JIS 212, which still could be used in data);

* make it explicit in the Extended Naming Rules (ENR) that
there are 8 bit systems and there are 16 bit systems, and
a 8-bit systems do not need to handle large-repertoire
markup. The new public identifier in the SGML and system declaration
(the one with "(ENR)" appended, see ISO 8879 Annex J)
is there so that people won't use legal SGML characters
in markup that a system mightn't have an internal representation
for: at least as far as it applies to 8 bit ans 16 bit systems.


Rick Jelliffe               email:  ricko@allette.com.au
Allette Systems (Australia) email:  info@allette.com.au 
Level 10, 91 York Street    www:    http://www.allette.com.au
Sydney 2000 NSW Australia   phone:  +61 2 262 4777
                            fax:    +61 2 262 4774
Received on Sunday, 26 January 1997 02:38:22 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 20:25:07 UTC