W3C home > Mailing lists > Public > w3c-sgml-wg@w3.org > January 1997

Re: questions on XML sgml decl's charsets

From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
Date: Tue, 21 Jan 97 11:36:43 CST
Message-Id: <199701211747.MAA18627@www10.w3.org>
To: W3C SGML Working Group <w3c-sgml-wg@www10.w3.org>
On Tue, 21 Jan 1997 09:39:26 -0500 Dave Peterson said:
>If all XML uses the same SGML declaration, and that declaration
>specifies ISO 10646, then all XML machines must be ISO 10646 machines.
>Or are we planning to allow different XML documents to have different
>SGML character sets?

The spec says "The specific SGML declaration needed to enable SGML
systems to process XML documents will vary from document to document"
depending on the character encoding of the document's entities.  It
may also vary with the SGML system's understanding and implementation
of 8879's rules for character handling, which seem to give rise to
wildly varying interpretations.

>The interesting question is:  How do you deal with a legal SGML character
>that your system has no internal representation for?  I haven't thought
>that one through.  In some cases, I suspect your entity manager could
>convert the character as stored in the storage representation into a
>numeric character reference.  But that can break down if the character
>is used in markup.  Hmmm....  ;-)

Systems with no internal representation for strings of 16 and 32 bits
would appear not to be capable of handling XML.  Are there any such
systems?  N.B. whether they think the 16-bit string is a 'character'
is a separate issue.  Internal representation just means you can read
it, store it, and write it without information loss.  It does not
mean you can display or print it.  The unicode standard has what
seems to me a good discussion of fallback behaviors in such

-C. M. Sperberg-McQueen
Received on Tuesday, 21 January 1997 12:47:22 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 20:25:06 UTC