Re: questions on XML sgml decl's charsets

From: Dave Peterson <davep@acm.org>
Date: Tue, 21 Jan 1997 09:39:26 -0500
Message-Id: <v01540b01af097a3edf2c@[]>
To: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>, W3C SGML Working Group <w3c-sgml-wg@www10.w3.org>
At 6:35 PM 1/13/97, Michael Sperberg-McQueen wrote:

>                               What was overlooked in the heat of the
>moment (this was getting on toward the absolute deadline) was that the
>machine in question was an ASCII machine, not a Unicode machine; of
>course the 646 declaration was required for processing.  But the
>canonical SGML declaration should use Unicode.

If all XML uses the same SGML declaration, and that declaration specifies
ISO 10646, then all XML machines must be ISO 10646 machines.  Or are
we planning to allow different XML documents to have different SGML
character sets?

Note that specifying Unicode as the "document character set" (1) specifies
which character numbers in numeric character references are to be
interpreted as which characters, and (2) specifies which characters (however
represented in the system character representation) are legal in the
document.

The interesting question is:  How do you deal with a legal SGML character
that your system has no internal representation for?  I haven't thought
that one through.  In some cases, I suspect your entity manager could
convert the character as stored in the storage representation into a
numeric character reference.  But that can break down if the character
is used in markup.  Hmmm....  ;-)
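
To make the entity-manager idea above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any actual SGML or XML entity manager: the function name to_system_repr is hypothetical, the "system" is assumed to be an ASCII machine, and the document character set is assumed to be Unicode. It relies on Python's built-in xmlcharrefreplace error handler, which substitutes a decimal numeric character reference for each character the target encoding cannot represent.

```python
def to_system_repr(text: str, system_encoding: str = "ascii") -> str:
    """Hypothetical helper: rewrite every character the system encoding
    cannot represent as a decimal numeric character reference (&#N;)."""
    encoded = text.encode(system_encoding, errors="xmlcharrefreplace")
    return encoded.decode(system_encoding)


# Data characters survive the conversion: a parser that resolves the
# reference against the document character set recovers the character.
print(to_system_repr("caf\u00e9"))

# Markup does not survive: a reference is not recognized inside a
# generic identifier, so the converted tag is no longer the same markup.
print(to_system_repr("<r\u00e9sum\u00e9>text</r\u00e9sum\u00e9>"))
```

The second call shows the breakdown: the conversion is purely textual, so it cannot tell data from markup, and a tag name rewritten this way no longer names the element it did before.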

Dave Peterson

Received on Tuesday, 21 January 1997 09:41:18 UTC