
Re: questions on XML sgml decl's charsets

From: Gavin Nicol <gtn@ebt.com>
Date: Tue, 21 Jan 1997 13:36:42 -0500
Message-Id: <199701211836.NAA02544@nathaniel.ebt>
To: davep@acm.org
CC: U35395@UICVM.UIC.EDU, w3c-sgml-wg@www10.w3.org
>If all XML uses the same SGML declaration, and that declaration specifies
>ISO 10646, then all XML machines must be ISO 10646 machines.  Or are
>we planning to allow different XML documents to have different SGML
>character sets?

The character repertoire is fixed, though we are allowing different
encodings as input, which are to be translated using the
appropriate BCTF/decoder into a set of bit combinations.
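To illustrate the point, here is a minimal sketch in Python, assuming that Python's built-in codecs can stand in for the BCTF/decoder step and that the resulting integers stand in for the bit combinations. The function name decode_to_code_points is hypothetical and used only for illustration.

```python
def decode_to_code_points(raw: bytes, encoding: str) -> list[int]:
    """Decode bytes in a given input encoding into ISO 10646 code points.

    Assumption: a Python codec plays the role of the BCTF/decoder.
    """
    text = raw.decode(encoding)
    # Each character becomes its code point in the fixed repertoire.
    return [ord(ch) for ch in text]


# The same text stored in two different input encodings...
as_latin1 = "caf\u00e9".encode("latin-1")
as_utf8 = "caf\u00e9".encode("utf-8")

# ...has different bytes on disk but the same code points after decoding.
points_latin1 = decode_to_code_points(as_latin1, "latin-1")
points_utf8 = decode_to_code_points(as_utf8, "utf-8")
```

The stored bytes differ between the two inputs, but what the parser sees after decoding is identical, which is the sense in which the repertoire is fixed while the input encoding is not.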

>Note that specifying Unicode as the "document character set" (1) specifies
>what character numbers in numerical character references are to be
>interpreted as what characters, and (2) specifies what characters (however
>represented in the system character representation) are legal in the
>document.

Yes.

>The interesting question is:  How do you deal with a legal SGML character
>that your system has no internal representation for?  I haven't thought
>that one through.  In some cases, I suspect your entity manager could
>convert the character as stored in the storage representation into a
>numeric character reference.  But that can break down if the character
>is used in markup.  Hmmm....  ;-)

SGML is silent on this issue, I believe (we went through this for
HTML, and I invoked this silence as a way of saying that older browsers
would be conformant with the HTML I18N draft).

I think we could make this a reportable error (not a fatal error).
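
A rough sketch of what a reportable, non-fatal error might look like, combined with the numeric character reference fallback suggested in the quoted text. This is only an illustration under stated assumptions: the function name to_system_text and the warning format are hypothetical, and the fallback is only safe for data content, not for characters used in markup.

```python
def to_system_text(text: str, system_encoding: str) -> tuple[str, list[str]]:
    """Convert text for a system with a limited character representation.

    Characters the system encoding cannot represent are replaced by
    numeric character references, and a warning is recorded for each
    one instead of aborting (reportable, not fatal).
    """
    out = []
    warnings = []
    for pos, ch in enumerate(text):
        try:
            ch.encode(system_encoding)
            out.append(ch)
        except UnicodeEncodeError:
            # Fall back to a decimal numeric character reference.
            out.append(f"&#{ord(ch)};")
            warnings.append(
                f"position {pos}: U+{ord(ch):04X} has no representation "
                f"in {system_encoding}"
            )
    return "".join(out), warnings


converted, report = to_system_text("a\u3042b", "latin-1")
# converted is "a&#12354;b"; report holds one warning for position 1.
```

Python's standard str.encode also accepts errors="xmlcharrefreplace", which performs the same substitution but does not collect a report.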

A more interesting question is what to do with characters that do not
appear within the character repertoire we have defined...
Received on Tuesday, 21 January 1997 13:38:37 UTC
