Re: questions on XML sgml decl's charsets

At 6:35 PM 1/13/97, Michael Sperberg-McQueen wrote:

>                               What was overlooked in the heat of the
>moment (this was getting on toward the absolute deadline) was that the
>machine in question was an ASCII machine, not a Unicode machine; of
>course the 646 declaration was required for processing.  But the
>canonical SGML declaration should use Unicode.

If all XML uses the same SGML declaration, and that declaration specifies
ISO 10646, then all XML machines must be ISO 10646 machines.  Or are
we planning to allow different XML documents to have different SGML
character sets?

Note that specifying Unicode as the "document character set" (1) specifies
which character each character number in a numeric character reference
denotes, and (2) specifies which characters (however represented in the
system character representation) are legal in the document.
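
As a rough illustration of point (1), here is a minimal Python sketch, not
taken from any real parser.  The function name expand_char_refs is
hypothetical, only decimal references are handled, and it leans on the
fact that Python's chr() numbers characters the way ISO 10646 does:

```python
import re

# Decimal numeric character references such as "&#233;".
# Hexadecimal forms are deliberately left out of this sketch.
_CHAR_REF = re.compile(r"&#([0-9]+);")

def expand_char_refs(text):
    """Replace each &#N; with the ISO 10646 character numbered N."""
    return _CHAR_REF.sub(lambda m: chr(int(m.group(1))), text)
```

With a different document character set, the same number N could denote a
different character; that mapping is exactly what the declaration fixes.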

The interesting question is:  How do you deal with a legal SGML character
that your system has no internal representation for?  I haven't thought
that one through.  In some cases, I suspect your entity manager could
convert the character as stored in the storage representation into a
numeric character reference.  But that can break down if the character
is used in markup.  Hmmm....  ;-)
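
The entity manager idea above can be sketched in Python, purely as an
illustration.  The function name to_system_charset and the default
"ascii" system character set are assumptions of mine; the
"xmlcharrefreplace" error handler is a standard part of Python's
str.encode:

```python
def to_system_charset(text, system_charset="ascii"):
    """Encode text for the system character set.

    Characters the system cannot represent are replaced with decimal
    numeric character references (&#N;).
    """
    return text.encode(system_charset, errors="xmlcharrefreplace")

# In data the substitution is harmless:
data = to_system_charset("caf\u00e9")        # b'caf&#233;'

# In markup it is not: the reference lands inside an element name,
# where character references are not recognized.
markup = to_system_charset("<caf\u00e9/>")   # b'<caf&#233;/>'
```

The second call shows the breakdown: the result is no longer a tag with a
legal name, so the conversion cannot be applied blindly to markup.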

Dave Peterson
SGMLWorks!

davep@acm.org

Received on Tuesday, 21 January 1997 09:41:18 UTC