
Re: questions on XML sgml decl's charsets



>If all XML uses the same SGML declaration, and that declaration specifies
>ISO 10646, then all XML machines must be ISO 10646 machines.  Or are
>we planning to allow different XML documents to have different SGML
>character sets?

The character repertoire is fixed, though we are allowing different
coded encodings as input, which are to be translated by the
appropriate BCTF/decoder into a sequence of bit combinations.
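To illustrate the idea (a minimal Python sketch, not anything the SGML declaration itself specifies): Python's built-in codecs stand in for the BCTF/decoder, and the function name to_code_points is purely hypothetical. The same word arrives in three different encodings and comes out as the same ISO 10646 code points.

```python
def to_code_points(data: bytes, encoding: str) -> list[int]:
    # The codec plays the role of the BCTF/decoder: whatever the input
    # encoding, the result is a sequence of ISO 10646 code points.
    return [ord(ch) for ch in data.decode(encoding)]


word = "caf\u00e9"
# The same text stored in three different coded encodings.
inputs = {enc: word.encode(enc) for enc in ("utf-8", "utf-16-be", "iso-8859-1")}

for enc, raw in inputs.items():
    print(enc, raw, to_code_points(raw, enc))
```

The byte sequences differ per encoding, but every line ends with the same list of code points, which is the point of fixing the repertoire while allowing several input encodings.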

>Note that specifying Unicode as the "document character set" (1) specifies
>what character numbers in numerical character references are to be
>interpreted as what characters, and (2) specifies what characters (however
>represented in the system character representation) are legal in the
>document.

Yes.
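Both points can be sketched in a few lines of Python, assuming ISO 10646 as the document character set. The legality rule in is_legal is a hypothetical placeholder (tab, LF, CR, and everything from U+0020 up to U+10FFFF), not the rule under discussion here; only decimal references are handled.

```python
import re

# Decimal numeric character reference, e.g. "&#233;".
NCR = re.compile(r"&#([0-9]+);")


def is_legal(cp: int) -> bool:
    # Hypothetical legality rule for illustration only: allow tab, LF, CR,
    # and everything from U+0020 up to U+10FFFF; reject other C0 controls.
    if cp in (0x9, 0xA, 0xD):
        return True
    return 0x20 <= cp <= 0x10FFFF


def resolve_ncrs(text: str) -> str:
    # (1) A character number is interpreted as an ISO 10646 code point.
    # (2) The resulting character must be legal in the document.
    def replace(match: re.Match) -> str:
        cp = int(match.group(1))
        if not is_legal(cp):
            raise ValueError(f"character number {cp} is not a legal document character")
        return chr(cp)

    return NCR.sub(replace, text)


print(resolve_ncrs("caf&#233;"))
```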

>The interesting question is:  How do you deal with a legal SGML character
>that your system has no internal representation for?  I haven't thought
>that one through.  In some cases, I suspect your entity manager could
>convert the character as stored in the storage representation into a
>numeric character reference.  But that can break down if the character
>is used in markup.  Hmmm....  ;-)

SGML is silent on this issue, I believe (we went through this for
HTML, and I invoked this silence as a way of saying that older browsers
would be conformant with the HTML I18N draft).

I think we could make this a reportable error (not a fatal error).
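A rough Python sketch of both ideas, assuming a system whose internal representation is plain ASCII. Python's built-in "xmlcharrefreplace" error handler already does the entity-manager trick of turning an unrepresentable character into a numeric character reference; the hypothetical "ncr-and-report" handler below does the same but also records a reportable, non-fatal error. The second call shows the breakdown in markup: a name character turned into a reference no longer yields a valid tag.

```python
import codecs

reported: list[str] = []


def ncr_and_report(err: UnicodeError):
    # Hypothetical policy: replace each unrepresentable character with a
    # decimal numeric character reference, record a non-fatal error, and
    # resume encoding after the offending characters.
    if not isinstance(err, UnicodeEncodeError):
        raise err
    chunk = err.object[err.start:err.end]
    reported.append(f"position {err.start}: {chunk!r} not representable")
    return "".join(f"&#{ord(c)};" for c in chunk), err.end


codecs.register_error("ncr-and-report", ncr_and_report)

# In content, the reference round-trips through a parser ...
print("caf\u00e9".encode("ascii", errors="ncr-and-report"))
# ... but inside markup (here a tag name) the result is no longer a valid tag.
print("<caf\u00e9>".encode("ascii", errors="ncr-and-report"))
print(reported)
```

Processing continues after each problem, so a user sees every report at the end instead of a single fatal stop.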

A more interesting question is what to do with characters that do not
appear within the character repertoire we have defined...

