
Re: questions on XML sgml decl's charsets



On Mon, 13 Jan 1997 13:12:24 -0500 Paul Grosso said:
>Looking at the SGML decl in the appendix of the XML draft, I have
>a few questions.  ...
>The decl therein seems to define a document baseset from 0 to 255
>and a syntax baseset from 646 (isn't that from 0 to 255?).
>
>Anyway, even though I cannot figure out document from syntax basesets
>from charsets from descsets, it doesn't look like that SGML decl
>allows characters above 255, so where does unicode come in?  Am
>I getting confused between encodings and character sets again or what?

No, you're reading it right.  There was an editorial error in the
final preparation of the copy, and the wrong declaration got in.

Actually, two declarations were prepared, one which referred to 10646
and defined a wide character set using ERCS syntax (or so I understand
-- haven't seen it, myself), and another one which referred to ISO 646.
The text originally contained the 10646 declaration, but during final
preparation it became clear that the *other* declaration had to be used
in processing. Between a declaration that worked and one that did
not, there seemed to be no seriously debatable choice, and so the
646 declaration was inserted.  What was overlooked in the heat of the
moment (this was getting on toward the absolute deadline) was that the
machine in question was an ASCII machine, not a Unicode machine; of
course the 646 declaration was required for processing.  But the
canonical SGML declaration should use Unicode.
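
As a rough illustration of the limit Paul noticed, here is a minimal
Python sketch. It is not taken from the draft or from either declaration.
It assumes only that ISO 646 is a 7-bit code (positions 0 to 127) and that
the document character set in the printed declaration stops at 255; the
names ISO_646_MAX, EIGHT_BIT_MAX, and fits are hypothetical.

```python
# Illustrative sketch only; the constants and helper are hypothetical names.
ISO_646_MAX = 127    # ISO 646 is a 7-bit code: positions 0..127
EIGHT_BIT_MAX = 255  # upper bound of the document character set as printed


def fits(text, max_code):
    """Return True if every character's code point is <= max_code."""
    return all(ord(ch) <= max_code for ch in text)


samples = ["SGML", "caf\u00e9", "\u03a9"]  # plain ASCII, e-acute (233), Omega (937)

# Map each sample to (fits in ISO 646, fits in 0..255).
report = {s: (fits(s, ISO_646_MAX), fits(s, EIGHT_BIT_MAX)) for s in samples}

for text, (in_646, in_8bit) in report.items():
    print(repr(text), "ISO 646:", in_646, "0..255:", in_8bit)
```

A character such as Omega falls outside both ranges, which is why a
declaration based on ISO 10646 is needed to admit it.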

I thought this was reasonably widely mentioned at SGML '96 and
elsewhere, but perhaps it wasn't noted widely enough.  Tim and I
really need to get a revision up on the Web sites, with all the
corrections people have been sending recently (keep 'em coming!).

-C. M. Sperberg-McQueen

