- From: Paul Grosso <paul@arbortext.com>
- Date: Mon, 13 Jan 97 12:05:51 CST
- To: w3c-sgml-wg@www10.w3.org
Looking at the SGML decl in the appendix of the XML draft, I have
a few questions. (I'll admit up front that if there is anything
that boggles my mind more than BOSs, it's character sets.)
The decl therein seems to define a document baseset from 0 to 255
and a syntax baseset from 646 (isn't that from 0 to 255?).
Anyway, even though I cannot figure out document from syntax basesets
from charsets from descsets, it doesn't look like that SGML decl
allows characters above 255, so where does Unicode come in? Am
I getting confused between encodings and character sets again or what?
In the syntax, I see we shun no characters whereas most SGML decls
shun some set of control characters at least. What's the rationale
here for no shun characters?
The syntax baseset refers to some 1983 version of 646. I note that
the recent "I18n-ization of HTML" RFC 2070 talks about changing the HTML 2.0
declaration's syntax character set declaration:
    Another change was made from the HTML 2.0 SGML declaration, in the
    belief that the latter did not express its authors' true intent. The
    syntax character set declaration was changed from ISO 646.IRV:1983 to
    the newer ISO 646.IRV:1991, the latter, but not the former, being
    identical with US-ASCII.
That document also shows a baseset/descset declaration of:
    BASESET  "ISO Registration Number 177//CHARSET
              ISO/IEC 10646-1:1993 UCS-4 with implementation level 3
              //ESC 2/5 2/15 4/6"
    DESCSET  0    9           UNUSED
             9    2           9
             11   2           UNUSED
             13   1           13
             14   18          UNUSED
             32   95          32
             127  1           UNUSED
             128  32          UNUSED
             160  2147483486  160
which is UCS-4. I gather from other sources that a UCS-2 declaration
might look something like:
    BASESET  "ISO Registration Number 176//CHARSET
              ISO/IEC 10646-1:1993 UCS-2 with implementation level 3//ESC 2/5 2/15 4/5"
    DESCSET
             0    9      UNUSED
             9    2      9
             11   2      UNUSED
             13   1      13
             14   18     UNUSED
             32   95     32
             127  1      UNUSED
             128  32     32
             160  65376  160
but that's not what I see in the XML draft.
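For what it's worth, the "does this decl allow characters above 255" question can be checked mechanically. Each DESCSET entry is a triple: first described character number, number of characters, and either the corresponding base set character number or UNUSED. Below is a minimal Python sketch under that reading. The names parse_descset, highest_described and highest_used are made up for illustration, the sketch ignores the minimum-literal form that a DESCSET entry may also take, and the "0 256 0" declaration is a hypothetical 8-bit example, not copied from the XML draft.

```python
def parse_descset(text):
    """Parse DESCSET triples into (start, count, base) tuples.

    base is None for UNUSED ranges. Assumption: every entry is
    'number number (number | UNUSED)'; literal descriptions are not handled.
    """
    tokens = text.split()
    if tokens and tokens[0].upper() == "DESCSET":
        tokens = tokens[1:]
    if len(tokens) % 3 != 0:
        raise ValueError("DESCSET entries must come in triples")
    ranges = []
    for i in range(0, len(tokens), 3):
        start, count = int(tokens[i]), int(tokens[i + 1])
        target = tokens[i + 2]
        base = None if target.upper() == "UNUSED" else int(target)
        ranges.append((start, count, base))
    return ranges


def highest_described(ranges):
    """Highest character number covered by any range, used or not."""
    return max(start + count - 1 for start, count, _ in ranges)


def highest_used(ranges):
    """Highest character number mapped to a base set character, or None."""
    used = [s + c - 1 for s, c, base in ranges if base is not None]
    return max(used) if used else None


# The two declarations quoted in this message.
UCS4 = """DESCSET 0 9 UNUSED 9 2 9 11 2 UNUSED 13 1 13 14 18 UNUSED
          32 95 32 127 1 UNUSED 128 32 UNUSED 160 2147483486 160"""
UCS2 = """DESCSET 0 9 UNUSED 9 2 9 11 2 UNUSED 13 1 13 14 18 UNUSED
          32 95 32 127 1 UNUSED 128 32 32 160 65376 160"""
# Hypothetical 8-bit declaration, for comparison only.
EIGHT_BIT = "DESCSET 0 256 0"

if __name__ == "__main__":
    for name, decl in [("UCS-4", UCS4), ("UCS-2", UCS2), ("8-bit", EIGHT_BIT)]:
        print(name, highest_used(parse_descset(decl)))
```

Under these assumptions the UCS-2 declaration tops out at 65535 and the hypothetical 8-bit one at 255, which is the gap I am asking about.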
And how does all this square with the ERCS stuff WG8 recently approved?
Received on Monday, 13 January 1997 13:11:52 UTC