- From: Paul Grosso <paul@arbortext.com>
- Date: Mon, 13 Jan 97 12:05:51 CST
- To: w3c-sgml-wg@www10.w3.org
Looking at the SGML decl in the appendix of the XML draft, I have a few questions. (I'll admit up front that if there is anything that boggles my mind more than BOSs, it's character sets.)

The decl therein seems to define a document baseset from 0 to 255 and a syntax baseset from 646 (isn't that from 0 to 255?). Anyway, even though I cannot figure out document from syntax basesets from charsets from descsets, it doesn't look like that SGML decl allows characters above 255, so where does Unicode come in? Am I getting confused between encodings and character sets again or what?

In the syntax, I see we shun no characters, whereas most SGML decls shun some set of control characters at least. What's the rationale here for no shun characters?

The syntax baseset refers to some 1983 version of 646. I note that the recent "I18n-ization of HTML" RFC 2070 talks about changing the HTML 2.0 declaration's syntax character set declaration:

    Another change was made from the HTML 2.0 SGML declaration, in
    the belief that the latter did not express its authors' true
    intent. The syntax character set declaration was changed from
    ISO 646.IRV:1983 to the newer ISO 646.IRV:1991, the latter, but
    not the former, being identical with US-ASCII.

That document also shows a baseset/descset declaration of:

    BASESET "ISO Registration Number 177//CHARSET
             ISO/IEC 10646-1:1993 UCS-4 with
             implementation level 3 //ESC 2/5 2/15 4/6"
    DESCSET    0           9   UNUSED
               9           2   9
              11           2   UNUSED
              13           1   13
              14          18   UNUSED
              32          95   32
             127           1   UNUSED
             128          32   UNUSED
             160  2147483486   160

which is UCS-4. I gather from other sources that a UCS-2 declaration might look something like:

    BASESET "ISO Registration Number 176//CHARSET
             ISO/IEC 10646-1:1993 UCS-2 with
             implementation level 3//ESC 2/5 2/15 4/5"
    DESCSET    0      9   UNUSED
               9      2   9
              11      2   UNUSED
              13      1   13
              14     18   UNUSED
              32     95   32
             127      1   UNUSED
             128     32   32
             160  65376   160

but that's not what I see in the XML draft.

And how does all this square with the ERCS stuff WG8 recently approved?
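For anyone else trying to read the DESCSET lines quoted above: each entry is a triple of (first described character number, number of characters, first base-set character number or UNUSED). Here is a minimal Python sketch of that reading, not anything taken from the XML draft or RFC 2070. The function names are made up for illustration, and the UCS-2 string is simply the "might look something like" declaration as quoted above, not a verified one.

```python
def parse_descset(text):
    """Split a DESCSET body into (start, count, target) triples.

    target is None for UNUSED, otherwise the first base-set number.
    """
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("DESCSET must consist of complete triples")
    ranges = []
    for i in range(0, len(tokens), 3):
        start, count = int(tokens[i]), int(tokens[i + 1])
        target = None if tokens[i + 2] == "UNUSED" else int(tokens[i + 2])
        ranges.append((start, count, target))
    return ranges


def highest_described(ranges):
    """Largest character number covered by any triple (used or not)."""
    return max(start + count - 1 for start, count, _ in ranges)


def is_used(ranges, code):
    """True if 'code' falls in a triple that maps to the base set."""
    for start, count, target in ranges:
        if start <= code < start + count:
            return target is not None
    return False


# DESCSET bodies as quoted in the message above.
UCS4 = ("0 9 UNUSED 9 2 9 11 2 UNUSED 13 1 13 14 18 UNUSED "
        "32 95 32 127 1 UNUSED 128 32 UNUSED 160 2147483486 160")
UCS2 = ("0 9 UNUSED 9 2 9 11 2 UNUSED 13 1 13 14 18 UNUSED "
        "32 95 32 127 1 UNUSED 128 32 32 160 65376 160")

ucs4 = parse_descset(UCS4)
ucs2 = parse_descset(UCS2)

print(highest_described(ucs4))  # 160 + 2147483486 - 1
print(highest_described(ucs2))  # 160 + 65376 - 1
print(is_used(ucs4, 0x263A))    # a character well above 255
```

Read this way, the last triple is what carries the declaration past 255: in the UCS-2 version it runs up to 65535, which is the part a 0 to 255 document character set does not have.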
Received on Monday, 13 January 1997 13:11:52 UTC