- From: Gavin Nicol <gtn@ebt.com>
- Date: Wed, 22 Jan 1997 13:52:34 -0500
- To: dgd@cs.bu.edu
- CC: w3c-sgml-wg@www10.w3.org
>>I consider this a serious flaw in the spec.
>
>I think it's kind of unavoidable, since SGML character handling is such a
>mess (in practice) that I doubt we could find any 16 bit solution that
>would work on all systems -- possibly even any single declaration that
>would work on all systems. The elegant idea of a declaration as a document
>specific data specification has long been replaced by the ugly practice of
>the declaration as dependent on both processor and input, in my experience.

I should note that for HTML I18N (somewhat before BCTF) I took the view that the document character set defined the *characters* that were available to the parser, not the character *numbers* (i.e. the character numbers were nothing but a shorthand name for the actual character). This works out well, because it doesn't constrain the parser implementation by requiring that it represent things with bit combinations of a given width.

><soapbox>Of course, this is a new way to see the basic point that dealing
>with character numbers at all is inherently fragile. We could solve this
>problem as well by structured use of SDATA (as we have made structured use
>of PI).</soapbox>

Part of the global glyph/character repository idea I have is basically just that: you refer to characters by *name* rather than number.

>I'm being clueless again, but isn't there a way out for UTF-8 encoded
>files

I think it's important not to confuse encodings and coded character sets. You should be able to have any encoding you like as input, and all numeric character references should still refer to the same character. For example, if you have ㌽ (SQUARE POINTO) in your UTF-8 document, and you do a blind encoding conversion to Shift-JIS, then that numeric character reference should still refer to SQUARE POINTO.
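[Archive note: the point about numeric character references being independent of the byte encoding can be sketched as below. This is a hypothetical modern illustration, not code from the original thread; the `resolve_ncrs` helper is an assumed name, and the resolution rule shown — map the reference number straight to a document-character-set (ISO 10646) code point, regardless of how the surrounding bytes were encoded — is the behavior Nicol describes.]

```python
import re

def resolve_ncrs(text: str) -> str:
    # A numeric character reference names a character in the document
    # character set (ISO 10646), independent of the byte encoding the
    # document happened to arrive in.
    return re.sub(
        r'&#(?:x([0-9a-fA-F]+)|([0-9]+));',
        lambda m: chr(int(m.group(1), 16) if m.group(1) else int(m.group(2))),
        text,
    )

# The reference text itself is pure ASCII, so a "blind" transcoding of
# the document from UTF-8 to Shift-JIS leaves it untouched...
doc_utf8 = "Price in &#x333D;".encode("utf-8")
doc_sjis = "Price in &#x333D;".encode("shift_jis")

# ...and after decoding, the reference still resolves to the same
# character: U+333D SQUARE POINTO.
assert resolve_ncrs(doc_utf8.decode("utf-8")) == \
       resolve_ncrs(doc_sjis.decode("shift_jis"))
```

The key design point is that resolution happens after decoding, in terms of abstract character numbers, so the parser never has to care which bit combinations carried the text.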
Received on Wednesday, 22 January 1997 13:54:23 UTC