- From: Glenn Adams <glenn@stonehand.com>
- Date: Wed, 26 Apr 95 20:44:10 -0400
- To: philipp@res.enst.fr
- Cc: Multiple recipients of list <www-html@www10.w3.org>
There's a movement afoot to make ISO/IEC 10646-1:1993 the standard document character set for HTML. This can be done without affecting existing implementations because:

1. The ISO 8859-1 character repertoire is a subset of 10646, and the code assignments in 10646 are the same as in 8859-1 for this repertoire; that is, code positions 0 through 255 (ÿ) denote the same characters in both 8859-1 and 10646.

2. SGML (and thus HTML) doesn't require that the representation of entities (e.g., the document entity) use the document character set; that is, one can use 8859-1 or ASCII or Shift JIS or any other character set in the actual representation of a document. The entity manager is responsible for translating the actual representation of the entity into a form understood by the parser in terms of the applicable document character set. [This translation is partially supported by the CHARSET= parameter on the CONTENT-TYPE header in HTTP: this parameter identifies the actual encoding of the entity's representation.]

---------

As for ligatures, one needs to be a bit careful about terminology here. Some ligatures, such as 'ffi', are merely presentation forms that enhance the aesthetics of rendered text; other 'ligatures', the so-called 'lexical ligatures', communicate additional lexical information beyond a mere presentational style. Then again, what is a lexical ligature in one writing system may be a presentational ligature in another.

Not all purely presentational ligatures are encoded (or should be encoded) in 10646 or any other character set. Those that are encoded merely aid compatibility with older software that can't distinguish between characters and glyphs. A general formatting architecture must account for the need to map characters to glyphs in a many-to-many mapping; this is particularly true for the less simple scripts such as Arabic, Devanagari, etc.

Regards,
Glenn Adams
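[Illustrative note, not part of the original message.] The two numbered points and the remark about encoded ligatures can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: Python string code points are taken as a stand-in for 10646 code values, and `to_document_characters` is a hypothetical name for the entity manager's translation step, not anything defined by SGML or HTTP.

```python
import unicodedata


def to_document_characters(raw, charset):
    """Hypothetical entity-manager step: decode the bytes of an entity
    using its labelled charset and return code values in the document
    character set (approximated here by Python's code points)."""
    return [ord(ch) for ch in raw.decode(charset)]


# Point 1: each 8859-1 byte value maps to the same code value in 10646.
latin1_codes = to_document_characters(bytes(range(256)), "iso-8859-1")

# Point 2: the same text in two different actual representations
# yields the same sequence of document characters once translated.
sample = "\u65e5\u672c\u8a9e"  # three kanji, chosen only for illustration
from_sjis = to_document_characters(sample.encode("shift_jis"), "shift_jis")
from_utf8 = to_document_characters(sample.encode("utf-8"), "utf-8")

# Ligatures: U+FB03 is an encoded presentation form; compatibility
# normalization folds it back to the three underlying characters.
ffi_ligature = "\ufb03"
ffi_folded = unicodedata.normalize("NFKC", ffi_ligature)
```

The byte strings differ between the Shift JIS and UTF-8 cases, but the parser-facing character sequence does not, which is the separation between representation and document character set described in point 2.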
Received on Wednesday, 26 April 1995 20:45:04 UTC