- From: <lee@sq.com>
- Date: Thu, 10 Oct 96 01:28:07 EDT
- To: Charles@sgmlsource.com, U35395@UICVM.CC.UIC.EDU
- Cc: w3c-sgml-wg@w3.org
> As for entities, James Clark's suggestion that internal entities be
> limited to a single character is a good one, as it provides a
> mnemonic capability for &#Unnnnn.

Do you mean a single character or a single glyph?

For example, is the sequence

    the latin letter small a
    combining accent acute

one or two characters, or is it 8 characters (2 4-octet bit combinations)?

Note that ISO 10646 relies on the use of combining characters to produce accented characters, except for a few special cases that are in there for historical reasons.

If you allow only a single codepoint, can I do

    <!Entity acute SDATA "[combining accent acute]">

and then use a&acute; to get an a-acute? This practice is quite common in SGML today, of course, as is the alternative

    <!Entity a-acute SDATA "[the latin letter small a] . [combining accent acute]">

where . indicates typographical superimposition, not functional composition :-) which leads to many more entity definitions.

Note: I have used a-acute, which can be represented in two ways in Unicode and ISO 10646, as there is a pre-combined aacute character for compatibility with ISO 8859-1 (Latin 1). Pretend that there was no such combined character -- e.g. if I had to produce r-acute, and needed to use the combining characters -- as that is the more general case.

If you have to handle a character sequence that maps to a single glyph, how much harder is it to allow an arbitrary sequence?

It might be worth saying whether entity references can occur within entity values; they could be banned in XML to simplify implementation in environments that lack recursion, such as FORTRAN 66. (Are there any FORTRAN 66 SGML-aware Unicode implementations???)

Lee
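
The character-versus-glyph question above can be made concrete with a small sketch. It uses Python's standard `unicodedata` module and the Unicode normalization forms NFC and NFD, which are assumptions of this illustration rather than anything the message relies on. The helper `code_points` is a hypothetical name, and q with an acute accent is chosen here only because no pre-combined form of it is defined.

```python
import unicodedata


def code_points(text):
    """Return the code points of text as U+XXXX strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# a-acute spelled two ways: one pre-combined character, or a base letter
# followed by COMBINING ACUTE ACCENT (U+0301).
precomposed = "\u00e1"
combining = "a\u0301"

# The two spellings are different code point sequences ...
print(code_points(precomposed))
print(code_points(combining))
print(precomposed == combining)

# ... but canonical composition maps the pair onto the single character.
print(unicodedata.normalize("NFC", combining) == precomposed)

# With no pre-combined form available (q + acute here), the sequence
# stays two code points even after composition: one glyph, two characters.
q_acute = unicodedata.normalize("NFC", "q\u0301")
print(len(q_acute))

# In a 4-octet encoding the combining spelling occupies 2 x 4 octets.
print(len(combining.encode("utf-32-be")))
```

Under these assumptions, a limit of one code point per entity would admit the pre-combined a-acute but not the q-acute sequence, even though both render as a single glyph.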
Received on Thursday, 10 October 1996 01:28:23 UTC