- From: <lee@sq.com>
- Date: Thu, 10 Oct 96 01:28:07 EDT
- To: Charles@sgmlsource.com, U35395@UICVM.CC.UIC.EDU
- Cc: w3c-sgml-wg@w3.org
> As for entities, James Clark's suggestion that internal entities be
> limited to a single character is a good one, as it provides a mnemonic
> capability for &#Unnnnn.
Do you mean a single character or a single glyph?
For example, is the sequence
the latin letter small a
combining accent acute
one or two characters, or is it 8 characters (two 4-octet bit combinations)?
Note that ISO 10646 relies on the use of combining characters to
produce accented characters, except for a few special cases that are
in there for historical reasons.
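
To make the counting question concrete, here is a minimal sketch in Python (not something the message itself uses). It assumes U+0301 is the "combining accent acute" meant above and that the 4-octet form is big-endian UCS-4 without a byte order mark; the helper name `describe` is made up for illustration.

```python
import unicodedata

# Latin small letter a followed by U+0301 COMBINING ACUTE ACCENT (assumed
# to be the "combining accent acute" referred to in the message).
SEQUENCE = "a\u0301"


def describe(text):
    """Return (code point count, 4-octet encoded length, character names)."""
    names = [unicodedata.name(ch) for ch in text]
    # utf-32-be stores every code point in exactly 4 octets and adds no BOM.
    octets = len(text.encode("utf-32-be"))
    return len(text), octets, names


if __name__ == "__main__":
    count, octets, names = describe(SEQUENCE)
    print(count, "code points,", octets, "octets:", names)
```

Counted this way the sequence is two code points or eight octets, and most renderers would draw it as one glyph, so all three answers are defensible depending on which unit "single character" is taken to mean.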
If you allow only a single codepoint, can I do
<!Entity acute SDATA "[combining accent acute]">
and then use a&acute; to get an a-acute? This practice is quite common
in SGML today, of course, as is the alternative
<!Entity a-acute SDATA "[the latin letter small a] . [combining accent acute]">
(where . indicates typographical superimposition,
not functional composition :-)
The alternative leads to many more entity definitions.
Note: I have used a-acute, which can be represented in two ways in Unicode
and ISO 10646, as there is a pre-combined aacute character for compatibility
with ISO 8859-1 (Latin 1). Pretend that there was no such combined
character -- e.g. if I had to produce r-acute, and needed to use the
combining characters -- as that is the more general case.
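
The difference between the two cases can be sketched with Unicode normalization, which is an assumption on my part since the message does not mention it. The example uses q rather than r as the letter with no precombined form, because present-day Unicode does contain a precombined r-acute (U+0155); the helper name `compose` is hypothetical.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"


def compose(base, mark=COMBINING_ACUTE):
    """Apply NFC composition to a base letter plus a combining mark."""
    return unicodedata.normalize("NFC", base + mark)


# a + combining acute has a precombined form, U+00E1, so NFC yields one
# code point.
a_acute = compose("a")

# q + combining acute has no precombined form, so NFC leaves two code points.
q_acute = compose("q")

if __name__ == "__main__":
    for label, value in (("a-acute", a_acute), ("q-acute", q_acute)):
        print(label, [f"U+{ord(ch):04X}" for ch in value])
```

A rule of "one code point per entity" would therefore accept the first result and reject the second, even though both are a single accented letter to the reader.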
If you have to handle a character sequence that maps to a single glyph,
how much harder is it to allow an arbitrary sequence?
It might be worth saying whether entity references can occur within
entity values; they could be banned in XML to simplify implementation in
environments that lack recursion, such as FORTRAN 66.
(are there any FORTRAN 66 SGML-aware Unicode implementations???)
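
For what it is worth, nested references can be expanded without recursion by keeping an explicit stack. The following is only a sketch under simplifying assumptions: references are written `&name;`, entity values are plain strings held in a dictionary, and the function name `expand` and the name pattern are invented for the example rather than taken from any SGML tool.

```python
import re

# Simplified reference syntax: & followed by a name and a closing semicolon.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand(text, entities):
    """Expand &name; references iteratively, using an explicit stack.

    Each stack entry is (text still to scan, names currently being expanded).
    """
    out = []
    stack = [(text, ())]
    while stack:
        remaining, active = stack.pop()
        match = ENTITY_REF.search(remaining)
        if match is None:
            out.append(remaining)
            continue
        out.append(remaining[:match.start()])
        name = match.group(1)
        if name in active:
            raise ValueError(f"entity {name!r} refers to itself")
        if name not in entities:
            raise KeyError(name)
        # Push the tail first so it is scanned after the entity's value.
        stack.append((remaining[match.end():], active))
        stack.append((entities[name], active + (name,)))
    return "".join(out)


if __name__ == "__main__":
    table = {"a": "[&b;]", "b": "B"}
    print(expand("x &a; y", table))
```

The stack does the work that the call stack would do in a recursive version, and the tuple of active names catches an entity that refers back to itself.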
Lee
Received on Thursday, 10 October 1996 01:28:23 UTC