Re: A8 and A17: entities, conditional inclusion, what's XML for? from lee@sq.com on 1996-10-10 (w3c-sgml-wg@w3.org from October 1996)

From: <lee@sq.com>
Date: Thu, 10 Oct 96 01:28:07 EDT
To: Charles@sgmlsource.com, U35395@UICVM.CC.UIC.EDU
Cc: w3c-sgml-wg@w3.org
Message-Id: <9610100528.AA21854@sqrex.sq.com>

> As for entities, James Clark's suggestion that internal entities be
> limited to a single character is a good one, as it provides a
> mnemonic capability for &#Unnnnn.

Do you mean a single character or a single glyph?
For example, is the sequence
    the latin letter small a
    combining accent acute
one or two characters, or is it 8 characters (two 4-octet bit combinations)?

Note that ISO 10646 relies on the use of combining characters to
produce accented characters, except for a few special cases that are
in there for historical reasons.

If you allow only a single codepoint, can I do

<!Entity acute SDATA "[combining accent acute]">

and then use a&acute; to get an a-acute?  This practice is quite common
in SGML today, of course, as is the alternative
<!Entity a-acute SDATA "[the latin letter small a] . [combining accent acute]">
		where . indicates typographical superimposition,
		not functional composition :-)
which leads to many more entity definitions.

Note: I have used a-acute, which can be represented in two ways in Unicode
and ISO 10646, as there is a pre-combined aacute character for compatibility
with ISO 8859-1 (Latin 1).  Pretend that there was no such combined
character -- e.g. if I had to produce r-acute, and needed to use the
combining characters -- as that is the more general case.
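To make the counting question concrete, here is a rough sketch in Python
using the standard unicodedata module.  The variable names are mine, and
I have used q-acute as the stand-in for the general case, on the
assumption that it has no pre-combined form:

```python
import unicodedata

# a-acute written two ways: base letter plus combining accent, or pre-combined.
decomposed = "a\u0301"    # LATIN SMALL LETTER A + COMBINING ACUTE ACCENT
precomposed = "\u00e1"    # LATIN SMALL LETTER A WITH ACUTE (also in Latin 1)

codepoints_decomposed = len(decomposed)               # two code points
codepoints_precomposed = len(precomposed)             # one code point
octets_ucs4 = len(decomposed.encode("utf-32-be"))     # two 4-octet units

# Canonical composition (NFC) folds the pair into the pre-combined character.
composes_to_one = unicodedata.normalize("NFC", decomposed) == precomposed

# The general case: q-acute (assumed to have no pre-combined form)
# stays as two code points even after composition.
q_acute = unicodedata.normalize("NFC", "q\u0301")
codepoints_q_acute = len(q_acute)
```

So the same glyph can be one code point, two code points, or eight
octets, depending on which level you count at.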

If you have to handle a character sequence that maps to a single glyph,
how much harder is it to allow an arbitrary sequence?

It might be worth saying whether entity references can occur within
entity values; they could be banned in XML to simplify implementation in
environments that lack recursion, such as FORTRAN 66.
(are there any FORTRAN 66 SGML-aware Unicode implementations???)
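For what it is worth, nested references do not strictly need recursion.
A minimal sketch in Python, assuming a simplified &name; syntax and a
plain dictionary of entity values; the function name, the name pattern
and the pass limit are all made up for illustration, and this is not how
any particular SGML parser does it:

```python
import re

# Simplified reference syntax: &name; (an assumption, not the full SGML rules).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")

def expand(text, entities, max_passes=16):
    """Expand entity references by repeated passes instead of recursion."""
    for _ in range(max_passes):
        replaced = 0

        def substitute(match):
            nonlocal replaced
            name = match.group(1)
            if name in entities:
                replaced += 1
                return entities[name]
            return match.group(0)   # leave unknown references untouched

        text = ENTITY_REF.sub(substitute, text)
        if replaced == 0:
            return text             # nothing left to expand
    # Still expanding after max_passes: too deeply nested, or circular.
    raise ValueError("entity nesting too deep or circular")

entities = {"acute": "\u0301", "a-acute": "a&acute;"}
result = expand("&a-acute;", entities)   # "a" followed by the combining acute
```

The pass limit doubles as a cheap guard against entities that refer to
each other in a circle.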

Lee

Received on Thursday, 10 October 1996 01:28:23 UTC