Re: Comments on 31 March spec
Peter Flynn wrote:
> At 09:24 04/04/97 -0800, Tim Bray wrote:
> >in lots of different ways. But the characters must all be unicode-defined
> >characters. A character reference 瘾 is a number, and that number
> >is *always* a unicode/10646 number.
> Did we get rid of the &#u-HHHH; references? What happened to &#DDDD;
> (or did I miss it)?
There never was a &#u-HHHH form, as far as I know.
Two forms were suggested:
* entity reference (from SPREAD public entity set) e.g. &U-HHHH;
* hex numeric character reference (from Gavin's suggestion) e.g. &#XHHHH;
Because XML uses ISO 10646 (regardless of the transmission character
set or encodings used on the route from server to browser), there
is no need to use an entity reference system: it would only duplicate
the numeric character references.
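
To illustrate the point, here is a minimal sketch in Python. The sample reference &#X763E; and the two encodings are my own choices for illustration, not anything taken from the spec. It shows that the number in the reference names the same ISO 10646 character whatever bytes are used in transit:

```python
# Illustrative reference only; the hex delimiter "&#X" is assumed here.
ref = "&#X763E;"

# Strip the "&#X" prefix and the ";" suffix, then read the digits as hex.
code_point = int(ref[3:-1], 16)
ch = chr(code_point)

# The same character serialises to different bytes in different encodings,
# but the number in the reference does not change.
utf8_bytes = ch.encode("utf-8")
utf16_bytes = ch.encode("utf-16-be")

print(hex(code_point), utf8_bytes.hex(), utf16_bytes.hex())
```

The reference carries the 10646 number itself, so nothing about it depends on the encoding chosen for the document.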
(But a document that uses numeric character references for everything
above U+00FF, and uses only ISO 8859-1 characters for
markup, can still be processed on an 8-bit SGML system by preprocessing
the hex numeric character references into the SPREAD entity references,
e.g. sed "s/\&\#X/\&U\-/g" infile > outfile
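
The same preprocessing step can be sketched in Python, for anyone who wants something stricter than a blind substitution. This is only a sketch under my own assumptions: the function name hex_refs_to_spread is hypothetical, and accepting a lowercase "x" as well as "X" is my choice. Unlike the sed one-liner, it rewrites only complete references of the form &#XHHHH; and leaves decimal references and ordinary entities alone.

```python
import re

# A complete hex numeric character reference: "&#X", hex digits, ";".
# Accepting lowercase "x" as well is an assumption, not part of the proposal.
HEX_REF = re.compile(r"&#[xX]([0-9A-Fa-f]+);")

def hex_refs_to_spread(text: str) -> str:
    """Rewrite each &#XHHHH; as &U-HHHH;, keeping the hex digits as written."""
    return HEX_REF.sub(lambda m: "&U-" + m.group(1) + ";", text)

sample = "a &#X763E; b &#233; c &amp; d"
print(hex_refs_to_spread(sample))
```

The digits are passed through untouched, so whether the result matches a SPREAD entity name exactly depends on the document already writing them the way that entity set does.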
A different approach is to make the hex numeric character reference
start delimiter into "&U-