Re: Comments on 31 March spec from Rick Jelliffe on 1997-04-07 (w3c-sgml-wg@w3.org from April 1997)

From: Rick Jelliffe <ricko@allette.com.au>
Date: Mon, 07 Apr 1997 15:37:49 +1000
To: Peter Flynn <pflynn@curia.ucc.ie>
CC: w3c-sgml-wg@w3.org
Message-ID: <3348882D.3894@allette.com.au>

Peter Flynn wrote:
> 
> At 09:24 04/04/97 -0800, Tim Bray wrote:
> [...]
> >in lots of different ways.  But the characters must all be unicode-defined
> >characters.  A character reference &#X763e; is a number, and that number
> >is *always* a unicode/10646 number.
> 
> Did we get rid of the &#u-HHHH; references? What happened to &#DDDD;
> (or did I miss it)?
> 
> ///Peter

There never was a &#u-HHHH; form, as far as I know.

Two forms were suggested:

* entity reference (from the SPREAD public entity set), e.g. &U-HHHH;
* hex numeric character reference (from Gavin's suggestion), e.g. &#xHHHH;

Because XML uses ISO 10646 (regardless of the transmission character
set or encodings used on the route from server to browser), there 
is no need to use an entity reference system: it would only duplicate 
the numeric character references.

(But a document that uses numeric character references for everything
above U+00FF, and uses only ISO 8859-1 characters for markup, can still
be processed on an 8-bit SGML system by preprocessing the hex numeric
character references into the SPREAD entity references, e.g.
    sed 's/&#[xX]/\&U-/g' infile > outfile
I suppose.)
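
The same preprocessing step can be sketched in Python, for anyone who
would rather not depend on sed quoting rules. This is only an
illustration under stated assumptions: the function name
hex_refs_to_spread is made up here, and upper-casing the hex digits in
the entity name is a guess at how the SPREAD names are spelled, not
something confirmed above. Unlike the sed one-liner, it only rewrites
complete references (digits followed by a semicolon).

```python
import re

# A complete hex numeric character reference, e.g. &#x763E; or &#X763e;
HEX_CHAR_REF = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_refs_to_spread(text):
    """Rewrite each &#xHHHH; as a SPREAD-style entity reference &U-HHHH;.

    Assumption: entity names use upper-case hex digits. Decimal
    references such as &#1234; are left untouched.
    """
    def replace(match):
        # Keep the digits as written, only normalising their case.
        return "&U-" + match.group(1).upper() + ";"

    return HEX_CHAR_REF.sub(replace, text)


if __name__ == "__main__":
    print(hex_refs_to_spread("a &#X763e; b"))
```

Run on the character reference from Tim's message, this turns
&#X763e; into &U-763E; and leaves the surrounding text alone.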

A different approach is to make the hex numeric character reference
start delimiter into "&U-".

Rick Jelliffe

Received on Tuesday, 8 April 1997 13:43:31 UTC