Re: For review: Migrating to Unicode

John Cowan wrote:

 [C0 vs. C1 in iso-8859-1] 
> In theory, perhaps; in practice, no.  The C0 set of ISO 646,
> or parts of it, are by default in effect; no C1 set is.

Okay, I know that in practice I can use CRLF in iso-8859-1
(among others), but I'd expect the standard to give at least
a hint about this practical default.  Trying to implement it
with an explicit ESC ! @ likely won't work as expected.
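
To make the ESC ! @ point concrete, here is a minimal sketch in
Python.  The helper name designate_c0 is mine (hypothetical), and
I'm assuming the ISO 2022 form "ESC 2/1 F" for designating a C0
set.  It only shows that a plain iso-8859-1 decoder passes the
three bytes through as ordinary characters instead of acting on
them:

```python
ESC = 0x1B

def designate_c0(final_byte: int) -> bytes:
    """Build an ISO 2022 style 'designate C0 set' sequence: ESC 2/1 F.

    Hypothetical helper for illustration; F is the final byte
    identifying the registered C0 set ('@' in the sequence above).
    """
    if not 0x30 <= final_byte <= 0x7E:
        raise ValueError("final byte must be in the range 3/0..7/14")
    return bytes([ESC, 0x21, final_byte])

seq = designate_c0(ord("@"))
print(" ".join(f"{b:02x}" for b in seq))   # 1b 21 40

# Python's latin-1 codec knows nothing about ISO 2022 designations:
# the sequence simply decodes to ESC, '!' and '@'.
decoded = seq.decode("latin-1")
print(decoded == "\x1b!@")                 # True
```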

On the page <http://www.itscj.ipsj.or.jp/ISO-IR/2-5.htm> four
different C0 sets claim to be related to ISO 646.

> Unicode is indifferent to which Cx sets are used with it.
> The names of the characters in normal sets are carried in
> UnicodeData.txt for convenience, but they aren't normative
> in Unicode.

The book says that I may assume ECMA 48 (ISO 6429), and in
table 16.1 it claims that 10 control codes are "specified".
I don't know what this means; it's followed by a discussion
of u+0000 not belonging to the ten "specified" control codes,
but in any case NEL u+0085 is "specified" (= one of the ten).
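
As a quick sanity check of how one implementation treats u+0085,
here is a small sketch using Python's unicodedata module.  It
says nothing about what the book means by "specified"; it only
shows that Python classifies the code point as a control, has no
character name for it, maps byte 0x85 of iso-8859-1 to it, and
treats it as a line boundary:

```python
import unicodedata

NEL = "\u0085"

# General category Cc = control character.
category = unicodedata.category(NEL)
print(category)                                   # Cc

# Controls carry no character name, so the default is returned.
name = unicodedata.name(NEL, None)
print(name)                                       # None

# In iso-8859-1 the C1 byte 0x85 decodes to the same code point.
same = bytes([0x85]).decode("latin-1") == NEL
print(same)                                       # True

# str.splitlines() counts NEL among its line boundaries.
lines = ("a" + NEL + "b").splitlines()
print(lines)                                      # ['a', 'b']
```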

> filling out the block with ^Zs was just an application
> convention -- no more than one was ever needed.  In OS/8,
> the same convention was used for object code files as well
> as text.

I fear I missed OS/8; the oldest platforms I recall are /360,
TOPS/10, BS2000, and TR 440.  As for the use of 0xE5 by format
tools, I guess it is an urban legend that it is derived from
EBCDIC "V" = "virgin".
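
For what it's worth, the byte that EBCDIC assigns to "V" is easy
to check.  This sketch assumes Python's cp037 codec as a
representative EBCDIC code page; other EBCDIC variants may
differ in other positions:

```python
# Encode "V" with an EBCDIC code page and look at the byte value.
v_byte = "V".encode("cp037")[0]
print(hex(v_byte))                     # 0xe5

# For comparison, byte 0xF0 in the same code page is the digit zero.
f0_char = bytes([0xF0]).decode("cp037")
print(f0_char)                         # 0
```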

> ^W (logical end of medium) would have been the Right Thing.

Some uses of ^A .. ^Z, such as Martin's example ^S, could be
mnemonics: S = suspend (XOFF, therefore Q = XON), Z = last
letter (therefore EOF), R = reprint.
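
The ^letter notation itself is just the letter with the upper
bits stripped.  A minimal sketch (the helper name ctrl is mine,
and the labels are the usual ASCII names, not a claim about why
the letters were chosen):

```python
def ctrl(letter: str) -> int:
    """Map a letter to its caret-notation control code, e.g. 'S' -> 0x13."""
    return ord(letter.upper()) & 0x1F

# ASCII names of the control codes mentioned above.
ascii_names = {
    "Q": "DC1 (XON)",
    "R": "DC2",
    "S": "DC3 (XOFF)",
    "Z": "SUB",
}

for letter, label in ascii_names.items():
    print(f"^{letter} = 0x{ctrl(letter):02X}  {label}")
# ^Q = 0x11  DC1 (XON)
# ^R = 0x12  DC2
# ^S = 0x13  DC3 (XOFF)
# ^Z = 0x1A  SUB
```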

One year after <http://www.w3.org/People/cmsmcq/2007/C1.xml>
all this still appears to be as messy as it was twenty years
ago :-(  But in RFC 20, almost 40 years ago, it was still fine.

 Frank

Received on Monday, 24 March 2008 12:07:45 UTC