
Re: ISO 8859-1 C1 set in RFC 2157 (was: For review: Migrating to Unicode)

From: Uma Umamaheswaran <umavs@ca.ibm.com>
Date: Tue, 25 Mar 2008 10:47:18 -0400
To: www-international@w3.org
Message-ID: <OFE6136BC3.D867BB65-ON85257417.004FC484-85257417.00513C46@ca.ibm.com>

Another perspective one has to consider is ...

When the original ISO 8859 series was being formulated in ECMA as 8-bit
coded character sets, there was also an accompanying standard, ECMA-43
(http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-043.pdf),
which became ISO 4873. It specifies different levels. Level 1 was the
structure intended primarily for the pure 8-bit 8859 series, with no code
extensions. All the code extension techniques were permitted in Levels 2
and 3, but Level 1 did not permit them.
Ada used it; see
http://lgl.epfl.ch/ada/components/text_processing/implementation.html
RFC 1502 (on X.400) also uses it; see
http://www.faqs.org/rfcs/rfc1502.html
Figure 2 in http://www.columbia.edu/kermit/ftp/e/isok7.txt shows the
C0 and C1 control sets.
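To make the Level 1 constraint concrete, here is a minimal Python sketch that scans a byte string for code-extension functions. Which functions count is an assumption on my part: the locking shifts SO/SI (0x0E/0x0F), the 8-bit single shifts SS2/SS3 (0x8E/0x8F), the ESC-based locking shifts (LS2, LS3, LS1R, LS2R, LS3R), and escape sequences with an intermediate byte 0x20-0x2F (as used for designations and announcers). The names `code_extensions` and `is_level1_clean` are hypothetical.

```python
# Hypothetical checker: flags functions ASSUMED to be code extensions,
# which Level 1 of ISO 4873 does not permit.
SO, SI = 0x0E, 0x0F          # locking shifts LS1 / LS0 (C0 area)
SS2, SS3 = 0x8E, 0x8F        # single shifts, 8-bit form (C1 area)
ESC = 0x1B
# Final bytes of ESC n, ESC o, ESC ~, ESC }, ESC | (LS2, LS3, LS1R, LS2R, LS3R).
LOCKING_SHIFT_FINALS = {0x6E, 0x6F, 0x7E, 0x7D, 0x7C}


def code_extensions(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, description) for each suspected code-extension function."""
    found = []
    for i, b in enumerate(data):
        if b in (SO, SI):
            found.append((i, "locking shift SO/SI"))
        elif b in (SS2, SS3):
            found.append((i, "single shift SS2/SS3"))
        elif b == ESC and i + 1 < len(data):
            nxt = data[i + 1]
            if 0x20 <= nxt <= 0x2F:
                # e.g. ESC ( B designates a G0 set; ESC SP F is an announcer.
                found.append((i, "escape sequence with intermediate byte"))
            elif nxt in LOCKING_SHIFT_FINALS:
                found.append((i, "locking shift escape sequence"))
    return found


def is_level1_clean(data: bytes) -> bool:
    """True if no suspected code-extension function occurs in data."""
    return not code_extensions(data)
```

For example, `is_level1_clean("café".encode("latin-1"))` is true, while `code_extensions(b"\x1b(Babc")` reports the designation sequence at offset 0.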

In these examples, the default C0 and C1 sets from ISO 6429 / ECMA-48 are
used without any code extensions, as required by Level 1 of ISO 4873.
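A minimal Python sketch of the resulting 8-bit layout, assuming the four areas C0 (0x00-0x1F), GL (0x20-0x7F), C1 (0x80-0x9F) and GR (0xA0-0xFF). Only a few of the ISO 6429 C1 control names are listed, for illustration; `area` and `describe` are hypothetical helpers.

```python
# A few default C1 controls from ISO 6429 / ECMA-48 (not a complete table).
C1_NAMES = {0x85: "NEL", 0x88: "HTS", 0x8D: "RI", 0x90: "DCS",
            0x9B: "CSI", 0x9C: "ST", 0x9D: "OSC"}


def area(b: int) -> str:
    """Return the code-table area (C0, GL, C1 or GR) of one byte value."""
    if not 0 <= b <= 0xFF:
        raise ValueError(f"not a byte value: {b}")
    if b <= 0x1F:
        return "C0"
    if b <= 0x7F:
        return "GL"
    if b <= 0x9F:
        return "C1"
    return "GR"


def describe(b: int) -> str:
    """Area name, plus the ISO 6429 control name for the listed C1 bytes."""
    a = area(b)
    if a == "C1" and b in C1_NAMES:
        return f"C1 {C1_NAMES[b]}"
    return a
```

For example, `[area(b) for b in b"\x0aA\x85\xe9"]` gives `["C0", "GL", "C1", "GR"]`, and `describe(0x85)` gives `"C1 NEL"`.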

I am not sure, but I suspect that in practice, when one tags email, HTML,
etc. with the ISO-8859-1 charset, the intent is to use pure 8-bit ISO 8859-1
without code extensions, with the default C0 and C1 sets from ISO 6429, as
in the examples cited above.
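For what it is worth, Python's `iso-8859-1` codec is consistent with this reading: every byte 0xNN decodes to U+00NN, so 0x80-0x9F become the Unicode C1 control characters U+0080-U+009F (general category Cc). A small check, where `decode_iso8859_1` is just an illustrative wrapper:

```python
import unicodedata


def decode_iso8859_1(data: bytes) -> str:
    # Each byte 0xNN maps to code point U+00NN, C1 range included.
    return data.decode("iso-8859-1")


c1 = decode_iso8859_1(bytes(range(0x80, 0xA0)))
print([f"U+{ord(c):04X}" for c in c1[:3]])       # ['U+0080', 'U+0081', 'U+0082']
print({unicodedata.category(c) for c in c1})     # {'Cc'}
```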

V.S. UMAmaheswaran, Ph.D.
Globalization Centre of Competency, IBM Toronto Lab
A2/SZ8, 8200 Warden Avenue, Markham, ON, Canada, L6G1C7; +1 905 413 3474;
Fax:905 413 4682; TieLine 313-3474; email: umavs@ca.ibm.com
Received on Tuesday, 25 March 2008 14:47:59 GMT