Re: ISO 8859-1 C1 set in RFC 2157

Uma Umamaheswaran wrote:
 
> Of which Level 1 was the structure to be used primarily for
> the pure 8-bit 8859 series with no code extensions etc.

Yes.  Apparently ECMA 94 doesn't clearly say this; maybe it
was fixed later in ISO 8859.  Saying it clearly would remove
all weird ideas about using any G2 / G3 / SS2 / SS3 / ...
"within" ISO 8859, and of course in practice nobody does this.

> http://lgl.epfl.ch/ada/components/text_processing/implementation.html

That confirms most of what was discussed here.  It still uses
IND 0x84, which was removed in a later ISO 6429 (ECMA 48)
version.

> http://www.faqs.org/rfcs/rfc1502.html

That's a "historic" RFC; I sent Harald a list with the ESC
sequences that were not yet clear when he wrote this RFC.

The fastest way to create new Unicode evangelists: let them
figure out ISO 2022 or 4873 ;-)

> http://www.columbia.edu/kermit/ftp/e/isok7.txt

Ouch, more about ISO 2022 than I ever wanted to know.

> I suspect in practice when one tags the email, HTML etc.
> with ISO 8859-1 charset, the intent is to use the pure
> 8-bit 8859-1 without code extensions and C0, C1 as 
> defaults from 6429 similar to what can be seen in the
> above cited examples.

Yes, notably no 0E + 0F (SO + SI or similar), no 8E + 8F
(SS2 + SS3), and no 1B 4E + 1B 4F (ditto 7bit) magic.
No other 1B oddities, maybe excluding 1B 5B (7bit CSI).
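
As a rough illustration of that rule, here is a minimal Python
sketch that scans a byte string for the shift controls listed
above.  The function name and the tables are made up for this
example, and the list of "forbidden" bytes is only my reading
of the paragraph above, not anything a standard defines as a
conformance test.  1B 5B (7bit CSI) is deliberately not flagged.

```python
# Hypothetical helper: report ISO 2022 style shift controls that
# "pure" 8-bit ISO 8859-1 text would not be expected to contain.
# The byte lists are assumptions taken from the paragraph above.

SINGLE_BYTE_SHIFTS = {
    0x0E: "SO",
    0x0F: "SI",
    0x8E: "SS2",
    0x8F: "SS3",
}
ESC = 0x1B
ESC_SHIFTS = {
    0x4E: "SS2 (7bit)",  # ESC N
    0x4F: "SS3 (7bit)",  # ESC O
}


def find_code_extensions(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, name) for every shift control found in data."""
    hits = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in SINGLE_BYTE_SHIFTS:
            hits.append((i, SINGLE_BYTE_SHIFTS[b]))
        elif b == ESC and i + 1 < len(data) and data[i + 1] in ESC_SHIFTS:
            hits.append((i, ESC_SHIFTS[data[i + 1]]))
            i += 1  # skip the final byte of the ESC sequence
        i += 1
    return hits
```

An empty result for a given message would be consistent with
"level 1" use in the sense discussed here.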

Arguably no 85 (NEL), 9B (CSI), or actually no 80..9F at
all, reserving 8E + 8F for ISO 4873 level 2 without using
them at level 1.  None of the C1 controls is essential; if
all else fails they can be emulated with 7bit.
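
The 7bit emulation is mechanical: a C1 control in 80..9F has
the same meaning as ESC followed by the byte minus 0x40, which
is how 9B (CSI) becomes 1B 5B and 8E + 8F become 1B 4E + 1B 4F.
A small Python sketch, with a function name invented for this
example:

```python
def c1_to_7bit(data: bytes) -> bytes:
    """Replace each 8-bit C1 control (80..9F) by its 7bit ESC form.

    Hypothetical helper for illustration only; all other bytes are
    copied through unchanged.
    """
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            # ESC followed by a byte in 40..5F
            out += bytes((0x1B, b - 0x40))
        else:
            out.append(b)
    return bytes(out)
```

Note that the output still contains bytes A0..FF, so this only
removes the C1 range; it does not make the text 7bit clean.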

That would support John's argument that windows-1252 is
an extension of ISO 8859-1.  In practice it is, no matter
what the ISO theory about graphical characters said.
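
For what it's worth, the "extension in practice" claim can be
checked against one implementation.  The sketch below assumes
that Python's "latin-1" and "cp1252" codecs are fair stand-ins
for ISO 8859-1 and windows-1252, and lists the byte values
where the two disagree.  The function name is invented for
this example.

```python
def differing_bytes() -> list[int]:
    """Byte values that decode differently in latin-1 and cp1252."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves a few positions undefined
            cp1252 = None
        if latin1 != cp1252:
            diffs.append(b)
    return diffs
```

With these codecs the differences should all fall in 80..9F,
i.e. only where ISO 8859-1 has no graphical characters at all.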

 Frank

Received on Tuesday, 25 March 2008 16:18:53 UTC