- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Tue, 25 Mar 2008 17:20:32 +0100
- To: www-international@w3.org
Uma Umamaheswaran wrote:

> Of which Level 1 was the structure to be used primarily for
> the pure 8-bit 8859 series with no code extensions etc.

Yes. Apparently ECMA 94 doesn't clearly say this; maybe this was
fixed later in ISO 8859. It would remove all weird ideas about
using any G2 / G3 / SS2 / SS3 / ... "within" ISO 8859, and of
course in practice nobody does this.

> http://lgl.epfl.ch/ada/components/text_processing/implementation.html

That confirms most of what was discussed here. It still uses
IND 0x84, which was removed in a later ISO 6429 (ECMA 48) version.

> http://www.faqs.org/rfcs/rfc1502.html

That's a "historic" RFC. I sent Harald a list with the ESC
sequences that were not yet clear when he wrote this RFC. The
fastest way to create new Unicode evangelists: let them figure
out ISO 2022 or 4873 ;-)

> http://www.columbia.edu/kermit/ftp/e/isok7.txt

Ouch, more about ISO 2022 than I ever wanted to know.

> I suspect in practice when one tags the email, HTML etc.
> with ISO 8859-1 charset, the intent is to use the pure
> 8-bit 8859-1 without code extensions and C0, C1 as
> defaults from 6429 similar to what can be seen in the
> above cited examples.

Yes, notably no 0E + 0F (SO + SI or similar), no 8E + 8F
(SS2 + SS3), and no 1B 4E + 1B 4F (ditto 7bit) magic. No other
1B oddities, maybe excluding 1B 5B (7bit CSI). Arguably no 85
(NEL), no 9B (CSI), or actually no 80..9F at all, reserving
8E + 8F for ISO 4873 level 2 without using them at level 1.
None of the C1 controls is essential; if all else fails they
can be emulated with 7bit.

That would support John's argument that windows-1252 is an
extension of ISO 8859-1. In practice it is, no matter what the
ISO theory about graphical characters said.

Frank
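
The byte-level restrictions listed above can be sketched as a small
check. This is only an illustration, not anything from the thread:
the helper names (find_code_extensions, has_c1_bytes) are hypothetical,
and it assumes that "pure" ISO 8859-1 means exactly what the message
lists, i.e. no SO/SI, no 8-bit SS2/SS3, and no ESC N / ESC O, with
ESC [ (7bit CSI) tolerated.

```python
# Hypothetical helpers; the byte values are the ones named in the message.
ESC = 0x1B

# Single-byte code-extension controls.
SINGLE_BYTE = {
    0x0E: "SO (shift out)",
    0x0F: "SI (shift in)",
    0x8E: "SS2 (8-bit)",
    0x8F: "SS3 (8-bit)",
}

# Final bytes of the two-byte 7-bit equivalents, ESC N and ESC O.
ESC_PAIRS = {
    0x4E: "ESC N (7-bit SS2)",
    0x4F: "ESC O (7-bit SS3)",
}


def find_code_extensions(data: bytes):
    """Return (offset, description) for each code-extension control found."""
    hits = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in SINGLE_BYTE:
            hits.append((i, SINGLE_BYTE[b]))
        elif b == ESC and i + 1 < len(data) and data[i + 1] in ESC_PAIRS:
            hits.append((i, ESC_PAIRS[data[i + 1]]))
            i += 1  # skip the final byte of the escape sequence
        i += 1
    return hits


def has_c1_bytes(data: bytes) -> bool:
    """True if any byte falls in 0x80..0x9F, the C1 range."""
    return any(0x80 <= b <= 0x9F for b in data)


if __name__ == "__main__":
    sample = b"caf\xe9 \x1bN\x41"
    print(find_code_extensions(sample))
    print(has_c1_bytes(sample))
```

Text that passes both checks uses only bytes outside 80..9F, and
Python's "latin-1" and "cp1252" codecs decode every such byte to the
same character, which is the practical sense in which windows-1252
extends ISO 8859-1.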
Received on Tuesday, 25 March 2008 16:18:53 UTC