[whatwg] ISO-8859-* and the C1 control range from Øistein E. Andersen on 2007-06-05 (public-whatwg-archive@w3.org from June 2007)

From: Øistein E. Andersen <html5@xn--istein-9xa.com>
Date: Tue, 05 Jun 2007 16:11:35 +0200
Message-ID: <E1HvZl1-0009TX-Qu@node1-7.ouvaton.local>

On Jun 5, 2007, at 11:38, Kristof Zelechovski wrote:
> And why not:?
>     2c) If the declared encoding was ISO-8859-2, replace that
> character with the [corresponding] character [... from] Windows-1250.

On Jun 5, 2007, at 11:51, Henri Sivonen wrote:
> that's not what [browsers] do, so apparently it is not
> required for compatibility

A more fundamental reason is that the two encodings are
incompatible. Amongst the nine Windows-125* encodings,
8 have ISO-8859-* counterparts, of which 4 are subsets
of the corresponding Windows-125* encoding:

Windows-1250 vs. ISO-8859-2 (Eastern European):
    The range 0xC0--0xFF is the same in both encodings,
    but 0xA0--0xBF, which does include letters, is different.

Windows-1251 vs. ISO-8859-5 (Cyrillic):
    Completely incompatible. Most notably, Cyrillic letters
    from the modern Russian alphabet (32 uppercase and 32
    lowercase) are shifted by 0x10.

Windows-1252 vs. ISO-8859-1 (Western European):
    Superset.

Windows-1253 vs. ISO-8859-7 (Greek):
    Almost compatible. Unfortunately, a few bytes in the
    range 0xA0--0xBF are assigned to different characters,
    and the accented capital Alpha is positioned differently.

Windows-1254 vs. ISO-8859-9 (Turkish):
    Superset.

Windows-1255 vs. ISO-8859-8 (Hebrew):
    Superset.

Windows-1256 vs. ISO-8859-6 (Arabic):
    Arabic consonants seem to have the same byte values,
    but vowels have incompatible positions. Windows-1256 contains
    lowercase French accented characters and even the oe
    ligature, whereas ISO-8859-6 leaves many bytes undefined.

Windows-1257 vs. ISO-8859-13 (Baltic):
    Superset.

Windows-1258 (Vietnamese):
    No corresponding ISO-8859-* encoding.
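
Comparisons like the ones above can be checked mechanically. What follows
is only a rough sketch, assuming Python 3 and the codec tables bundled with
it (which may differ in detail from the tables a given browser uses). The
helper names decode_byte, differing_bytes and shifted_by are made up for
this illustration and do not come from any specification.

```python
# Illustrative sketch only: relies on Python's bundled codec tables.
# All helper names here are hypothetical.

def decode_byte(b, encoding):
    """Return the character for byte value b, or None if undefined."""
    try:
        return bytes([b]).decode(encoding)
    except UnicodeDecodeError:
        return None

def differing_bytes(iso, win, start=0xA0, end=0xFF):
    """Byte values in [start, end] that the ISO encoding defines
    and that the Windows encoding maps to something else."""
    diffs = []
    for b in range(start, end + 1):
        iso_char = decode_byte(b, iso)
        if iso_char is not None and iso_char != decode_byte(b, win):
            diffs.append(b)
    return diffs

def shifted_by(iso, win, start, end, offset):
    """True if every ISO byte in [start, end] decodes to the same
    character as the Windows byte located offset positions higher."""
    return all(
        decode_byte(b, iso) == decode_byte(b + offset, win)
        for b in range(start, end + 1)
    )

# Python codec names for the eight pairs discussed in this message.
PAIRS = [
    ("iso8859_2", "cp1250"),
    ("iso8859_5", "cp1251"),
    ("latin_1", "cp1252"),
    ("iso8859_7", "cp1253"),
    ("iso8859_9", "cp1254"),
    ("iso8859_8", "cp1255"),
    ("iso8859_6", "cp1256"),
    ("iso8859_13", "cp1257"),
]

if __name__ == "__main__":
    for iso, win in PAIRS:
        diffs = differing_bytes(iso, win)
        print(iso, "vs.", win, ":", [hex(b) for b in diffs])
```

An empty list for a pair means that, in the range 0xA0--0xFF, every byte
defined by the ISO encoding has the same meaning in the Windows encoding.
Calling shifted_by("iso8859_5", "cp1251", 0xB0, 0xEF, 0x10) tests the
Cyrillic shift mentioned above.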

-- 
Øistein E. Andersen

Received on Tuesday, 5 June 2007 07:11:35 UTC