Re: Encodings on validator.w3.org/detailed.html from Martin Duerst on 2002-12-06 (www-validator@w3.org from December 2002)

From: Martin Duerst <duerst@w3.org>
Date: Sat, 07 Dec 2002 02:29:32 +0900
To: Terje Bless <link@pobox.com>, W3C Validator <www-validator@w3.org>
Cc: Andreas Prilop <nhtcapri@rrzn-user.uni-hannover.de>
Message-Id: <4.2.0.58.J.20021207012528.068addd0@localhost>

At 07:06 02/12/06 +0100, Terje Bless wrote:
>Andreas Prilop <nhtcapri@rrzn-user.uni-hannover.de> wrote:
>
> >I refer to <http://validator.w3.org/detailed.html> where I find
> >under "encodings":
> >
> >> iso-8859-4 (Baltic Rim)
> >
> >Should read "iso-8859-4 (North European)".

I have checked this a bit. The evidence for a change
is certainly less than conclusive. For example
http://www.itscj.ipsj.or.jp/ISO-IR/110.pdf lists:
    Danish, English, Estonian, Finnish, German, Greenlandic,
    Lappish, Latvian, Lithuanian, and Norwegian.

http://www.terena.nl/library/multiling/ml-docs/iso-8859.html says:
    ISO-8859-4 - Latin 4
    Scandinavia/Baltic (mostly covered by 8859-1 also): Estonian, Latvian, and
    Lithuanian. It is an incomplete predecessor of Latin 6.
and
    ISO-8859-10 - Latin 6
    Latin6, for Lappish/Nordic/Eskimo languages: Adds the last Inuit
    (Greenlandic) and Sami (Lappish) letters that were missing in Latin 4
    to cover the entire Nordic area.

Andreas, please point to any supporting evidence
for why we should make your proposed change.

> >> iso-8859-13 (Latin 7)
> >
> >Should read "iso-8859-13 (Baltic Rim)".

There is not too much info available, but it seems
that this change is appropriate.

> >> iso-8859-6-i (Arabic)
> >
> >Should read "iso-8859-6 (Arabic)"

Yes, I guess this should be changed. Please see the very end of
http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.2.4.

> >You might want to add the following encodings:
> >  iso-8859-8 (Hebrew)

Because of the bidi support in HTML, the correct charset
for Hebrew is iso-8859-8-i. See also at the end of
http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.2.4.
For XML, this is a little less clear-cut, but I think
it still applies.

> >  iso-8859-11 (Thai)

This one is not registered with IANA
(see http://www.iana.org/assignments/character-sets).
It has to be registered before we will use it.
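To illustrate the "registered first" rule, here is a minimal sketch of how a form's encoding list could be filtered against the registry. It assumes a small, hand-copied subset of IANA names and aliases covering only the labels in this thread (not the full registry), and the helpers is_registered() and offerable() are hypothetical, not part of the validator code. Charset names are compared case-insensitively.

```python
# Illustrative, hand-copied subset of names/aliases from the IANA
# character-sets registry, limited to the labels discussed in this thread.
# This is NOT the full registry; the authority is
# http://www.iana.org/assignments/character-sets
REGISTERED_LABELS = {
    "ISO-8859-4",
    "ISO-8859-6",
    "ISO-8859-6-I",
    "ISO-8859-8",
    "ISO-8859-8-I",
    "ISO-8859-13",
    "ECMA-cyrillic",
    "iso-ir-111",  # alias of ECMA-cyrillic
}

# Charset names are case-insensitive, so normalise the subset once.
_NORMALISED = {label.lower() for label in REGISTERED_LABELS}


def is_registered(label: str) -> bool:
    """Return True if `label` appears in the illustrative subset above."""
    return label.strip().lower() in _NORMALISED


def offerable(labels):
    """Hypothetical policy: split candidate labels into those that may be
    offered in the encodings list and those that must wait for registration."""
    accepted = [label for label in labels if is_registered(label)]
    pending = [label for label in labels if not is_registered(label)]
    return accepted, pending


if __name__ == "__main__":
    proposed = ["iso-8859-8-i", "iso-8859-11", "iso-ir-111"]
    accepted, pending = offerable(proposed)
    print("offer:", accepted)   # offer: ['iso-8859-8-i', 'iso-ir-111']
    print("pending:", pending)  # pending: ['iso-8859-11']
```

Note that this only checks registration. Whether a registered label is the right one for HTML (for example iso-8859-8-i rather than iso-8859-8 for Hebrew) is a separate decision.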

> >  iso-ir-111 (Cyrillic KOI-8)

This one seems to be okay.

>Thanks for the feedback Andreas.
>
>I'm not an expert on encoding issues so perhaps Martin would comment on
>this? I've logged this as Bug #106 in our bug tracking system (see
><http://www.w3.org/Bugs/Public/show_bug.cgi?id=106>).

Regards,    Martin.

Received on Friday, 6 December 2002 14:17:19 UTC