- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Mon, 10 Sep 2012 19:37:59 +0300
- To: "www-validator@w3.org" <www-validator@w3.org>
- CC: Olivier MARTINET <oolnet@free.fr>
2012-09-10 18:24, David Dorward wrote:
> On 9 Sep 2012, at 13:25, Olivier MARTINET <oolnet@free.fr> wrote:
>> Well, Could you add the charset="ISO-10646" in your validation web site.
>
> As far as I can tell (from 5 minutes with Wikipedia), ISO-10646 isn't
> a character encoding but is, like Unicode, a foundation upon which encodings can be built.
Wikipedia is usually useless or worse in complicated issues like
character encodings.
ISO 10646 is a character code standard that is code-point compatible
with Unicode. It is the more formal side of Unicode, so to speak.
But in addition to this, ISO-10646 _is_ a character encoding, defined in
the authoritative registry
http://www.iana.org/assignments/character-sets
It defines ISO-10646 as an alias for ISO-10646-Unicode-Latin1, which is
described as follows:
"Source: ISO Latin-1 subset of Unicode. Basic Latin and Latin-1
Supplement = collections 1 and 2. See ISO 10646,
Appendix A. See RFC 1815."
And the RFC http://www.ietf.org/rfc/rfc1815.txt defines it as covering
only Basic Latin and Latin 1 Supplement (i.e., the same characters as
ISO-8859-1), encoded in "16 bit big endian form", and adds: "For
practical communication, use of "ISO-10646" is discouraged."
Since this means that it's just UTF-16BE limited to the first two blocks
of Unicode, it's something you should never use on the web.
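To make the byte layout concrete, here is a minimal Python sketch of what that amounts to. The function name encode_iso_10646_latin1 is hypothetical, and treating the repertoire as everything from U+0000 to U+00FF is a simplifying assumption of mine (the registry entry speaks of collections 1 and 2, which may not include the control characters). It simply uses Python's standard "utf-16-be" codec after checking the range:

```python
def encode_iso_10646_latin1(text: str) -> bytes:
    """Hypothetical helper: encode text as 16-bit big-endian code units,
    rejecting anything outside Basic Latin and Latin-1 Supplement.

    Assumption: the allowed range is taken to be U+0000..U+00FF.
    """
    for ch in text:
        if ord(ch) > 0xFF:
            raise ValueError(f"U+{ord(ch):04X} is outside the Latin-1 subset")
    # Within this range, the result is identical to plain UTF-16BE.
    return text.encode("utf-16-be")


encoded = encode_iso_10646_latin1("\u00e9")  # e with acute accent
print(encoded)  # two bytes: 0x00 followed by 0xE9
```

Every character takes two bytes, and the first one is always zero, so the encoding offers nothing over ISO-8859-1 except doubled size.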
Technically, charset="ISO-10646" is correct. But implementations are not
required to support it, and they mostly don't. It would probably be a
disservice to add support for it to the markup validator, as people would
be misled into thinking that it's OK to use it.
Pages in French should normally use UTF-8. Using ISO-8859-1 is possible,
too, but then you cannot directly enter all French letters (such as the
oe ligature, U+0153).
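A quick way to check this for yourself, sketched in Python. The helper can_encode is a hypothetical name of mine; it relies only on the standard str.encode method and the codec names "iso-8859-1" and "utf-8":

```python
def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: report whether text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


word = "c\u0153ur"  # French "coeur" written with the oe ligature, U+0153

print(can_encode(word, "iso-8859-1"))        # False: U+0153 is not in ISO-8859-1
print(can_encode(word, "utf-8"))             # True
print(can_encode("caf\u00e9", "iso-8859-1")) # True: e-acute is U+00E9
```

In an ISO-8859-1 page the ligature would have to be written as a character reference such as &#339; instead.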
Yucca
Received on Monday, 10 September 2012 16:38:32 UTC