- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Mon, 10 Sep 2012 19:37:59 +0300
- To: "www-validator@w3.org" <www-validator@w3.org>
- CC: Olivier MARTINET <oolnet@free.fr>
2012-09-10 18:24, David Dorward wrote:

> On 9 Sep 2012, at 13:25, Olivier MARTINET <oolnet@free.fr> wrote:
>> Well, Could you add the charset="ISO-10646" in your validation web site.
>
> As far as I can tell (from 5 minutes with Wikipedia), ISO-10646 isn't
> a character encoding but is, like Unicode, a foundation upon which
> encodings can be built.

Wikipedia is usually useless or worse in complicated issues like character encodings.

ISO 10646 is a character code standard, which is code-point compatible with Unicode. A more formal side of Unicode, so to say.

But in addition to this, ISO-10646 _is_ a character encoding, defined in the authoritative registry
http://www.iana.org/assignments/character-sets
It defines ISO-10646 as an alias for ISO-10646-Unicode-Latin1, which is described as follows:

"Source: ISO Latin-1 subset of Unicode. Basic Latin and Latin-1 Supplement = collections 1 and 2. See ISO 10646, Appendix A. See RFC 1815."

And the RFC
http://www.ietf.org/rfc/rfc1815.txt
defines it as covering only Basic Latin and Latin-1 Supplement (i.e., the same characters as ISO-8859-1), encoded in "16 bit big endian form", and adds:

"For practical communication, use of "ISO-10646" is discouraged."

Since this means that it's just UTF-16BE limited to the first two blocks of Unicode, it's something you should never use on the web.

Technically, charset="ISO-10646" is correct. But implementations are not required to support it, and they mostly don't. It would probably be a disservice to add support for it to the markup validator, as people would be misled into thinking that it's OK to use it.

Pages in French should normally use UTF-8. Using ISO-8859-1 is possible, too, but then you cannot enter even all French letters (like the oe letter, œ) directly.

Yucca
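
As a rough illustration of the points above, here is a small Python sketch. It is not taken from RFC 1815 or from any validator code; the function name encode_iso_10646_latin1 is hypothetical, and it simply models the description given above ("UTF-16BE limited to the first two blocks of Unicode") on top of Python's standard utf-16-be codec. Python itself is not assumed to know a codec called "ISO-10646".

```python
def encode_iso_10646_latin1(text: str) -> bytes:
    """Hypothetical helper: 16-bit big-endian encoding restricted to
    U+0000..U+00FF (Basic Latin and Latin-1 Supplement)."""
    for ch in text:
        if ord(ch) > 0xFF:
            raise ValueError(f"U+{ord(ch):04X} is outside the Latin-1 subset")
    # Within this range every character is one 16-bit unit, high byte zero.
    return text.encode("utf-16-be")


def can_encode(text: str, encoding: str) -> bool:
    """Return True if text is representable in the given Python codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "déjà"
encoded = encode_iso_10646_latin1(sample)

# Two bytes per character; the low bytes are exactly the ISO-8859-1 bytes.
print(encoded.hex(" "))
print(encoded[1::2] == sample.encode("iso-8859-1"))

# The oe letter (U+0153) fits in UTF-8 but not in ISO-8859-1.
print(can_encode("cœur", "utf-8"), can_encode("cœur", "iso-8859-1"))
```

Under these assumptions, the restricted encoding carries no more information than ISO-8859-1 while doubling the size of the data, and it still cannot represent œ, whereas UTF-8 can.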
Received on Monday, 10 September 2012 16:38:32 UTC