ISO-10646 (Was: Re: Add Subject Here)

2012-09-10 18:24, David Dorward wrote:

> On 9 Sep 2012, at 13:25, Olivier MARTINET <oolnet@free.fr> wrote:
>> Well, Could you add the  charset="ISO-10646" in your validation web site.
>
> As far as I can tell (from 5 minutes with Wikipedia), ISO-10646 isn't
> a character encoding but is, like Unicode, a foundation upon which encodings can be built.

Wikipedia is usually useless or worse on complicated issues like
character encodings.

ISO 10646 is a character code standard that is code-point compatible
with Unicode. It is the more formal side of Unicode, so to speak.

But in addition to this, ISO-10646 _is_ a character encoding, defined in 
the authoritative registry
http://www.iana.org/assignments/character-sets
The registry lists ISO-10646 as an alias for ISO-10646-Unicode-Latin1,
which is described as follows:
"Source: ISO Latin-1 subset of Unicode. Basic Latin and Latin-1
          Supplement  = collections 1 and 2.  See ISO 10646,
          Appendix A.  See RFC 1815."
And the RFC http://www.ietf.org/rfc/rfc1815.txt defines it as covering 
only Basic Latin and Latin-1 Supplement (i.e., the same characters as
ISO-8859-1), encoded in "16 bit big endian form", and adds: "For 
practical communication, use of "ISO-10646" is discouraged."

In other words, it is just UTF-16BE restricted to the first two blocks
of Unicode (U+0000 to U+00FF), so you should never use it on the web.
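
To make that concrete, here is a minimal Python sketch of what such an
encoder would amount to, assuming the reading of RFC 1815 given above.
The function name encode_iso_10646_latin1 is made up for illustration;
it is not part of any library, and real software is unlikely to offer
anything like it.

```python
def encode_iso_10646_latin1(text: str) -> bytes:
    """Hypothetical encoder: 16-bit big-endian, U+0000..U+00FF only."""
    for ch in text:
        if ord(ch) > 0xFF:
            # Outside Basic Latin and Latin-1 Supplement: not encodable.
            raise ValueError(f"U+{ord(ch):04X} is outside the Latin-1 repertoire")
    # For this repertoire the bytes are identical to UTF-16BE without a BOM.
    return text.encode("utf-16-be")


sample = "caf\u00e9"  # "café" with a precomposed e-acute (U+00E9)
encoded = encode_iso_10646_latin1(sample)
print(encoded.hex(" "))  # 00 63 00 61 00 66 00 e9
```

Every character costs two bytes, the first of which is always zero,
and nothing beyond the ISO-8859-1 repertoire can be represented.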

Technically, charset="ISO-10646" is correct. But implementations are not
required to support it, and they mostly don't. It would probably be a
disservice to add support for it to the markup validator, as people would
be misled into thinking that it's OK to use it.

Pages in French should normally use UTF-8. Using ISO-8859-1 is possible,
too, but then you cannot enter every French letter (œ, for example)
directly.
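
A short Python sketch shows the difference, using the word "cœur" as an
arbitrary example. It relies only on the standard codecs and the
built-in "xmlcharrefreplace" error handler; the variable names are
mine.

```python
word = "c\u0153ur"  # "cœur", with U+0153 LATIN SMALL LIGATURE OE

# UTF-8 encodes the letter directly, as two bytes.
utf8_bytes = word.encode("utf-8")
print(utf8_bytes)  # b'c\xc5\x93ur'

# ISO-8859-1 has no code for it, so direct encoding fails.
try:
    word.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False

# The only way around it is indirect: a numeric character reference.
fallback = word.encode("iso-8859-1", errors="xmlcharrefreplace")
print(fallback)  # b'c&#339;ur'
```

So an ISO-8859-1 page can still show the letter, but only by writing a
character reference in the markup instead of the character itself.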

Yucca

Received on Monday, 10 September 2012 16:38:32 UTC