Re: New article published: Introducing Character Sets and Encodings from Frank Ellermann on 2006-01-23 (www-international@w3.org from January to March 2006)

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Mon, 23 Jan 2006 16:56:56 +0100
To: www-international@w3.org
Message-ID: <43D4FCC8.327C@xyzzy.claranet.de>

Richard Ishida wrote:

>         Introducing Character Sets and Encodings

The link in the section "what is it" goes to slide 3 (or 30?) of
a slide show with almost no text.  The PNG has text; is that
maybe a bug in the software creating this slide show?  For
comparison, slide 4 is okay.

The next link discusses "choose an encoding" with the usual
recommendation of UTF-8, but doesn't mention windows-1252.  How
to get the latter right is still one of the FAQs on the validator
list, the famous "hex 80 is no Euro in Latin-1, but it is in
windows-1252, and no, &#x80; / &#128; is never a Euro".
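
A minimal Python sketch of that FAQ, assuming Python's built-in
"latin-1" and "cp1252" codecs stand in for ISO-8859-1 and
windows-1252.  The variable names are only illustrative.  It
decodes the single byte hex 80 both ways and then builds the NCRs
that do refer to the Euro, since an NCR names a code point and not
a byte in some legacy encoding:

```python
import unicodedata

raw = b"\x80"  # the single byte "hex 80"

# ISO-8859-1 maps byte 0x80 to U+0080, a C1 control character.
as_latin1 = raw.decode("latin-1")
# windows-1252 maps the same byte to U+20AC, the Euro sign.
as_cp1252 = raw.decode("cp1252")

print(f"latin-1: U+{ord(as_latin1):04X}", unicodedata.category(as_latin1))
print(f"cp1252:  U+{ord(as_cp1252):04X}", unicodedata.name(as_cp1252))

# An NCR is built from the code point, so the Euro is
# &#x20AC; / &#8364; regardless of the document's encoding.
euro = "\u20ac"
ncr_hex = f"&#x{ord(euro):X};"
ncr_dec = f"&#{ord(euro)};"
print(ncr_hex, ncr_dec)
```

Read strictly, &#x80; names U+0080, which is what the "latin-1"
decode above yields; it is only the windows-1252 byte value that
happens to be 80.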

Discussing that FAQ in a text for beginners could explain some
of the concepts.  Of course there are already tons of Euro-FAQs
published, but many of them lack a "neutral POV" that just
states the facts.

The point is addressed in the linked article about NCRs.  But
maybe it's a bit difficult to put the facts together when they
are distributed over different articles:  Picking windows-1252
to get a backwards-compatible Euro at hex 80 is an option, BUT
[insert stuff why UTF-8 is probably better].
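
As one possible illustration of that trade-off (a sketch, not a
summary of what the articles say), the following Python snippet
encodes the Euro sign and a Greek alpha in three encodings.  The
helper try_encode and the choice of sample characters are
assumptions made for this example:

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

euro = "\u20ac"   # EURO SIGN
alpha = "\u03b1"  # GREEK SMALL LETTER ALPHA, an arbitrary non-Latin sample

results = {
    (name, enc): try_encode(ch, enc)
    for name, ch in (("euro", euro), ("alpha", alpha))
    for enc in ("latin-1", "cp1252", "utf-8")
}

for key, value in results.items():
    print(key, value)
```

In this sketch windows-1252 yields the one-byte Euro at hex 80
that Latin-1 lacks, but only UTF-8 encodes both characters.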

BTW, I like the revised NCR-article.  The charset-article says
that it contains examples in another language/script, but at
the moment it's apparently all "en", and excluding one U+00A9
and one U+00AE in the legalese it's even US-ASCII.

Maybe an I18N joke for the legal department:  if they want more
backwards compatibility for &copy; and &reg;, they should use
the symbolic character entities (or Latin-1 instead of UTF-8).
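
The byte-level background of that joke can be sketched in Python,
assuming the standard html.entities table for the entity names.
The dictionary layout is only illustrative.  U+00A9 and U+00AE are
one byte each in Latin-1, two bytes each in UTF-8, and plain ASCII
when written as entities:

```python
from html.entities import codepoint2name

forms = {}
for cp in (0x00A9, 0x00AE):  # COPYRIGHT SIGN, REGISTERED SIGN
    ch = chr(cp)
    entity = f"&{codepoint2name[cp]};"
    forms[cp] = {
        "latin-1": ch.encode("latin-1"),  # one byte, same value as the code point
        "utf-8": ch.encode("utf-8"),      # two bytes, the first one is 0xC2
        "entity": entity.encode("ascii"), # survives any ASCII-compatible charset
    }

for cp, variants in forms.items():
    print(f"U+{cp:04X}", variants)
```

A UTF-8 page misread as Latin-1 or windows-1252 would show the
0xC2 lead byte as a stray extra character before each sign, while
the entity form reads the same under either label.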

                       Bye, Frank

Received on Monday, 23 January 2006 16:05:16 UTC