Re: char set encodings from Martin J Duerst on 1996-05-24 (www-font@w3.org from April to June 1996)

From: Martin J Duerst <mduerst@ifi.unizh.ch>
Date: Fri, 24 May 1996 14:20:18 +0200 (MET DST)
To: boo@best.com (Walter Ian Kaye)
Cc: www-font@w3.org
Message-Id: <"josef.ifi..265:24.04.96.12.20.22"@ifi.unizh.ch>
Walter Ian Kaye wrote:

>At 4:56p  -0400 05/23/96, Rob Migliore wrote:
>
>>We provide data in several languages such as english, polish, and
>>russian and install the necessary fonts on our clients' systems
>>(running ns 2.02).  We would like to encode our documents of another
>>language in a way that changes the font or the character set
>>automatically without having to switch to options, etc.  Actually, I
>>believe that we would want to change the character set.  Can anyone
>>comment on this?

Definitely the "charset", which is a widely used but very inappropriate
name in MIME for what is really an encoding of characters. Please don't
use fonts to switch between different encodings. It might work in some
cases, but it will give you big headaches in the future.


>>Some of these languages that we would be supporting are not supported
>>directly by ns2.02, they would have to be user defined.  Is it still 
>>possible to automate the switching of the fonts/character sets?
>
>>I've seen the META tag around as follows - does it work?
>
>>  <META HTTP-EQUIV="Content-Type" CONTENT="Text/Html; Charset=iso2022-jp">

Yes, this is correct, and it works in Netscape (and in a few other,
less well-known browsers). Ideally, this information is sent in the
HTTP header and not as part of the document, because there are
"charset"s in which the tag inside the document cannot be read
unless the encoding is already known.
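
The point about the HTTP header can be illustrated with a small Python
sketch. It is only an illustration under assumptions: the helper names
are made up for this example, the registered spelling "iso-2022-jp" is
used for the charset, and UTF-16 is picked merely as one encoding in
which the in-document tag is not visible to a client that scans the raw
bytes as ASCII.

```python
from email.message import Message
from typing import Optional

# A tiny document carrying its own charset declaration (sample text only).
HTML = (
    '<META HTTP-EQUIV="Content-Type" '
    'CONTENT="text/html; charset=iso-2022-jp">\n'
    "<P>日本語</P>\n"
)


def content_type_header(charset: str) -> str:
    """Build the Content-Type value a server could send in the HTTP header."""
    return f"text/html; charset={charset}"


def charset_from_header(value: str) -> Optional[str]:
    """Read the charset parameter back out of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def meta_tag_is_readable(encoded: bytes) -> bool:
    """True if a client scanning the raw bytes as ASCII can find the tag."""
    return b"charset=" in encoded


header = content_type_header("iso-2022-jp")
print(charset_from_header(header))

# ASCII-compatible encoding: the tag survives as plain ASCII bytes.
print(meta_tag_is_readable(HTML.encode("iso-2022-jp")))

# Assumed counter-example: in UTF-16 every ASCII letter is followed by a
# zero byte, so the tag cannot be found before the encoding is known.
print(meta_tag_is_readable(HTML.encode("utf-16-le")))
```

The header is available before a single byte of the document has to be
interpreted, which is why it is the safer place for this information.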


>First off, there is a difference between "language" and "character set".
>
>There are charsets *associated* with languages, but that's as far as it
>goes.
>
>
>For example, Netscape supports "charset=iso-8859-2" and
>"charset=x-mac-ce" for Central European languages. Note
>that this covers more than one language.

It may be a nice help to the user that Netscape supports
x-mac-ce, but please help, wherever you can, to achieve what
fortunately has worked for Western Europe, namely that only a
single encoding (i.e. "charset") is used. Different platforms
currently still use different local encodings, but for the web,
it does not help to just use your local "charset", or the one
you think might be most popular on target machines. The only
good solution is to stick to a very few widely usable sets.
For Central Europe, this clearly would be iso-8859-2, just as
it is iso-8859-1 for Western Europe (you don't have to
indicate the latter, as it is the default).
Ideally, in the future, there will be even fewer "charset"s,
if everybody moves towards Unicode.
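
Why a platform-local "charset" does not help on the web can be seen
in a few lines of Python. This is a sketch under assumptions: the
sample word is arbitrary, and "mac_latin2" is Python's codec name for
the Mac Central European encoding, which I take to correspond to
x-mac-ce.

```python
# Text as the author typed it (arbitrary Polish sample).
text = "Łódź"

# What goes over the wire when the page is published as iso-8859-2.
wire = text.encode("iso-8859-2")

# A reader who decodes with the declared charset gets the text back.
as_latin2 = wire.decode("iso-8859-2")

# A reader whose software assumes its local Mac encoding instead sees
# different characters for every byte above 0x7F.
as_mac_ce = wire.decode("mac_latin2", errors="replace")

print(as_latin2 == text)
print(as_mac_ce == text)
```

The bytes are identical in both cases. Only the agreed-upon label
decides whether they come out as the intended letters.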


>You have to determine whether the characters used in Polish are covered
>by one of the supported charsets.

They should be supported by both of them. But please use
iso-8859-2 for wide compatibility.
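
Whether a given "charset" covers the characters of a language can be
checked mechanically. The following Python sketch is only an
illustration: the function name and the sample strings are invented
for this example, and "mac_latin2" is again assumed to stand in for
x-mac-ce.

```python
def covered_by(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Letters specific to Polish, and a short Russian sample (illustrative).
POLISH = "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ"
RUSSIAN = "Привет"

for charset in ("iso-8859-1", "iso-8859-2", "mac_latin2", "iso-8859-5"):
    print(charset, covered_by(POLISH, charset), covered_by(RUSSIAN, charset))
```

A check like this only says that the characters exist in the encoding.
It says nothing about whether a given browser accepts the label.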

>Unfortunately, Netscape only supports the character sets it has
>defined:
>
>"us-ascii",
>"iso-8859-1", "x-mac-roman", "iso-8859-2", "x-mac-ce", 
>              "iso-2022-jp","x-sjis", "x-euc-jp", 
>              "euc-kr", "iso-2022-kr", 
>              "gb2312", "gb_2312-80",
>              "x-euc-tw", "x-cns11643-1", "x-cns11643-2", "big5"
>
>There's no way to use any others and have Netscape take advantage of
>it. So even if you found the name of a Russian character set (such as
>iso-8859-5, which I found in rfc1345), it wouldn't do you any good
>because Netscape would just ignore it.

The number of "charset"s that Netscape supports is increasing
with every version, maybe more than necessary. But they definitely
should add iso-8859-5, and if you can convince them that you have
a wide enough market, I guess they might even make a special version
for you. While it is very difficult for an outsider to add an encoding,
it is rather easy for them, especially if the underlying OS already
supports it, which I assume is the case for your users.

<<--wants to know the charset names for MS Windows and Unix...

For the web, using "charset"s proprietary to some system is the
wrong thing to do. Please use only widely accepted and standardized
"charset"s, which means mainly the iso-8859-x series and some
others.

Regards,	Martin.
Received on Friday, 24 May 1996 08:21:51 UTC