Re: http charset labelling from Keld Jørn Simonsen on 1996-02-12 (uri@w3.org from February 1996)

From: Keld Jørn Simonsen <keld@dkuug.dk>
Date: Tue, 13 Feb 1996 00:35:21 +0100
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>, gtn@ebt.com (Gavin Nicol)
Cc: masinter@parc.xerox.com, uri@bunyip.com
Message-Id: <199602122335.AAA22028@dkuug.dk>

Masataka Ohta writes:

> > The results might
> > vary widely depending on whether the data was transmitted as SJIS,
> > EUC or UTF-8, if there is no encoding information.
> 
> Because of duplicated shape of 'A' for Latin and Greek capital
> letter 'A' and alpha, and because of duplicated encoding of Big5,
> encoding information, in general, is no fix for unique conversion
> from shape on a paper to internal code.
> 
> Don't try to do something proven to be impossible.

Well, Ohta, there are a number of ways to do it, for example
treating Greek capital letter alpha, Latin capital letter A and
the Cyrillic letter A as equivalent for matching; similar
equivalence specs may be available for other characters.
Narrow and full-width letters may also be treated as equivalent.
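
As a rough illustration of such equivalence matching, here is a minimal
Python sketch. The table LOOKALIKES and the functions fold() and
equivalent() are hypothetical names made up for this example, and the
table covers only the three "A" shapes mentioned above; a real
equivalence spec would need many more entries. Full-width letters are
folded with the standard NFKC normalization.

```python
import unicodedata

# Hypothetical equivalence table: look-alike capitals folded to Latin "A".
# Only the letters discussed in this message are listed.
LOOKALIKES = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
}

def fold(text):
    """Return a matching key in which look-alike letters compare equal."""
    # NFKC maps full-width forms such as U+FF21 to their narrow counterparts.
    narrow = unicodedata.normalize("NFKC", text)
    return "".join(LOOKALIKES.get(ch, ch) for ch in narrow)

def equivalent(a, b):
    """True if the two strings match after folding."""
    return fold(a) == fold(b)
```

With this, equivalent("\u0391BC", "ABC") and equivalent("\uFF21", "A")
both hold, while distinct letters such as "A" and "B" still differ.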

Anyway, it should be clear from the context which "A" is
meant: if it appears together with Greek characters it is most
likely an alpha, if with Latin characters it is most likely a
Latin letter, etc. It is up to the maker of the URL to ensure that
the intended audience will get the message, and some careful choice
of characters can be made there.
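
The context rule could be sketched along these lines in Python. This is
only a heuristic under stated assumptions: the names CANDIDATES,
script_of() and resolve_a() are invented for the example, the script is
guessed from the first word of the Unicode character name, and Latin is
assumed as the default when the neighbours give no hint.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from script name to the code point for the "A" shape.
CANDIDATES = {"GREEK": "\u0391", "LATIN": "A", "CYRILLIC": "\u0410"}

def script_of(ch):
    """Guess the script from the first word of the Unicode character name."""
    name = unicodedata.name(ch, "")
    first = name.split(" ")[0] if name else ""
    return first if first in CANDIDATES else None

def resolve_a(neighbours):
    """Pick the code point for an ambiguous "A" from surrounding characters."""
    counts = Counter(s for s in map(script_of, neighbours) if s)
    if not counts:
        return "A"  # assumption: fall back to Latin when there is no context
    return CANDIDATES[counts.most_common(1)[0][0]]
```

For example, resolve_a("βγ") yields the Greek alpha, while
resolve_a("bc") yields the Latin letter.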

keld

Received on Monday, 12 February 1996 18:36:17 UTC