Re: http charset labelling

Keld Jørn Simonsen (keld@dkuug.dk)
Tue, 13 Feb 1996 00:35:21 +0100


Message-Id: <199602122335.AAA22028@dkuug.dk>
From: keld@dkuug.dk (Keld Jørn Simonsen)
Date: Tue, 13 Feb 1996 00:35:21 +0100
In-Reply-To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>, gtn@ebt.com (Gavin Nicol)
Subject: Re: http charset labelling
Cc: masinter@parc.xerox.com, uri@bunyip.com

Masataka Ohta writes:

> > The results might
> > vary widely depending on whether the data was transmitted as SJIS,
> > EUC or UTF-8, if there is no encoding information.
> 
> Because of duplicated shape of 'A' for Latin and Greek capital
> letter 'A' and alpha, and because of duplicated encoding of Big5,
> encoding information, in general, is no fix for unique conversion
> from shape on a paper to internal code.
> 
> Don't try to do something proven to be impossible.

Well, Ohta, there are a number of ways to do it, for example
considering Greek capital letter alpha, Latin capital letter A
and Cyrillic capital letter A all as equivalent for matching; similar
equivalence specs may be available for other characters.
Narrow and full-width letters may also be treated as equivalent.
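
To illustrate the kind of equivalence matching meant here, a minimal
sketch in Python. The table LOOKALIKES and the functions match_key and
equivalent are hypothetical names made up for this example, the table
covers only the three "A" shapes mentioned above, and width folding is
assumed to be done with Unicode NFKC normalization; a real equivalence
spec would be much larger.

```python
import unicodedata

# Hypothetical equivalence table (an assumption for illustration only):
# look-alike capitals are mapped to LATIN CAPITAL LETTER A.
LOOKALIKES = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
}


def match_key(text):
    """Return a key under which look-alike strings compare equal."""
    # NFKC folds fullwidth forms (e.g. U+FF21) to their narrow equivalents.
    narrow = unicodedata.normalize("NFKC", text)
    # Replace each look-alike with its chosen representative.
    return "".join(LOOKALIKES.get(ch, ch) for ch in narrow)


def equivalent(a, b):
    """True if the two strings match under the equivalence table."""
    return match_key(a) == match_key(b)
```

With this, equivalent("\u0391BC", "ABC") holds, and so does
equivalent("\uff21", "\u0410"), since both sides reduce to the same key.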

Anyway, it should be clear from the context which version
the "A" is: if it appears together with Greek characters it is
most likely an alpha, if with Latin characters it is most likely
a Latin letter, etc. It is up to the maker of the URL to ensure that
the intended audience will get the message, and some care in choosing
characters may be needed there.
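
The context rule could be sketched roughly as follows, again in Python.
The names script_of and guess_script are hypothetical, and taking the
first word of the Unicode character name as the "script" is a crude
assumption that happens to work for Latin, Greek and Cyrillic letters;
it is not a general script detector.

```python
import unicodedata


def script_of(ch):
    """Rough script guess: first word of the Unicode character name."""
    if not ch.isalpha():
        return None
    # e.g. "GREEK SMALL LETTER THETA" -> "GREEK"; unnamed characters -> None
    return unicodedata.name(ch, "").split(" ")[0] or None


def guess_script(word, index):
    """Guess the script of the ambiguous letter at `index` from its neighbours."""
    counts = {}
    for i, ch in enumerate(word):
        if i == index:
            continue  # skip the ambiguous letter itself
        script = script_of(ch)
        if script:
            counts[script] = counts.get(script, 0) + 1
    # The most frequent neighbouring script wins; None if there is no context.
    return max(counts, key=counts.get) if counts else None
```

For an "A" followed by Greek letters, guess_script reports "GREEK"; for
an "A" followed by Latin letters it reports "LATIN"; a lone "A" has no
context, so the result is None and the choice stays open.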

keld