- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Tue, 13 Feb 1996 00:35:21 +0100
- To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>, gtn@ebt.com (Gavin Nicol)
- Cc: masinter@parc.xerox.com, uri@bunyip.com
Masataka Ohta writes:

> > The results might
> > vary widely depending on whether the data was transmitted as SJIS,
> > EUC or UTF-8, if there is no encoding information.
>
> Because of duplicated shape of 'A' for Latin and Greek capital
> letter 'A' and alpha, and because of duplicated encoding of Big5,
> encoding information, in general, is no fix for unique conversion
> from shape on a paper to internal code.
>
> Don't try to do something proven to be impossible.

Well, Ohta, there are a number of ways to do it. One is to treat
Greek capital letter alpha, Latin capital letter A and the Cyrillic
letter A as equivalent for matching; similar equivalence
specifications may be available for other characters. Narrow and
full-width letters may also be treated as equivalent.

Anyway, it should be clear from the context which "A" is meant:
together with Greek characters it is most likely an alpha, together
with Latin characters it is most likely a Latin letter, and so on.
It is up to the maker of the URL to ensure that the intended
audience will get the message, and some careful choice can be made
there.

keld
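The two ideas above (matching through an equivalence table, and guessing the intended letter from its neighbours) can be sketched roughly as follows. This is only an illustration under stated assumptions: the table LOOKALIKES and the functions fold, equivalent and likely_script are hypothetical names, the table covers only the three "A" shapes discussed here, and Unicode NFKC normalization stands in for the narrow/full-width equivalence.

```python
import unicodedata

# Hypothetical equivalence table (an assumption, not a standard list):
# look-alike capitals are folded to the Latin letter for matching only.
LOOKALIKES = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
}


def fold(text):
    """Return a matching key in which look-alike letters compare equal."""
    # NFKC maps full-width Latin letters (e.g. U+FF21) to their narrow forms.
    narrow = unicodedata.normalize("NFKC", text)
    return "".join(LOOKALIKES.get(ch, ch) for ch in narrow)


def equivalent(a, b):
    """True if the two strings match under the equivalence table."""
    return fold(a) == fold(b)


def likely_script(context):
    """Guess the script of an ambiguous letter from the characters around it."""
    counts = {}
    for ch in context:
        # The first word of a Unicode character name is usually its script.
        script = unicodedata.name(ch, "").split(" ")[0]
        if script in ("LATIN", "GREEK", "CYRILLIC"):
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None
```

For example, equivalent("\u0391BC", "ABC") treats a Greek alpha followed by Latin "BC" as matching "ABC", while likely_script applied to the neighbouring characters suggests which "A" the writer probably meant. The folded key is meant for comparison only; the original characters are left untouched.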
Received on Monday, 12 February 1996 18:36:17 UTC