Re: http charset labelling

Gavin Nicol (gtn@ebt.com)
Tue, 6 Feb 1996 10:04:55 -0500


Date: Tue, 6 Feb 1996 10:04:55 -0500
From: Gavin Nicol <gtn@ebt.com>
Message-Id: <199602061504.KAA13675@ebt-inc.ebt.com>
To: mohta@necom830.cc.titech.ac.jp
Cc: masinter@parc.xerox.com, keld@dkuug.dk, uri@bunyip.com
In-Reply-To: <199602060213.LAA15162@necom830.cc.titech.ac.jp> (message from Masataka Ohta on Tue, 6 Feb 96 11:12:57 JST)
Subject: Re: http charset labelling

>> Or fix the problem by allowing specification of the encoding used for
>> the URL's.
> 
>That's no fix.
> 
>If you allow specification of the encoding, what we can see on paper
>is resulting lengthy specification of the encoding concatenated with
>lengthy 7bit encoding of the URL body.

Don't be silly. On paper, people will be looking at glyphs, and
thereby associate them with characters (one way of decoding
information). On the Internet, computers will be looking at a set of
octets, and mapping them to characters by using some information about
the encoding used for the characters. The end result is the same (a
mapping to characters), but the process is entirely different, and
rightly so.

The point is simply this: if I give a business card to someone, and it
has a URL pointing to something with kanji in it, then if that person
goes to a SJIS system and types in the URL, the server needs to know
how to map that set of octets to a resource. The results might
vary widely depending on whether the data was transmitted as SJIS,
EUC or UTF-8, if there is no encoding information.
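
To make the ambiguity concrete, here is a minimal sketch in Python. The
resource name "日本" and the helper names escaped_forms and server_view
are made up for illustration, and the %XX escaping is just the usual URL
octet escaping, not a proposal. It shows one name producing three
different octet sequences, and what a server sees if it guesses the
wrong encoding.

```python
from urllib.parse import quote, unquote_to_bytes

# The three encodings named in the message, by their Python codec names.
ENCODINGS = ("shift_jis", "euc_jp", "utf-8")


def escaped_forms(name):
    """Return the %XX-escaped octets of `name` under each encoding."""
    return {enc: quote(name.encode(enc), safe="") for enc in ENCODINGS}


def server_view(escaped, assumed):
    """Decode escaped octets the way a server assuming `assumed` would.

    Undecodable octets become U+FFFD rather than raising, so a wrong
    guess shows up as garbage instead of an exception.
    """
    raw = unquote_to_bytes(escaped)
    return raw.decode(assumed, errors="replace")


# Hypothetical resource name containing kanji.
name = "日本"
forms = escaped_forms(name)

for enc, escaped in forms.items():
    print(f"{enc:10} {escaped}")

# A client on a SJIS system sends SJIS octets; the server guesses each
# encoding in turn. Only the matching guess recovers the original name.
for guess in ENCODINGS:
    seen = server_view(forms["shift_jis"], guess)
    print(f"sent as shift_jis, read as {guess:10} match={seen == name}")
```

Without some label saying which of the three was used, the server has
nothing but the octets to go on, and the three escaped forms share no
common prefix it could key on.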

I agree that such URL's are not very useful in an international
setting, but that does not mean they should be disallowed
entirely. That is like saying that Japanese should only use romaji.
 
>> Yes, there is an Internet directory put out by Gakken (I forget the
>> name) that had such an article last month.
> 
>Then, Gakken should be wrong. Or, you may be confusing URL and
>text content.

I most certainly did not confuse a URL with content (very difficult to
do, especially as I can read Japanese).

I guess you, I, and a lot of other people, think that if people really
want to be global, they should avoid using kanji, or whatever, in
URL's. However, as a person at Astec said, and I agree, people *will*
put kanji into resource names, and they *will* expect it to work. As
such, I think it better to design a system that can handle *all*
cases, as users expect them to be handled.