Re: http charset labelling

Larry Masinter (masinter@parc.xerox.com)
Thu, 1 Feb 1996 11:57:58 PST


To: keld@dkuug.dk
Cc: uri@bunyip.com
In-Reply-To: Keld Jørn Simonsen's message of Thu, 1 Feb 1996 11:24:11 -0800 <199602011924.UAA26497@dkuug.dk>
Subject: Re: http charset labelling
From: Larry Masinter <masinter@parc.xerox.com>
Message-Id: <96Feb1.115759pst.2733@golden.parc.xerox.com>
Date: Thu, 1 Feb 1996 11:57:58 PST

No, I think that if people write URLs on business cards, they should
write them in a way that the recipient of the business card can type
in. So if I visit Japan and someone gives me a business card with
their URL on it, and I come home and type this into my system, I
should be able to type it in based on the capabilities of *my* system,
and not the system of the person who gave me the card.

My system only supports ISO-8859-1. Do you want to require me to have
a Japanese typing system before I can type in this person's URL? (I'd
like one, arigato gozaimasu, but it isn't the standard configuration
in my office.) I suppose you think that typing in their URL won't be
useful to me if my system doesn't display Kanji, but being able to
display it is easier than being able to type it.

Maybe they'll need or want to print two URLs on their business card,
the Kanji version for those folks who can type in Kanji directly, and
the ASCII-encoded version for those people like me who have to deal
with a Kanji-impoverished system like this dumb workstation I'm using
now.

> I am not sure what you are saying here: do you mean that the user
> should know what charset the URL is encoded in at the server?

I'm saying that URLs must clearly show in all of their presentations
the charset used to encode any non-ASCII data. Otherwise, the Big5
user might type in Big5 URLs to a Shift-JIS server.
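
To make that risk concrete, here is a minimal Python sketch. The sample
character and the use of Python's urllib.parse are assumptions for
illustration only, not part of the proposal. It shows that the same
character escapes to different octets under Big5 and Shift-JIS, so a
server guessing the wrong charset reads back something else.

```python
from urllib.parse import quote, unquote

# Illustrative sample: one ideograph that exists in both charsets.
name = "中"

# The same character, %-escaped from two different charsets.
as_big5 = quote(name, encoding="big5")
as_sjis = quote(name, encoding="shift_jis")

print("Big5 escapes:     ", as_big5)
print("Shift-JIS escapes:", as_sjis)

# A Shift-JIS server receiving the Big5 escapes has no way to tell
# which charset was meant, and decodes the octets as something else.
misread = unquote(as_big5, encoding="shift_jis", errors="replace")
print("Misread as:", misread)
```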

I'm suggesting you could use a similar mechanism to that in Section 2
of RFC 1522. Not the *same* mechanism,

(a) because URLs need to appear in the same contexts that RFC 1522
addresses, and the =? designation would confuse any interpreter of
RFC 1522;
(b) because ? has a reserved significance in URLs, and the requirement
to encode it would be onerous;
(c) because URLs already have a designated and different 'encoding',
so the RFC 1522 encoding is not necessary.


That is, you could define a charset-enabled HTTP URL as:

     http://host.dom/charset/encoded-text

and designate that some web servers might be 'charset-enabled'.
This doesn't change the HTTP protocol or the URL syntax.
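
As a rough sketch of what a 'charset-enabled' server might do with such
a URL, assuming the first path segment is the charset label and the
rest is %-escaped text in that charset: the function name
decode_charset_url is hypothetical, and Python's codec names stand in
for whatever charset registry a real server would use.

```python
import codecs
from urllib.parse import urlsplit, unquote_to_bytes

def decode_charset_url(url):
    """Split http://host/charset/encoded-text into (charset, text)."""
    # urlsplit leaves the %-escapes in the path untouched.
    path = urlsplit(url).path.lstrip("/")
    # First segment is taken as the charset label; the rest is the text.
    charset, _, encoded = path.partition("/")
    # Raises LookupError if the label is not a charset Python knows.
    codecs.lookup(charset)
    # Undo the URL escaping to get raw octets, then decode them.
    raw = unquote_to_bytes(encoded)
    return charset, raw.decode(charset)

print(decode_charset_url("http://host.dom/iso-8859-1/caf%E9"))
```

The URL itself stays plain ASCII; only the server's interpretation of
the path changes.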

People who wanted to use anything other than ISO-8859-1 in their file
names on their web servers could write URLs as

   http://host.dom.jp/shift-jis/%1B$B!X%1B

or whatever (apologies for bad Shift-JIS) on their business cards.
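
For the other direction, a minimal sketch of producing the ASCII-only
form to print on a card. The helper name charset_url, the sample file
name, and the "shift_jis" spelling of the label (Python's codec name)
are all assumptions made for illustration.

```python
from urllib.parse import quote, unquote

def charset_url(host, charset, filename):
    """Build http://host/charset/encoded-text using only ASCII."""
    # Encode the name in the stated charset, then %-escape every octet
    # that is not an unreserved ASCII character.
    encoded = quote(filename, safe="", encoding=charset)
    return f"http://{host}/{charset}/{encoded}"

# Hypothetical file name on a Japanese server.
card_url = charset_url("host.dom.jp", "shift_jis", "日本")
print(card_url)
```

Anyone with an ASCII keyboard can type the result, and the charset
label in the path tells the server how to read the escaped octets.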