Re: http charset labelling

Keld Jørn Simonsen (keld@dkuug.dk)
Fri, 2 Feb 1996 03:06:48 +0100


Message-Id: <199602020206.DAA06816@dkuug.dk>
From: keld@dkuug.dk (Keld Jørn Simonsen)
Date: Fri, 2 Feb 1996 03:06:48 +0100
In-Reply-To: Larry Masinter <masinter@parc.xerox.com>
To: Larry Masinter <masinter@parc.xerox.com>
Subject: Re: http charset labelling
Cc: uri@bunyip.com

Larry Masinter writes:

> No, I think that if people write URLs on business cards, they should
> write them in a way that the recipient of the business card can type
> in. So if I visit Japan and someone gives me a business card with
> their URL on it, and I come home and type this into my system, I
> should be able to type it in based on the capabilities of *my* system,
> and not the system of the person who gave me the card.

I see your point, but either we allow people to write all the characters
in the world, or we just stick to what we have got: ASCII.
I think the main point of this discussion was to go beyond ASCII.

I call this the needs of the "equipment disabled", and I am in
that category myself: I cannot read, type or understand, say,
Chinese. Most people have similar shortcomings in understanding
some other script and language.

I think our main concern here is that people who naturally use
these "weird" characters should not be restricted in their abilities,
with a consequent impediment to their culture (which is going to
be more and more networked), just because some of the "equipment
disabled" say "I can't do it".

The normal solution for the "disabled" is to design some device that
lets them operate in this society too, although not as
efficiently and elegantly as the fully equipped.

I did specify a mechanism for this in the case of URLs,
namely using the ISO 10646 hex codes as character tags, in the form
&#xxxx;, which is already in the HTML specs.
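A minimal sketch of how such character tags could be produced and read back, assuming a hex form &#xHHHH; that carries the ISO 10646 code point of each non-ASCII character. The function names and the example URL are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical helpers: the tag carries the hex ISO 10646 code point,
# padded here to at least four hex digits.
_REF = re.compile(r"&#x([0-9A-Fa-f]+);")


def to_char_refs(text: str) -> str:
    """Replace each non-ASCII character with an &#xHHHH; character tag."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):04X};" for ch in text)


def from_char_refs(text: str) -> str:
    """Turn &#xHHHH; character tags back into the characters they name."""
    return _REF.sub(lambda m: chr(int(m.group(1), 16)), text)


url = "http://www.example.dk/Keld_Jørn"
tagged = to_char_refs(url)
print(tagged)  # http://www.example.dk/Keld_J&#x00F8;rn
assert from_char_refs(tagged) == url
```

The tag names a character, not a byte in some charset, so the result is the same whatever charset the typist's system uses. Note that "#" is also the fragment delimiter in URLs, so a real scheme would have to deal with that clash; the sketch ignores it.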

> Maybe they'll need or want to print two URLs on their business card,
> the Kanji version for those folks who can type in Kanji directly, and
> the ascii-encoded version for those people like me who have to deal
> with a Kanji-impoverished system like this dumb workstation I'm using
> now.

Yes, I think that will be the solution.

> > I am not sure what you are saying here: do you mean that the user
> > should know what charset the URL is encoded in at the server?
> 
> I'm saying that URLs must clearly show in all of their presentations
> the charset used to encode any non-ASCII data. Otherwise, the Big5
> user might type in Big5 URLs to a shift-Jis server.

If the URL is clearly labelled as big5 and charset labelling is
part of HTTP, then the shift-jis server would know what
to do with it, and everything would work.
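A rough sketch of what "knowing what to do with it" could mean on the server side, assuming the server stores its file names in shift-jis and that the client's charset label reaches it somehow (for example in a request header). The function and constant names are hypothetical.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumed: this server stores its file names in shift-jis, and the client's
# charset label arrives alongside the request (transport not specified here).
SERVER_CHARSET = "shift_jis"


def to_server_path(raw_path: str, label: str) -> bytes:
    """Decode a percent-escaped path using the client's label, then re-encode
    it in the server's own charset so it can be matched against file names."""
    text = unquote_to_bytes(raw_path).decode(label)
    return text.encode(SERVER_CHARSET)


# A big5 user requests "/日本" ("Japan"); the path travels as big5 bytes.
big5_path = "/" + quote("日本".encode("big5"))
print(to_server_path(big5_path, "big5") == "/日本".encode("shift_jis"))  # True
```

The server only needs the label to transcode; the characters themselves are what get matched, not the bytes the client happened to send.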

I understand the scenario you describe as follows: a Japanese businessman
gets a business card from another Japanese businessman. It is encoded
in shift-jis, but it shows all the characters in ordinary Japanese
print, for example

    http://www.sony.jp/shift-jis/some/japanese/words

He goes to his browser and types it in. The browser normally
uses iso-2022-jp encoding, so it encodes the URL in iso-2022-jp
and sends it. The server receives it but does not find the resource,
because what arrives is iso-2022-jp, not shift-jis.
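The failure in this scenario comes down to byte comparison. A small illustration, assuming a server that keys its resources on the raw shift-jis bytes of the name; the word used is illustrative only.

```python
# The same Japanese words encoded two ways; the path segment is illustrative.
words = "日本語"
sjis_bytes = words.encode("shift_jis")
jis_bytes = words.encode("iso2022_jp")   # what the browser in the example sends

# Assumed: a server that stores its names as shift-jis bytes.
stored = {sjis_bytes: "the page"}

print(sjis_bytes == jis_bytes)   # False: different byte sequences
print(jis_bytes in stored)       # False: the lookup fails
print(sjis_bytes in stored)      # True
```

The characters are identical; only the encoding differs, which is why some charset label has to travel with the request for the server to match them.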

> I'm suggesting you could use a similar mechanism to that in Section 2
> of RFC 1522. Not the *same* mechanism,

Yes, we have already discussed this; please see the earlier discussion.

I do not think that putting things like 
   To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk> 

on a business card would be aesthetically pleasing; it is worse than an
X.400 address!
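For reference, the RFC 1522 form packs the charset label, the encoding ("Q") and the encoded data into one token. Python's standard `email.header` module can decode the example above:

```python
from email.header import decode_header

# An RFC 1522-style encoded-word: charset, encoding ("Q") and data in one token.
word = "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?="

[(raw, charset)] = decode_header(word)
print(charset)              # iso-8859-1
print(raw.decode(charset))  # Keld Jørn Simonsen
```

A machine reads this easily; the point here is that the label and escapes are exactly the clutter nobody wants printed on a card.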

The bottom line is that if we want to be successful, we need to
keep it simple, elegant, and bullet-proof. That's the story
of the Internet. And URLs are very much part of the history-making of
*the* Net. We need to keep them simple and elegant, and if possible
hide technicalities like the charset from the user interface.
Charsets are not needed, and as Larry said, the charset is conceptually
not part of a URL.

So what is wrong with a header based specification in HTTP?
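A sketch of what a header-based specification could look like on the wire, assuming the printed URL stays unlabelled and the client names the charset of the path in a request header. The header name "URL-Charset" is hypothetical and not part of any HTTP specification; the host name is a placeholder.

```python
from urllib.parse import quote


def build_request(host: str, path: str, charset: str) -> str:
    """Compose a raw HTTP/1.0 GET request whose path is encoded in `charset`.

    "URL-Charset" is a hypothetical header carrying the charset label, so the
    URL itself never has to show it.
    """
    encoded = quote(path.encode(charset))  # '/' stays unescaped by default
    return (f"GET {encoded} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            f"URL-Charset: {charset}\r\n"
            "\r\n")


print(build_request("www.example.jp", "/日本語", "iso-2022-jp"))
```

The user types the characters as printed; the browser supplies whatever encoding it uses plus the label, and the server can transcode as needed.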

> That is, you could define a charset-enabled HTTP URL as:
> 
>      http://host.dom/charset/encoded-text
> 
> and designate that some web servers might be 'charset-enabled'.
> This doesn't change the HTTP protocol or the URL syntax.
> 
> People who wanted to use anything other than ISO-8859-1 in their file
> names on their web servers could write URLs as
> 
>    http://host.dom.jp/shift-jis/%1B$B!X%1B
> 
> or whatever (apologies for bad shift-jis) on their business cards.

Yes, that illustrates it perfectly: who would like to put such
a URL on his business card? Which Japanese firm would sanely
put such a URL in its newspaper ads for kitchen utensils?
Who would read it aloud in a TV programme?

This is all about having a more natural way of writing URLs in
non-English cultures, and what you are suggesting is crippling it!
URLs are on their way into national infrastructures at a level
comparable to telephone numbers, and they had better be well adapted
to the culture.

keld