Re: Globalizing URIs

Martin J Duerst (mduerst@ifi.unizh.ch)
Mon, 14 Aug 1995 18:04:40 +0200 (MET DST)


Message-Id: <9508141605.AA07557@mocha.bunyip.com>
To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Cc: mduerst@ifi.unizh.ch, uri@bunyip.com
In-Reply-To: <199508131433.XAA25928@necom830.cc.titech.ac.jp> from "Masataka Ohta" at Aug 13, 95 11:33:28 pm

Masataka Ohta wrote:

>As is proven with passports and airline tickets, 26 Latin characters
>are more than enough to represent names internationally.

Let us just think a little further along the same line:

As is proven with telephone numbers, ten digits are more than
enough to address anybody with a telephone around the world.
For personal names, the same is easily possible by designing
a world-wide system of social security numbers.

As is proven by data representation in computers, just two
different bit values are enough to represent any data whatsoever.

>So, please don't try to solve a non-existent problem.

I guess Japanese travelling around the world would be more than
happy to have their names in Kanji/Kana on their flight tickets
(of course besides the Latin form for the clerks that have to deal
with these tickets), to have announcement boards in foreign airports
that show announcements in Japanese, and even to have announcements
by voice in Japanese.

The average Japanese has seen his/her name in Latin letters once
in school (when Latin letters are taught), and occasionally for
a credit card or passport application. All the daily business is in
Kanji, or if those are not available, it is in Kana.
Judging from the contributors to some Japanese mailing
lists, quite a large percentage of Japanese users put RFC 1522-encoded
names in their mail headers, and I guess this percentage
would be even higher if there were a more natural implementation.
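For readers who have not seen one, here is a minimal sketch of what an RFC 1522 encoded name looks like in a header. It assumes Python's email.header module (which implements RFC 2047, the successor to RFC 1522) and uses the surname "Ohta" in Kanji as a hypothetical sample; neither comes from the original thread.

```python
from email.header import Header, decode_header, make_header

# Hypothetical sample: the surname "Ohta" written in Kanji.
name = "太田"

# Encode the name as an "encoded-word" using the ISO-2022-JP charset,
# the charset commonly used for Japanese mail.
encoded = Header(name, "iso-2022-jp").encode()
print(encoded)  # ASCII only, of the form =?iso-2022-jp?b?...?=

# A mail reader that understands encoded-words recovers the Kanji.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The header itself stays pure ASCII on the wire; only a capable mail reader shows the Kanji, which is why a "more natural implementation" matters.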


>Hi, Martin. "ASCII" does not mean "English".
>
>Some of you might be familiar with European environment so that you
>might be able to read, recognize, identify, memorize and type in a
>Swedish Angstrom character. But, Europe is not the entire world.

I do not require that everybody learn Swedish, or the additional
character of the Swedish alphabet. And I don't exactly know how
to type an Angstrom character on my keyboard (although I could
look it up). But I would like to have German names for German
documents, and Japanese names for Japanese documents, and so
on, and I know that there are quite a lot of Swedish people who would
like to use Swedish names for their Swedish documents.
And I also know that on Macintoshes and some other computers,
this is already easily possible, and heavily used.


>To us Japanese, my Japanese name represented with ASCII, that is,
>"Masataka Ohta", which is one of a formal notation of Japanese
>taught at Japanese elementaly schools, is just fine and better
>than "%HH%HH". The notation "%HH%HH" is not so harmful but merely
>the second best.

Of course you will prefer "Masataka Ohta" to "%HH%HH". But your
personal preferences aside, the average Japanese will generally
prefer Japanese names (and likewise names for documents
that are in Japanese, and so on) to be in the everyday
Kanji/Kana mixture. Next best might be Kana only,
and then maybe Latin letters, so that "%HH%HH" may turn
out to be fourth (if it is considered at all).
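The "%HH%HH" form is URL percent-escaping: each byte of a non-ASCII name is written as "%" followed by two hex digits. A minimal sketch using Python's urllib.parse follows. The URL syntax under discussion does not say which character encoding produces those bytes, so UTF-8 and EUC-JP here are both assumptions, picked to show that the same name yields different escapes depending on that choice.

```python
from urllib.parse import quote, unquote

name = "太田"  # hypothetical document name: "Ohta" in Kanji

# Percent-escape the name's bytes under two different (assumed) charsets.
utf8_form = quote(name, encoding="utf-8")
eucjp_form = quote(name, encoding="euc_jp")

print(utf8_form)   # %E5%A4%AA%E7%94%B0
print(eucjp_form)  # a different %HH sequence for the same name

# Decoding only gives back the name if the reader uses the same charset.
print(unquote(utf8_form, encoding="utf-8"))
print(unquote(eucjp_form, encoding="euc_jp"))
```

Either way, the escaped form is unreadable to a human, which is the point being made about its rank among the alternatives.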

And it is true that the representation of Japanese with Latin
letters is taught in Japanese schools, but not much time is
spent on this subject, and there is a good chance that
the average Japanese, when asked to spell your last name
with Latin letters, will spell it Outa or Ota or O-ta (the "-"
should go as a bar above the O), but not necessarily Ohta,
and show similar problems for other names.
Of course, Japanese also sometimes have problems when
writing proper names in Japanese, but they know how
to deal with this, using name cards (many of which don't
have Latin letters on the back side) or by introducing
themselves as "Ohta, you know, 'Oh' as in 'great, thick',
and 'ta' as in 'field'". But most Japanese won't say
"and well, in Latin letters, it's 'Oh', with 'o'-'h'"
(you may be an exception).


>And, to non-Japanese, my Japanese name represented with non-ASCII, for
>example with ISO-2022-JP encoding:
>
>	^[$@B@ED!!>;9'^[(B
>
>might be only a little worse than "%HH%HH".

Well, if that name really were in ISO-2022-JP, and not in a form
that might show up on a terminal emulator that doesn't deal with
Japanese, I would actually see it directly as what it is supposed
to represent. So for me and the others that can read Japanese
and do have some appropriate software (which includes all those
in Japan with computer equipment), it would clearly be more readable
than %HH%HH.
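The quoted string is ISO-2022-JP as printed by a terminal that shows the ESC control character as "^[". A minimal sketch in Python restores the escape bytes and decodes them with the standard iso2022_jp codec; the assumption, stated in the code, is that every "^[" is only the terminal's rendering of ESC (0x1B).

```python
# The quoted string, as it appeared on a terminal that shows ESC as "^[".
shown = "^[$@B@ED!!>;9'^[(B"

# Assumption: each "^[" stands for the single ESC byte (0x1B).
raw = shown.replace("^[", "\x1b").encode("ascii")

# ESC $ @ switches to the JIS X 0208 character set, two bytes per
# character; ESC ( B switches back to ASCII.
text = raw.decode("iso2022_jp")
print(text)
```

With software that handles the encoding, the escape sequences disappear and only the name in Kanji remains, which is the situation described above.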


>The worst case is when you are looking at a URL containing Japanese
>characters printed on a paper.
>
>Can your brain recognize Japanese characters?

Leaving the problems of 'brain' and 'mind' to people in AI, I can
definitely say that I can recognize and read Japanese, if it is written
on paper or properly encoded in electronic mail.


>In the international environment, most of you can't read, recognize,
>identify, memorize nor type in Japanese characters.
>
>That is, with the international context, plain ASCII (or ISO 646
>IRV) is the way to go.
>
>In short, mail addresses and URLs should be pure ASCII.

I agree for mail addresses. Mail addresses, at least potentially,
can be used from all over the world, by anybody, regardless
of their language abilities.
But for URLs in general, this is different. A Japanese author,
writing documents for a Japanese user, should not be forced
to make up document names with Latin characters. But with
the present URL scheme, (s)he is more or less forced to do so.

Regards,	Martin.