RE: URIs and i18n

I think it is important to note that Jean-Gui plans (if I correctly
understood him in an off-line discussion we had) to allow people to enter
their name into his application in native script, and to provide an
additional field for them to input an ASCII-only version of the name that
will be used in the URIs.

Jean-Gui, perhaps it would help to give some examples of how the URIs
(specifically) will be used.  My understanding is that people speaking
various different languages will be using your application and will need to
read or type a URI on a regular basis.  This would make IRIs difficult to
use: a person who is not able to type/read Japanese will have
difficulty dealing with a URI such as  

One question in my mind is why people should be reading/writing the URI,
rather than using the application's power to automatically
obtain/present/package the needed information. 

However, even if it were possible for the application to hide the URIs and
present people's names to users, many people would still struggle with
information that only presented a person's name in their native script, and
would want to see the name in a script that they recognize.  Transcribing on
a language by language basis into every possible/sensible script would be a
very tall order, and one that I think is not needed - especially for
technology-related applications.  I think ASCII characters are a reasonable
choice for a single, universal script.

The question then becomes how to arrive at an ASCII version of a person's
name.  Again, I think it is a tough challenge to do this computationally,
when each language differs in the way it is transcribed to ASCII, and there
are usually multiple possible transcription methods for a given language.
A person may also have an idiosyncratic way of spelling their name, which
should be allowed for.  I also think that just stripping accents can lead
to clashes between previously distinct names, to unfortunate spellings, and
so on - not to mention the fact that it doesn't help at all for Arabic,
Chinese, Armenian, etc.
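
To make the accent-stripping problem concrete, here is a minimal Python
sketch.  The helper strip_accents is my own illustration (it is not taken
from Jean-Gui's application); it decomposes the text with the standard
unicodedata module and throws away whatever is not ASCII, which is what
"just stripping accents" usually amounts to.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Hypothetical helper: decompose to NFKD, then drop all non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Combining marks (and anything else outside ASCII) are silently lost.
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Two different surnames collapse into one spelling (a clash), and the
# result is not the "Mueller" that many bearers of the name would choose.
print(strip_accents("Müller"))
print(strip_accents("Muller"))

# Works tolerably for the French example from the original message...
print(strip_accents("François Berléand"))

# ...but nothing is left for scripts with no Latin base letters.
print(repr(strip_accents("石田")))
print(repr(strip_accents("محمد")))
```

The last two calls return an empty string, so a fallback of some kind
would be needed in any case.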

So I think that asking people, when they supply their name, to provide it
both in native format and in their preferred ASCII-only spelling is probably
the best way.  Then both forms of the name should be available whenever a
name is used.  This means that if people are expected to read/write URIs,
the IRI and the ASCII-only URI should be equivalent.
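
As a rough sketch of what storing both forms could look like (the
PersonName and NameIndex classes and their field names are assumptions of
mine for illustration, not part of any existing application), both
spellings are collected from the user and either one finds the same
record:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonName:
    native: str      # the name as the person writes it in their own script
    ascii_form: str  # the ASCII-only spelling the person chose (not computed)

    def __post_init__(self):
        if not self.ascii_form or not self.ascii_form.isascii():
            raise ValueError("ascii_form must be non-empty and ASCII-only")


class NameIndex:
    """Hypothetical lookup table: both spellings lead to the same record."""

    def __init__(self):
        self._by_key = {}

    def add(self, person: PersonName) -> None:
        keys = {person.native.casefold(), person.ascii_form.casefold()}
        # Refuse the whole entry if either spelling belongs to someone else.
        for key in keys:
            if key in self._by_key and self._by_key[key] != person:
                raise ValueError(f"name already taken: {key!r}")
        for key in keys:
            self._by_key[key] = person

    def lookup(self, name: str):
        return self._by_key.get(name.casefold())


index = NameIndex()
index.add(PersonName("François Berléand", "Francois Berleand"))

# The native spelling and the user-chosen ASCII spelling are equivalent keys.
assert index.lookup("François Berléand") is index.lookup("francois berleand")
```

Note that the ASCII form is supplied by the person rather than derived, so
an idiosyncratic spelling is stored exactly as they gave it.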

Note that this can also be extended to postal addresses, company names, etc.
People should be able to choose to look up, say, a Russian company address
in either local (Cyrillic) or international (Latin) formats, so you need to
collect and store both forms.

This also ties in with what I've been saying for years now about the general
use of IRIs.  I think you should register two domain names, one in native
script and one in ASCII only, if you want your URIs to be used
internationally as well as by your home market/user base.  I certainly
encourage the use of IRIs for local use, but there needs to be an
alternative for others if your URI is exposed to other cultural/linguistic


Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

> -----Original Message-----
> From: [mailto:www-international-
>] On Behalf Of Jean-Guilhem Rouel
> Sent: 28 July 2008 19:03
> To:
> Subject: URIs and i18n
> Hi,
> I am currently writing a webapp which has URIs of the form
> where fran-ois is a first
> name, berl-and a family name and '-' replaces non-ASCII characters. So
> fran-ois.berl-and could represent someone named François Berléand.
> Now, I would like to have more "beautiful" URIs, like
> I am wondering if there's a standard or something defining how to
> "translate" non-ASCII characters to ASCII ones, be they French special
> chars, Japanese ones or anything else.
> If not, is it wise to try to do such a transcription? I don't know if
> that makes a difference but the tool will be targeted at
> English-speaking people (but the names can be from any culture).
> Another possibility would be to use IRIs, but most people would end-up
> having difficulties typing them and that would make them harder to
> Finally, I could let users choose their ASCII-only URI. I think that's
> what I'm going to do as it's easier for me and the least likely to
> offend people, but I would have liked to get your feedback on this topic.
> Thanks,
> Jean-Gui

Received on Tuesday, 29 July 2008 11:46:17 UTC