RE: URIs and i18n

Grr.  The question marks below should have read: 石田リチャード


Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

> -----Original Message-----
> From: [mailto:www-international-
>] On Behalf Of Richard Ishida
> Sent: 29 July 2008 12:46
> To:
> Cc: 'Jean-Guilhem Rouel'
> Subject: RE: URIs and i18n
>
> I think it is important to note that Jean-Gui plans (if I correctly
> understood his plans from an off-line discussion I had with him) to allow
> people to enter their name into his application in native script, but will
> provide an additional field for them to input an ascii-only version of the
> name that will be used in the URIs.
>
> Jean-Gui, perhaps it would help to give some examples of how the URIs
> (specifically) will be used.  My understanding is that people speaking
> various languages will be using your application and will need to
> read or type a URI on a regular basis.  This would cause difficulties for
> the use of IRIs - a person who is not able to type/read Japanese will have
> difficulty dealing with a URI such as
>
> One question in my mind is why people should be reading/writing the URI,
> rather than using the application's power to automatically
> obtain/present/package the needed information.
>
> However, even if it were possible for the application to hide the URIs and
> present people's names to users, many people would still struggle with
> information that only presented a person's name in their native script, and
> would want to see the name in a script that they recognize.  Transcribing on
> a language-by-language basis into every possible/sensible script would be a
> very tall order, and one that I think is not needed - especially for
> technology-related applications. I think ascii characters are a reasonable
> choice for a single, universal script.
>
> The question then becomes how to achieve an ascii version of a person's
> name.  Again, I think it is a tough challenge to achieve this
> computationally, when each language differs in the way you transcribe to
> ascii, and there are usually multiple possible transcription methods for a
> given language.  A person may also have an idiosyncratic way of spelling
> their name, which should be allowed for.  I also think that just stripping
> accents can lead to clashes of previously differentiated names, to
> unfortunate spellings, and so on - not to mention the fact that it doesn't
> help at all for Arabic, Chinese, Armenian, etc, etc.
>
> So I think that asking people, when they supply their name, to provide it
> both in native format and in their preferred ascii-only spelling is probably
> the best way. Then both forms of the name should be available whenever a
> name is used.  This means that if people are expected to read/write URIs,
> the IRI and the ascii-only URIs should be equivalent.
>
> Note that this can also be extended to postal addresses, company names, etc.
> People should be able to choose to look up, say, a Russian company address
> in either local (Cyrillic) or international (Latin) formats, so you need to
> collect and store both forms.
>
> This also ties in with what I've been saying for years now about the general
> use of IRIs.  I think you should register two domain names, one in native
> script and one in ascii only, if you want your URIs to be used
> internationally as well as by your home market/user base.  I certainly
> encourage the use of IRIs for local use, but there needs to be an
> alternative for others if your URI is exposed to other cultural/linguistic
> groups.
>
> RI
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
> > -----Original Message-----
> > From: [mailto:www-international-
> >] On Behalf Of Jean-Guilhem Rouel
> > Sent: 28 July 2008 19:03
> > To:
> > Subject: URIs and i18n
> >
> >
> > Hi,
> >
> > I am currently writing a webapp which has URIs of the form
> > where fran-ois is a first
> > name, berl-and a family name and '-' replaces non-ASCII characters. So
> > fran-ois.berl-and could represent someone named François Berléand.
> >
> > Now, I would like to have more "beautiful" URIs, like
> >
> >
> > I am wondering if there's a standard or something defining how to
> > "translate" non-ASCII characters to ASCII ones, be they French special
> > chars, Japanese ones or anything else.
> >
> > If not, is it wise to try to do such a transcription? I don't know if
> > that makes a difference but the tool will be targeted at
> > English-speaking people (but the names can be from any culture).
> >
> > Another possibility would be to use IRIs, but most people would end up
> > having difficulties typing them and that would make them harder to
> > remember.
> >
> > Finally, I could let users choose their ASCII-only URI. I think that's
> > what I'm going to do as it's easier for me and the least likely to
> > offend people, but I would have liked to get your feedback on this topic.
> >
> > Thanks,
> > Jean-Gui
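
The concern raised in the reply above, that simply stripping accents can make previously distinct names clash and does nothing for non-Latin scripts, can be sketched in Python. This is only a minimal illustration under stated assumptions, not anything taken from the thread: the helper names strip_accents and ascii_slug are hypothetical, and the Müller/Muller pair is an invented example. It uses Unicode NFD decomposition from the standard unicodedata module, which is one common way to strip accents and not necessarily what either author had in mind.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Decompose to NFD and drop combining marks; other characters pass through."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ascii_slug(name: str) -> str:
    """Hypothetical helper: strip accents, then drop whatever is still non-ASCII."""
    stripped = strip_accents(name)
    return stripped.encode("ascii", "ignore").decode("ascii").lower()


if __name__ == "__main__":
    # Names from the original question survive in a readable form.
    print(ascii_slug("François"), ascii_slug("Berléand"))   # francois berleand
    # Two distinct names collapse to the same ASCII form (invented example).
    print(ascii_slug("Müller") == ascii_slug("Muller"))     # True
    # A name in a non-Latin script yields nothing usable at all.
    print(repr(ascii_slug("石田リチャード")))                 # ''
```

The clash and the empty result are the reason the thread leans toward asking each person for their own preferred ASCII spelling and storing it alongside the native-script form.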

Received on Tuesday, 29 July 2008 18:35:34 UTC