- From: Richard Ishida <ishida@w3.org>
- Date: Tue, 29 Jul 2008 19:34:55 +0100
- To: <www-international@w3.org>
- Cc: "'Jean-Guilhem Rouel'" <jean-gui@w3.org>
Grr. The question marks below should have read:
http://example.org/users/石田リチャード

RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org] On Behalf Of Richard Ishida
> Sent: 29 July 2008 12:46
> To: www-international@w3.org
> Cc: 'Jean-Guilhem Rouel'
> Subject: RE: URIs and i18n
>
> I think it is important to note that Jean-Gui plans (if I correctly
> understood his plans from an off-line discussion I had with him) to allow
> people to enter their name into his application in native script, but will
> provide an additional field for them to input an ascii-only version of the
> name that will be used in the URIs.
>
> Jean-Gui, perhaps it would help to give some examples of how the URIs
> (specifically) will be used. My understanding is that people speaking
> various different languages will be using your application and will need to
> read or type a URI on a regular basis. This would cause difficulties for
> the use of IRIs - a person who is not able to type/read Japanese will have
> difficulty dealing with a URI such as http://example.org/users/???????
>
> One question in my mind is why people should be reading/writing the URI,
> rather than using the application's power to automatically
> obtain/present/package the needed information.
>
> However, even if it were possible for the application to hide the URIs and
> present people's names to users, many people would still struggle with
> information that only presented a person's name in their native script, and
> would want to see the name in a script that they recognize. Transcribing on
> a language by language basis into every possible/sensible script would be a
> very tall order, and one that I think is not needed - especially for
> technology related applications. I think ascii characters are a reasonable
> choice for a single, universal script.
>
> The question then becomes how to achieve an ascii version of a person's
> name. Again, I think it is a tough challenge to achieve this
> computationally, when each language differs in the way you transcribe to
> ascii, and there are usually multiple possible transcription methods for a
> given language. A person may also have an idiosyncratic way of spelling
> their name, which should be allowed for. I also think that just stripping
> accents can lead to clashes of previously differentiated names, to
> unfortunate spellings, and so on - not to mention the fact that it doesn't
> help at all for Arabic, Chinese, Armenian, etc, etc.
>
> So I think that asking people, when they supply their name, to provide it
> both in native format and in their preferred ascii-only spelling is probably
> the best way. Then both forms of the name should be available whenever a
> name is used. This means that if people are expected to read/write URIs,
> the IRI and the ascii-only URIs should be equivalent.
>
> Note that this can also be extended to postal addresses, company names, etc.
> People should be able to choose to look up, say, a Russian company address
> in either local (Cyrillic) or international (Latin) formats, so you need to
> collect and store both forms.
>
> This also ties in with what I've been saying for years now about the general
> use of IRIs. I think you should register two domain names, one in native
> script and one in ascii only, if you want your URIs to be used
> internationally as well as by your home market/user base. I certainly
> encourage the use of IRIs for local use, but there needs to be an
> alternative for others if your URI is exposed to other cultural/linguistic
> groups.
>
> RI
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/International/
> http://rishida.net/
>
> > -----Original Message-----
> > From: www-international-request@w3.org
> > [mailto:www-international-request@w3.org] On Behalf Of Jean-Guilhem Rouel
> > Sent: 28 July 2008 19:03
> > To: www-international@w3.org
> > Subject: URIs and i18n
> >
> > Hi,
> >
> > I am currently writing a webapp which has URIs of the form
> > http://example.org/users/fran-ois.berl-and where fran-ois is a first
> > name, berl-and a family name and '-' replaces non-ASCII characters. So
> > fran-ois.berl-and could represent someone named François Berléand.
> >
> > Now, I would like to have more "beautiful" URIs, like
> > http://example.org/users/francois.berleand.
> >
> > I am wondering if there's a standard or something defining how to
> > "translate" non-ASCII characters to ASCII ones, be they French special
> > chars, Japanese ones or anything else.
> >
> > If not, is it wise to try to do such a transcription? I don't know if
> > that makes a difference but the tool will be targeted at
> > English-speaking people (but the names can be from any culture).
> >
> > Another possibility would be to use IRIs, but most people would end-up
> > having difficulties typing them and that would make them harder to
> > remember.
> >
> > Finally, I could let users choose their ASCII-only URI. I think that's
> > what I'm going to do as it's easier for me and the least likely to
> > offend people, but I would have liked to get your feedback on this topic.
> >
> > Thanks,
> > Jean-Gui
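
To make the points in the thread above concrete, here is a rough Python sketch. It is only an illustration, not code from Jean-Gui's application: the function names (strip_accents, iri_segment_to_uri, register_user) and the routes dictionary are hypothetical. It shows (a) that naive accent stripping makes distinct names collide and does nothing useful for non-Latin scripts, and (b) one possible way to make the native-script form and a user-chosen ascii-only form resolve to the same user, assuming the native form is percent-encoded as UTF-8 when it appears in a URI.

```python
import unicodedata
from urllib.parse import quote, unquote


def strip_accents(name: str) -> str:
    """Naive approach: decompose (NFD), drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def iri_segment_to_uri(segment: str) -> str:
    """Percent-encode the UTF-8 bytes of one path segment so it is ascii-only."""
    return quote(unicodedata.normalize("NFC", segment), safe="")


def register_user(routes: dict, native_name: str, ascii_name: str, user_id: int) -> None:
    """Hypothetical registry: both spellings point at the same user record.

    ascii_name is whatever spelling the user chose; it is not derived
    from native_name.
    """
    if not ascii_name.isascii():
        raise ValueError("the second spelling must be ascii-only")
    routes[ascii_name] = user_id
    routes[iri_segment_to_uri(native_name)] = user_id


# Accent stripping works for this French name...
french = strip_accents("François Berléand")

# ...but two previously distinct names now clash,
clash = strip_accents("Müller") == strip_accents("Muller")

# and names in other scripts are still not ascii afterwards.
still_native = strip_accents("Иванов")

# Storing both forms lets either path reach the same user.
routes = {}
register_user(routes, "石田リチャード", "richard.ishida", 1)
encoded = iri_segment_to_uri("石田リチャード")
```

With this arrangement the ascii-only spelling is simply stored as the user typed it, so an idiosyncratic spelling is preserved, and the percent-encoded segment decodes back to the native-script name with unquote(encoded).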
Received on Tuesday, 29 July 2008 18:35:34 UTC