- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 08 Feb 2006 11:55:20 +0900
- To: "Mark Davis" <mark.davis@icu-project.org>
- Cc: Paul.V.Biron@kp.org, duerst@it.aoyama.ac.jp, paul.downey@bt.com, public-i18n-core@w3.org, public-xsd-databinding@w3.org, public-xsd-databinding-request@w3.org
On Wed, 08 Feb 2006 11:47:53 +0900, Mark Davis <mark.davis@icu-project.org> wrote:

>> For all these languages you have transliteration schemes which describe
>> how to convert a string in the original script to a version which uses
>> only latin letters. I think nearly for one of these languages there is
>> a "standardized", totally accepted scheme. But it seems that for your
>> purpose it should be enough to choose just one scheme.
>
> This is not really the case; most non-Latin to Latin transliterations
> vary quite widely.
>
> Путин ↔ Putin, Poutine, ...
> Горбачёв ↔ Gorbachev, Gorbacev, Gorbatchev, Gorbačëv, Gorbachov,
> Gorbatsov, Gorbatschow, ...

Sorry, [I think nearly for one of these] should have been
[I think nearly for *n*one of these].

Felix

> Mark
>
> Felix Sasaki wrote:
>>
>> Hi Paul,
>>
>> Sorry for the late follow-up. Just a remark to your question below.
>>
>> On Fri, 03 Feb 2006 06:26:40 +0900, <Paul.V.Biron@kp.org> wrote:
>>
>>>> Conversions such as the one you mention from Kanji to Romaji
>>>> have the advantage that the result is still fairly legible,
>>>> but there are various disadvantages:
>>>> - large dictionary needed
>>>> - not deterministic (there is often more than one way to
>>>>   pronounce a Kanji or Kanji combination)
>>>> - language-specific, which means a different solution for
>>>>   each language is needed
>>>
>>> To provide context for this question from the databinding WG, our
>>> goal is to provide guidance to implementors of databinding toolkits:
>>> tools that take a schema and produce a set of programming language
>>> bindings, e.g., Java classes, that know how to manipulate instances
>>> conforming to the schema. Most binding tools do something like the
>>> following. Given this schema document fragment
>>>
>>> <xs:complexType name='MyType'>
>>>   <xs:sequence>
>>>     <xs:element name='child1' type='xs:string'/>
>>>     <xs:element name='child2' type='xs:string'
>>>                 maxOccurs='unbounded'/>
>>>   </xs:sequence>
>>> </xs:complexType>
>>>
>>> they will produce a class such as:
>>>
>>> class MyType
>>> {
>>>     String child1 ;
>>>     List<String> child2 ;
>>> }
>>>
>>> where the element and type names have become names in the
>>> programming language (Java in this case).
>>>
>>> The range of characters that are legal for XML names is much wider
>>> than that supported by many programming languages. The question is:
>>> what guidance should we give binding tool implementors about what
>>> they should do in the face of XML names that contain characters that
>>> aren't legal in that programming language?
>>>
>>> One option is: replace "bad" characters with punctuation, etc.
>>> Another option is: for languages that have something resembling a
>>> kanji to romaji mapping, automate the mapping (if
>>> possible/reasonable). If such automation is not possible/reasonable,
>>> perhaps the tool could provide a configuration option to allow the
>>> user to "manually" specify the mapping for the particular names used
>>> in the schema.
>>>
>>> We were wondering if i18n had any other options they could recommend
>>> or any advice in general about this problem.
>>>
>>> One question I had was whether languages other than CJK have
>>> something similar to kanji -> romaji? For instance, do Hebrew,
>>> Greek, Thai, etc. have this concept?
>>
>> For all these languages you have transliteration schemes which describe
>> how to convert a string in the original script to a version which uses
>> only latin letters. I think nearly for one of these languages there is
>> a "standardized", totally accepted scheme. But it seems that for your
>> purpose it should be enough to choose just one scheme.
>>
>> -- Felix
>>
>>>> - not reversible (there are many Kanji or Kanji combinations
>>>>   that lead to the same Romaji)
>>>
>>> That should not be a problem since the binding tool can store the
>>> original XML name as metadata for each name in the language binding
>>> for use in serializing instances.
>>>
>>> pvb
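
For illustration only, the first option from the quoted thread (replace characters that are not legal in the target language, and keep the original XML name as metadata for serialization) might look roughly like the Java sketch below. The class XmlNameMapper and its methods bind and originalName are hypothetical names invented for this example and do not come from any binding toolkit. The use of '_' as the replacement character and of a numeric suffix to separate colliding names are likewise assumptions. The sketch does not handle Java reserved words such as "class", and it does no transliteration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps XML names to legal Java identifiers. */
public class XmlNameMapper {

    // Generated Java identifier -> original XML name. This is the "metadata"
    // a binding tool could consult when serializing instances.
    private final Map<String, String> originalNames = new LinkedHashMap<>();

    /** Returns a Java identifier for xmlName and records the original name. */
    public String bind(String xmlName) {
        StringBuilder sb = new StringBuilder();
        // Assumption: every code point that is not legal inside a Java
        // identifier is replaced by '_'.
        xmlName.codePoints().forEach(cp ->
                sb.appendCodePoint(Character.isJavaIdentifierPart(cp) ? cp : '_'));
        // Make sure the first character may start a Java identifier.
        if (sb.length() == 0 || !Character.isJavaIdentifierStart(sb.codePointAt(0))) {
            sb.insert(0, '_');
        }
        // The replacement is not reversible, so two XML names can collide;
        // append a numeric suffix until the identifier is unused.
        String base = sb.toString();
        String candidate = base;
        int suffix = 2;
        while (originalNames.containsKey(candidate)) {
            candidate = base + "_" + suffix++;
        }
        originalNames.put(candidate, xmlName);
        return candidate;
    }

    /** Returns the XML name recorded for a generated identifier, or null. */
    public String originalName(String javaName) {
        return originalNames.get(javaName);
    }

    public static void main(String[] args) {
        XmlNameMapper mapper = new XmlNameMapper();
        // The last entry is a two-character Kanji name written with escapes.
        String[] xmlNames = {"child1", "child-2", "child.2", "\u540d\u524d"};
        for (String xmlName : xmlNames) {
            String javaName = mapper.bind(xmlName);
            System.out.println(xmlName + " -> " + javaName
                    + " (original: " + mapper.originalName(javaName) + ")");
        }
    }
}
```

With this sketch, "child-2" becomes child_2 and "child.2" becomes child_2_2, which shows why the original name has to be stored: the generated identifier alone cannot be mapped back. Java itself accepts letters from most scripts in identifiers, so the Kanji name passes through unchanged; a target language with a narrower identifier alphabet would need one of the transliteration schemes discussed above or a user-supplied mapping.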
Received on Wednesday, 8 February 2006 02:55:44 UTC