- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 07 Feb 2006 19:12:28 +0900
- To: Paul.V.Biron@kp.org, duerst@it.aoyama.ac.jp
- Cc: paul.downey@bt.com, public-i18n-core@w3.org, public-xsd-databinding@w3.org, public-xsd-databinding-request@w3.org
Hi Paul,
Sorry for the late follow-up. Just a remark on your question below.
On Fri, 03 Feb 2006 06:26:40 +0900, <Paul.V.Biron@kp.org> wrote:
>
>> Conversions such as the one you mention from Kanji to Romaji
>> have the advantage that the result is still fairly legible,
>> but there are various disadvantages:
>> - large dictionary needed
>> - not deterministic (there is often more than one way to
>> pronounce a Kanji or Kanji combination)
>> - language-specific, which means a different solution for
>> each language is needed
>
> To provide context for this question from the databinding WG, our goal is
> to provide guidance to implementors of databinding toolkits: tools that
> take a schema and produce a set of programming language bindings, e.g.,
> Java classes, that know how to manipulate instances conforming to the
> schema. Most binding tools do something like the following. Given this
> schema document fragment
>
> <xs:complexType name='MyType'>
>   <xs:sequence>
>     <xs:element name='child1' type='xs:string'/>
>     <xs:element name='child2' type='xs:string'
>                 maxOccurs='unbounded'/>
>   </xs:sequence>
> </xs:complexType>
>
> they will produce a class such as:
>
> class MyType
> {
>     String child1;
>     List<String> child2;
> }
>
> where the element and type names have become names in the programming
> language (Java in this case).
>
> The range of characters that are legal for XML names is much wider than
> that supported by many programming languages. The question is: what
> guidance should we give binding tool implementors about what they should
> do in the face of XML names that contain characters that aren't legal in
> that programming language?
>
> One option is: replace "bad" characters with punctuation, etc.
> Another option is: for languages that have something resembling a kanji
> to romaji mapping, automate the mapping (if possible/reasonable). If
> such automation is not possible/reasonable, perhaps the tool could provide
> a configuration option to allow the user to "manually" specify the mapping
> for the particular names used in the schema.
>
> We were wondering if i18n had any other options they could recommend or
> any advice in general about this problem.
>
> One question I had was whether languages other than CJK have something
> similar to kanji -> romaji? For instance, do Hebrew, Greek, Thai, etc.
> have this concept?
For all these languages there are transliteration schemes that describe
how to convert a string in the original script to a version that uses
only Latin letters. I think that for almost none of these languages is
there a single "standardized", universally accepted scheme. But it seems
that for your purpose it should be enough to choose just one scheme.
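To make "choose just one scheme" a bit more concrete, here is a rough sketch
of how a binding tool might apply a single fixed transliteration table and
fall back to "_" for anything the table does not cover. The class name
NameMapper, the method toIdentifier and the tiny Greek table are all made up
for illustration; a real tool would plug in one complete, documented scheme
per script, and would restrict the output to whatever its target language
allows (plain ASCII letters, digits and "_" are assumed here).

```java
import java.util.Map;

public class NameMapper {
    // Hypothetical, deliberately tiny transliteration table (Greek to Latin).
    // Unicode escapes are used so the source file itself stays ASCII.
    private static final Map<Character, String> TABLE = Map.of(
        '\u03b1', "a",   // alpha
        '\u03b2', "b",   // beta
        '\u03b3', "g",   // gamma
        '\u03b4', "d",   // delta
        '\u03bc', "m",   // mu
        '\u03bd', "n",   // nu
        '\u03bf', "o");  // omicron

    // Assumption: the target language accepts only ASCII letters, digits and '_'.
    private static boolean isAsciiIdentifierChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    /** Maps an XML name to an ASCII-only identifier. */
    public static String toIdentifier(String xmlName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < xmlName.length(); i++) {
            char c = xmlName.charAt(i);
            if (isAsciiIdentifierChar(c)) {
                out.append(c);                 // already legal, keep as-is
            } else if (TABLE.containsKey(c)) {
                out.append(TABLE.get(c));      // transliterate via the chosen scheme
            } else {
                out.append('_');               // no mapping known, use punctuation
            }
        }
        // An identifier must not be empty or start with a digit.
        if (out.length() == 0 || Character.isDigit(out.charAt(0))) {
            out.insert(0, '_');
        }
        return out.toString();
    }
}
```

Note that this mapping is lossy: "my-name" and "my.name" both end up as
"my_name", so the tool still has to remember the original XML name somewhere.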
-- Felix
>
>> - not reversible (there are many Kanji or Kanji combinations
>> that lead to the same Romaji)
>
> That should not be a problem since the binding tool can store the original
> XML name as metadata for each name in the language binding for use in
> serializing instances.
>
> pvb
>
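On storing the original XML name as metadata, a minimal sketch of what that
could look like is below. Everything in it (the class NameRegistry and its
methods register and xmlNameFor) is hypothetical and not taken from any
existing toolkit. It keeps a table from generated identifier back to XML
name, and appends a numeric suffix when two different XML names would
otherwise share an identifier, which is exactly the case that arises when
the mapping is not reversible.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NameRegistry {
    // Generated identifier -> original XML name, kept for serialization.
    private final Map<String, String> xmlNames = new LinkedHashMap<>();

    /**
     * Records that xmlName is bound to the given identifier and returns the
     * identifier actually used. If another XML name already holds that
     * identifier, a suffix such as "_2" is appended until it is unique.
     */
    public String register(String xmlName, String identifier) {
        String candidate = identifier;
        int n = 2;
        while (xmlNames.containsKey(candidate)
                && !xmlNames.get(candidate).equals(xmlName)) {
            candidate = identifier + "_" + n;
            n++;
        }
        xmlNames.put(candidate, xmlName);
        return candidate;
    }

    /** Returns the original XML name for an identifier, or null if unknown. */
    public String xmlNameFor(String identifier) {
        return xmlNames.get(identifier);
    }
}
```

With this, registering "my-name" and then "my.name" under the same proposed
identifier "my_name" gives two distinct identifiers, and the serializer can
look up the right XML name for each.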
Received on Tuesday, 7 February 2006 10:12:52 UTC