- From: <Paul.V.Biron@kp.org>
- Date: Thu, 2 Feb 2006 13:26:40 -0800
- To: duerst@it.aoyama.ac.jp
- Cc: paul.downey@bt.com, public-i18n-core@w3.org, public-xsd-databinding@w3.org, public-xsd-databinding-request@w3.org
> Conversions such as the one you mention from Kanji to Romaji
> have the advantage that the result is still fairly legible,
> but there are various disadvantages:
> - large dictionary needed
> - not deterministic (there is often more than one way to
>   pronounce a Kanji or Kanji combination)
> - language-specific, which means a different solution for
>   each language is needed

To provide context for this question from the databinding WG: our goal is to provide guidance to implementors of databinding toolkits, that is, tools that take a schema and produce a set of programming language bindings (e.g., Java classes) that know how to manipulate instances conforming to the schema.

Most binding tools do something like the following. Given this schema document fragment

    <xs:complexType name='MyType'>
      <xs:sequence>
        <xs:element name='child1' type='xs:string'/>
        <xs:element name='child2' type='xs:string' maxOccurs='unbounded'/>
      </xs:sequence>
    </xs:complexType>

they will produce a class such as:

    class MyType {
        String child1;
        List<String> child2;
    }

where the element and type names have become names in the programming language (Java in this case).

The range of characters that are legal in XML names is much wider than that supported by many programming languages. The question is: what guidance should we give binding tool implementors about what to do with XML names that contain characters that aren't legal in the target programming language?

One option is to replace the "bad" characters with punctuation, etc.

Another option is, for languages that have something resembling a kanji to romaji mapping, to automate the mapping (if possible/reasonable). If such automation is not possible/reasonable, perhaps the tool could provide a configuration option that lets the user "manually" specify the mapping for the particular names used in the schema.

We were wondering if i18n had any other options they could recommend, or any advice in general about this problem.
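To make the first option concrete, here is a minimal sketch in Java of what "replace bad characters" might look like. The class name NameMangler, the method names, and the _uXXXX escape format are all assumptions made up for illustration; they are not taken from any existing binding tool. Note that Java itself accepts most Unicode letters in identifiers, so for Java the characters that need replacing are mostly ones like '-' and '.'; a target language with an ASCII-only identifier syntax would need a stricter legality test than the one used here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of any real binding toolkit. */
final class NameMangler {

    /**
     * Replaces every code point that is not legal at its position in a Java
     * identifier with an escape of the form _uXXXX (hex code point).
     * Assumption: this escape format is invented for this sketch.
     */
    static String mangle(String xmlName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < xmlName.length(); ) {
            int cp = xmlName.codePointAt(i);
            boolean legal = (out.length() == 0)
                    ? Character.isJavaIdentifierStart(cp)
                    : Character.isJavaIdentifierPart(cp);
            if (legal) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format("_u%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /**
     * Maps each generated name back to the XML name it came from, so the
     * original name stays available when instances are written out.
     */
    static Map<String, String> bindAll(List<String> xmlNames) {
        Map<String, String> generatedToXml = new LinkedHashMap<>();
        for (String xmlName : xmlNames) {
            generatedToXml.put(mangle(xmlName), xmlName);
        }
        return generatedToXml;
    }

    public static void main(String[] args) {
        // Prints each generated name next to the XML name it came from.
        bindAll(List.of("child1", "my-element", "item.count"))
                .forEach((generated, xml) -> System.out.println(generated + " <- " + xml));
    }
}
```

This sketch does not deal with Java reserved words (an element named "class", say), nor with two distinct XML names that happen to mangle to the same identifier; a real tool would have to handle both.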
One question I had was whether languages other than CJK have something similar to kanji -> romaji. For instance, do Hebrew, Greek, Thai, etc. have this concept?

> - not reversible (there are many Kanji or Kanji combinations
>   that lead to the same Romaji)

That should not be a problem, since the binding tool can store the original XML name as metadata for each name in the language binding and use it when serializing instances.

pvb
Received on Thursday, 2 February 2006 21:27:04 UTC