- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 07 Feb 2006 19:12:28 +0900
- To: Paul.V.Biron@kp.org, duerst@it.aoyama.ac.jp
- Cc: paul.downey@bt.com, public-i18n-core@w3.org, public-xsd-databinding@w3.org, public-xsd-databinding-request@w3.org
Hi Paul,

Sorry for the late follow-up. Just a remark to your question below.

On Fri, 03 Feb 2006 06:26:40 +0900, <Paul.V.Biron@kp.org> wrote:

>
>> Conversions such as the one you mention from Kanji to Romaji
>> have the advantage that the result is still fairly legible,
>> but there are various disadvantages:
>> - large dictionary needed
>> - not deterministic (there is often more than one way to
>>   pronounce a Kanji or Kanji combination)
>> - language-specific, which means a different solution for
>>   each language is needed
>
> To provide context for this question from the databinding WG, our goal
> is to provide guidance to implementors of databinding toolkits: tools
> that take a schema and produce a set of programming language bindings,
> e.g., Java classes, that know how to manipulate instances conforming to
> the schema. Most binding tools do something like the following. Given
> this schema document fragment
>
> <xs:complexType name='MyType'>
>   <xs:sequence>
>     <xs:element name='child1' type='xs:string'/>
>     <xs:element name='child2' type='xs:string'
>                 maxOccurs='unbounded'/>
>   </xs:sequence>
> </xs:complexType>
>
> they will produce a class such as:
>
> class MyType
> {
>     String child1 ;
>     List<String> child2 ;
> }
>
> where the element and type names have become names in the programming
> language (Java in this case).
>
> The range of characters that are legal for XML names is much wider than
> that supported by many programming languages. The question is: what
> guidance should we give binding tool implementors about what they
> should do in the face of XML names that contain characters that aren't
> legal in that programming language?
>
> One option is: replace "bad" characters with punctuation, etc.
> Another option is: for languages that have something resembling a kanji
> to romaji mapping, automate the mapping (if possible/reasonable). If
> such automation is not possible/reasonable, perhaps the tool could
> provide a configuration option to allow the user to "manually" specify
> the mapping for the particular names used in the schema.
>
> We were wondering if i18n had any other options they could recommend or
> any advice in general about this problem.
>
> One question I had was whether languages other than CJK have something
> similar to kanji -> romaji? For instance, do Hebrew, Greek, Thai, etc.
> have this concept?

For all these languages you have transliteration schemes which describe
how to convert a string in the original script to a version which uses
only Latin letters. I think nearly for one of these languages there is a
"standardized", totally accepted scheme. But it seems that for your
purpose it should be enough to choose just one scheme.

-- Felix

>
>> - not reversible (there are many Kanji or Kanji combinations
>>   that lead to the same Romaji)
>
> That should not be a problem since the binding tool can store the
> original XML name as metadata for each name in the language binding for
> use in serializing instances.
>
> pvb
>
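
The options raised in this thread (replacing illegal characters, letting the user specify a mapping by hand, and keeping the original XML name as metadata) could be combined roughly as in the following sketch. It is only an illustration: the class `NameMapper` and its methods are hypothetical and do not come from any existing binding toolkit, transliteration is left out entirely, and Java keywords such as `class` are not handled. Note that Java itself accepts most Unicode letters in identifiers, so for Java the troublesome XML name characters are mainly `-`, `.` and `:`; other target languages are likely to be stricter.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of any real binding toolkit. */
final class NameMapper {
    /** Manual mappings supplied by the user: XML name -> identifier. */
    private final Map<String, String> overrides;
    /** Metadata kept for serialization: identifier -> original XML name. */
    private final Map<String, String> originalNames = new LinkedHashMap<>();

    NameMapper(Map<String, String> overrides) {
        this.overrides = new HashMap<>(overrides);
    }

    /** Returns an identifier for the XML name and records the original name. */
    String toIdentifier(String xmlName) {
        String base = overrides.containsKey(xmlName)
                ? overrides.get(xmlName)
                : sanitize(xmlName);
        String candidate = base;
        int suffix = 2;
        // The mapping is not reversible, so two XML names may collide;
        // append a counter until the identifier is unused.
        while (originalNames.containsKey(candidate)
                && !originalNames.get(candidate).equals(xmlName)) {
            candidate = base + "_" + suffix++;
        }
        originalNames.put(candidate, xmlName);
        return candidate;
    }

    /** Looks up the XML name that a generated identifier stands for. */
    String originalName(String identifier) {
        return originalNames.get(identifier);
    }

    /** Replaces every character that is illegal in a Java identifier with '_'. */
    static String sanitize(String xmlName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < xmlName.length(); ) {
            int cp = xmlName.codePointAt(i);
            if (out.length() == 0 && !Character.isJavaIdentifierStart(cp)) {
                // Not allowed as a first character: start with '_' instead,
                // keeping the character itself if it is legal later on.
                out.append('_');
                if (Character.isJavaIdentifierPart(cp)) {
                    out.appendCodePoint(cp);
                }
            } else if (Character.isJavaIdentifierPart(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append('_');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With this sketch, `child-1` and `child.1` would both sanitize to `child_1`, so the second one registered gets a numeric suffix, and `originalName` recovers the XML name when an instance is serialized. A transliteration step, if one scheme is chosen, would slot in where `sanitize` is called.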
Received on Tuesday, 7 February 2006 10:12:49 UTC