Re: mapping of XML names into programming language from Felix Sasaki on 2006-02-07 (public-i18n-core@w3.org from January to March 2006)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 07 Feb 2006 19:12:28 +0900
To: Paul.V.Biron@kp.org, duerst@it.aoyama.ac.jp
Cc: paul.downey@bt.com, public-i18n-core@w3.org, public-xsd-databinding@w3.org, public-xsd-databinding-request@w3.org
Message-ID: <op.s4lla2qqx1753t@ibm-60d333fc0ec.mag.keio.ac.jp>

Hi Paul,

Sorry for the late follow-up. Just a remark on your question below.

On Fri, 03 Feb 2006 06:26:40 +0900, <Paul.V.Biron@kp.org> wrote:

>
>> Conversions such as the one you mention from Kanji to Romaji
>> have the advantage that the result is still fairly legible,
>> but there are various disadvantages:
>> - large dictionary needed
>> - not deterministic (there is often more than one way to
>>    pronounce a Kanji or Kanji combination)
>> - language-specific, which means a different solution for
>>    each language is needed
>
> To provide context for this question from the databinding WG, our goal is
> to provide guidance to  implementors of databinding toolkits: tools that
> take a schema and produce a set of programming language bindings, e.g.,
> Java classes, that know how to manipulate instances conforming to the
> schema.  Most binding tools do something like the following.  Given this
> schema document fragment
>
> <xs:complexType name='MyType'>
>         <xs:sequence>
>                 <xs:element name='child1' type='xs:string'/>
>                 <xs:element name='child2' type='xs:string'
> maxOccurs='unbounded'/>
>         </xs:sequence>
> </xs:complexType>
>
> they will produce a class such as:
>
> class MyType
> {
>         String child1 ;
>         List<String> child2 ;
> }
>
> where the element and type names have become names in the programming
> language (Java in this case).
>
> The range of characters that are legal for XML names is much wider than
> that supported by many programming languages.  The question is: what
> guidance should we give binding tool implementors about what they should
> do in the face of XML names that contain characters that aren't legal in
> that programming language?
>
> One option is: replace "bad" characters with punctuation, etc.
> Another option is: for languages that have something resembling a kanji
> to romaji mapping, automate the mapping (if possible/reasonable).  If
> such automation is not possible/reasonable, perhaps the tool could provide
> a configuration option to allow the user to "manually" specify the mapping
> for the particular names used in the schema.
>
> We were wondering if i18n had any other options they could recommend or
> any advice in general about this problem.
>
> One question I had was whether languages other than CJK have something
> similar to kanji -> romaji?  For instance, do Hebrew, Greek, Thai, etc.
> have this concept?

For all of these languages there are transliteration schemes that describe
how to convert a string in the original script into a version that uses
only Latin letters. I think that for almost none of these languages is there
a single "standardized", universally accepted scheme. But for your purpose
it should be enough to choose just one scheme.
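
As a rough illustration of "choose just one scheme", here is a minimal Java
sketch. Everything in it is an assumption made for the example: the class
name NameMapper, the four-letter Greek table (a made-up fragment, not a
standardized scheme), and the _uXXXX fallback for characters the table does
not cover. It also keeps the original XML name as metadata, as suggested
elsewhere in this thread, so the mapping does not need to be reversible.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not taken from any binding toolkit.
class NameMapper {
    // ONE fixed table chosen up front. This fragment is an assumption made
    // for the example, not a standardized transliteration scheme.
    private static final Map<Character, String> SCHEME = Map.of(
        '\u03B1', "a",   // Greek alpha
        '\u03B2', "b",   // Greek beta
        '\u03B3', "g",   // Greek gamma
        '\u03B4', "d");  // Greek delta

    // Generated identifier -> original XML name, kept for serialization.
    private final Map<String, String> originalNames = new LinkedHashMap<>();

    // Maps an XML name to an ASCII-only identifier and records the original.
    String bind(String xmlName) {
        StringBuilder id = new StringBuilder();
        xmlName.codePoints().forEach(cp -> {
            boolean asciiOk = cp == '_' || (cp < 128 && Character.isLetterOrDigit(cp));
            if (asciiOk) {
                id.appendCodePoint(cp);                  // already legal, keep it
            } else if (cp <= 0xFFFF && SCHEME.containsKey((char) cp)) {
                id.append(SCHEME.get((char) cp));        // covered by the chosen scheme
            } else {
                id.append(String.format("_u%04X", cp));  // deterministic fallback
            }
        });
        String result = id.toString();
        // The mapping is not reversible, so two XML names may collide.
        String previous = originalNames.putIfAbsent(result, xmlName);
        if (previous != null && !previous.equals(xmlName)) {
            throw new IllegalStateException("Identifier clash: " + result);
        }
        return result;
    }

    // Returns the XML name recorded for a generated identifier, or null.
    String originalName(String identifier) {
        return originalNames.get(identifier);
    }
}
```

With this sketch, a name made of the Greek letters alpha, beta, gamma becomes
"abg", while a Kanji name that the table does not cover falls back to a
sequence of _uXXXX escapes. The fallback is not very legible, but it needs no
dictionary and gives the same result on every run.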

-- Felix

>
>> - not reversible (there are many Kanji or Kanji combinations
>>    that lead to the same Romaji)
>
> That should not be a problem since the binding tool can store the original
> XML name as metadata for each name in the language binding for use in
> serializing instances.
>
> pvb
>

Received on Tuesday, 7 February 2006 10:12:52 UTC