Re: mapping of XML names into programming language

> For all these languages you have transliteration schemes which 
> describe how to convert a string in the original script to a version 
> which uses only latin letters. I think nearly for one of these 
> languages there is a "standardized", totally accepted scheme. But it 
> seems that for your purpose it should be enough to choose just one 
> scheme. 
This is not really the case; most non-Latin to Latin transliterations 
vary quite widely, and the same name is often romanized differently 
depending on the target language.

Путин ↔ Putin, Poutine, ...
Горбачёв ↔ Gorbachev, Gorbacev, Gorbatchev, Gorbačëv, Gorbachov, 
Gorbatsov, Gorbatschow, ...
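
Since no single scheme can be assumed, one script-neutral fallback is the 
first option Paul mentions below: escape only the characters the target 
language rejects, and keep the original XML name as metadata. A minimal 
Java sketch of that idea follows. The class NameMapper, its methods, and 
the "_uXXXX" escape format are all hypothetical and not taken from any 
existing binding tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: none of these names come from an existing binding tool.
final class NameMapper {
    // Generated identifier -> original XML name, kept for serialization.
    private final Map<String, String> originalNames = new LinkedHashMap<>();

    // Replace each code point that is illegal at its position in a Java
    // identifier with an assumed "_uXXXX" escape; legal ones pass through.
    static String toIdentifier(String xmlName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < xmlName.length(); ) {
            int cp = xmlName.codePointAt(i);
            boolean legal = sb.length() == 0
                    ? Character.isJavaIdentifierStart(cp)
                    : Character.isJavaIdentifierPart(cp);
            if (legal) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("_u%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Record the mapping and fail loudly if two XML names collide.
    String register(String xmlName) {
        String id = toIdentifier(xmlName);
        String previous = originalNames.putIfAbsent(id, xmlName);
        if (previous != null && !previous.equals(xmlName)) {
            throw new IllegalStateException(id + " already maps to " + previous);
        }
        return id;
    }

    // Look up the XML name that produced a generated identifier.
    String originalName(String identifier) {
        return originalNames.get(identifier);
    }
}
```

With this sketch, my-type becomes my_u002Dtype, while Путин is left 
unchanged because Java already accepts Cyrillic letters in identifiers. 
Reserved words such as "class" are not handled here and would need a 
separate rule.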

Mark

Felix Sasaki wrote:
>
> Hi Paul,
>
> Sorry for the late follow-up. Just a remark to your question below.
>
> On Fri, 03 Feb 2006 06:26:40 +0900, <Paul.V.Biron@kp.org> wrote:
>
>>
>>> Conversions such as the one you mention from Kanji to Romaji
>>> have the advantage that the result is still fairly legible,
>>> but there are various disadvantages:
>>> - large dictionary needed
>>> - not deterministic (there is often more than one way to
>>>    pronounce a Kanji or Kanji combination)
>>> - language-specific, which means a different solution for
>>>    each language is needed
>>
>> To provide context for this question from the databinding WG, our goal is
>> to provide guidance to implementors of databinding toolkits: tools that
>> take a schema and produce a set of programming language bindings, e.g.,
>> Java classes, that know how to manipulate instances conforming to the
>> schema.  Most binding tools do something like the following.  Given this
>> schema document fragment
>>
>> <xs:complexType name='MyType'>
>>         <xs:sequence>
>>                 <xs:element name='child1' type='xs:string'/>
>>                 <xs:element name='child2' type='xs:string'
>>                             maxOccurs='unbounded'/>
>>         </xs:sequence>
>> </xs:complexType>
>>
>> they will produce a class such as:
>>
>> class MyType
>> {
>>         String child1 ;
>>         List<String> child2 ;
>> }
>>
>> where the element and type names have become names in the programming
>> language (Java in this case).
>>
>> The range of characters that are legal for XML names is much wider than
>> that supported by many programming languages.  The question is: what
>> guidance should we give binding tool implementors about what they should
>> do in the face of XML names that contain characters that aren't legal in
>> that programming language?
>>
>> One option is: replace "bad" characters with punctuation, etc.
>> Another option is: for languages that have something resembling a kanji
>> to romaji mapping, automate the mapping (if possible/reasonable).  If
>> such automation is not possible/reasonable, perhaps the tool could provide
>> a configuration option to allow the user to "manually" specify the mapping
>> for the particular names used in the schema.
>>
>> We were wondering if i18n had any other options they could recommend or
>> any advice in general about this problem.
>>
>> One question I had was whether languages other than CJK have something
>> similar to kanji -> romaji?  For instance, do Hebrew, Greek, Thai, etc.
>> have this concept?
>
> For all these languages you have transliteration schemes which 
> describe how to convert a string in the original script to a version 
> which uses only latin letters. I think nearly for one of these 
> languages there is a "standardized", totally accepted scheme. But it 
> seems that for your purpose it should be enough to choose just one 
> scheme.
>
> -- Felix
>
>>
>>> - not reversible (there are many Kanji or Kanji combinations
>>>    that lead to the same Romaji)
>>
>> That should not be a problem since the binding tool can store the original
>> XML name as metadata for each name in the language binding for use in
>> serializing instances.
>>
>> pvb
>>

Received on Wednesday, 8 February 2006 02:47:58 UTC