Re: mapping of XML names into programming language

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Tue, 31 Jan 2006 13:03:19 +0900
Message-Id: <6.0.0.20.2.20060131125554.06e6eb00@localhost>
To: <paul.downey@bt.com>, <public-i18n-core@w3.org>
Cc: <public-xsd-databinding@w3.org>

Hello Paul,

This is a personal answer. This problem was looked at in the
context of XML Schema and XQuery. I remember a solution where
(speaking generally) non-ASCII characters were transformed
into a sequence of characters that included some underscores
and/or dollar signs and the Unicode code point in hex.
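
A minimal sketch of that kind of escaping, in Python, follows. The `_xHHHH_` format, the rule of escaping an underscore only when it is followed by `x`, and the names `to_identifier` and `from_identifier` are all assumptions made for illustration; they are not taken from any particular spec. The point is that, unlike transliteration, such a mapping needs no dictionary, is deterministic and language-neutral, and can be inverted exactly.

```python
import re

# Hypothetical escape format (an assumption, not any spec's exact syntax):
# "_x" + at least four uppercase hex digits of the code point + "_".
_ESCAPE = re.compile(r"_x([0-9A-F]{4,6})_")


def _escape_char(ch: str) -> str:
    return f"_x{ord(ch):04X}_"


def to_identifier(xml_name: str) -> str:
    """Map an XML name to an ASCII-only identifier, reversibly."""
    out = []
    for i, ch in enumerate(xml_name):
        if ch == "_":
            # A literal "_x" would look like the start of an escape,
            # so the underscore is escaped in that case only.
            nxt = xml_name[i + 1 : i + 2]
            out.append(_escape_char(ch) if nxt == "x" else ch)
        elif ch.isascii() and ch.isalnum() and not (i == 0 and ch.isdigit()):
            out.append(ch)
        else:
            # Non-ASCII characters, XML name punctuation such as "-" and ".",
            # and a leading digit are replaced by their code point in hex.
            out.append(_escape_char(ch))
    return "".join(out)


def from_identifier(ident: str) -> str:
    """Invert to_identifier by decoding every escape sequence."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), ident)
```

For example, `to_identifier("café")` gives `caf_x00E9_` and `to_identifier("order-id")` gives `order_x002D_id`; `from_identifier` restores the original names. The price is legibility: a name written entirely in Kanji becomes a run of hex escapes.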

I'm not totally sure whether this solution is currently in
a spec; the best person to ask is Michael Rys.

Conversions such as the one you mention from Kanji to Romaji
have the advantage that the result is still fairly legible,
but there are various disadvantages:
- a large dictionary is needed
- not deterministic (there is often more than one way to
   pronounce a Kanji or Kanji combination)
- not reversible (there are many Kanji or Kanji combinations
   that lead to the same Romaji)
- language-specific, which means a different solution for
   each language is needed
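
To make the "not deterministic" and "not reversible" points concrete, here is a toy sketch in Python. The reading table `READINGS` and the helpers `romanize` and `find_collisions` are hypothetical; the table holds only three Japanese entries, whereas a real converter would need a large dictionary, and a different one for each language.

```python
from collections import defaultdict

# Deliberately tiny, hypothetical reading table (Japanese only).
READINGS: dict[str, list[str]] = {
    "橋": ["hashi"],              # bridge
    "箸": ["hashi"],              # chopsticks
    "日本": ["nihon", "nippon"],  # Japan: two accepted readings
}


def romanize(name: str) -> list[str]:
    """Return every candidate romanization of a name found in the table."""
    return READINGS.get(name, [])


def find_collisions(names: list[str]) -> dict[str, list[str]]:
    """Group names whose first-listed romanization is identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        candidates = romanize(name)
        if candidates:
            groups[candidates[0]].append(name)
    # Keep only romanizations shared by more than one source name.
    return {r: ns for r, ns in groups.items() if len(ns) > 1}
```

Here `romanize("日本")` returns two candidates, `nihon` and `nippon`, so a tool has to pick one arbitrarily; and `find_collisions(["橋", "箸", "日本"])` reports that 橋 and 箸 both map to `hashi`, the kind of symbol clash a binding tool would then have to resolve by hand.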

I hope this input can lead to a discussion where we understand
your needs better.

Regards,    Martin.

At 22:05 06/01/30, paul.downey@bt.com wrote:
 >
 >Dear i18n-core,
 >
 >The XML Schema Patterns for Databinding WG has an issue
 >concerning what advice to give implementers of binding
 >tools on how to represent XML Schema 1.0 names, such as
 >element names, type names and enumerated type values, in the
 >typically more constrained world of databases and programming languages.
 >
 >One way forward is to simply warn product developers to
 >expect to have to provide a manual step to handle the
 >mapping of characters invalid in their processing environment
 >and to avoid any possible symbol clashes.
 >
 >However, several members of the WG felt that there may
 >already be some approaches for mapping characters, such as
 >from Kanji to Roman characters, which could also be referenced.
 >
 >We therefore wondered whether the i18n WG could offer us any
 >advice, or pointers to existing advice, in this area.
 >
 >Regards,
 >Paul
 >--
 >Chair
 >XML Schema Patterns for Databinding WG
 >http://www.w3.org/2002/ws/databinding/ 
Received on Tuesday, 31 January 2006 09:06:35 GMT