Re: mapping of XML names into programming language

Hi Paul, all,

We discussed this problem at the i18n core call yesterday. The group saw
the same problems with romanization that Martin and I had mentioned
before. Francois Yergeau proposed using an XML-like escaping mechanism
(see http://www.w3.org/TR/REC-xml/#dt-charref ):
[66]    CharRef    ::=    '&#' [0-9]+ ';'
                        | '&#x' [0-9a-fA-F]+ ';'
I guess you cannot use '&#' in your usage scenario, since '&' and '#' are
not allowed in identifiers, but if you define something else as a
"marker" for the beginning of a character reference, that would be fine,
I think.
This solution would of course be reversible.
Would that solve your problems?
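To make the idea concrete, here is a minimal Python sketch. It assumes a
hypothetical marker, "_x" + uppercase hex code point + "_", standing in
for "&#x" ... ";" because it uses only identifier-safe characters; the
function names encode_name and decode_name are likewise made up for
illustration. Everything outside ASCII letters and digits is escaped, as
are a leading digit and any underscore directly followed by "x", so a
marker can never appear by accident and decoding restores the original
name exactly.

```python
import re

# Hypothetical marker: "_x" + uppercase hex code point + "_". It plays the
# role of "&#x" [0-9a-fA-F]+ ";" but contains only identifier-safe characters.
_MARKER = re.compile(r"_x([0-9A-F]{4,6})_")


def _escape_char(c: str) -> str:
    # At least four hex digits; code points above U+FFFF get five or six.
    return "_x%04X_" % ord(c)


def encode_name(name: str) -> str:
    """Reversibly map an XML name to an ASCII identifier (sketch)."""
    out = []
    for i, c in enumerate(name):
        if c == "_":
            # A literal "_x" would look like the start of a marker,
            # so escape the underscore only in that case.
            nxt = name[i + 1] if i + 1 < len(name) else ""
            out.append(_escape_char(c) if nxt == "x" else c)
        elif c.isascii() and c.isalnum() and not (i == 0 and c.isdigit()):
            out.append(c)
        else:
            # Non-ASCII characters, '-', '.', and a leading digit are escaped.
            out.append(_escape_char(c))
    return "".join(out)


def decode_name(ident: str) -> str:
    """Invert encode_name by expanding each marker back to its character."""
    return _MARKER.sub(lambda m: chr(int(m.group(1), 16)), ident)
```

For example, encode_name("日本") gives "_x65E5__x672C_", and decode_name
turns it back into "日本". Escaping an underscore only when it precedes
"x" is a design choice that leaves ordinary names such as "order_id"
unchanged.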

Regards, Felix.

On Tue, 31 Jan 2006 15:41:36 +0900, <paul.downey@bt.com> wrote:

>
> Hi Felix,
>
>> I am not yet sure if I understand your problem.
>> Do you want to be able to map something like
>> <nihon>... (written with Japanese characters ??) into
> <nihon> ... (written with Latin characters only)?
>
>> This is the mapping from Kanji to Romaji you are mentioning below.
>> Unfortunately this works only with a lexicon and on a per-language
>> basis.
>
> OK, understood.
>
>> It is also not reversible, e.g. "nihon" can be mapped to ??, ???, or
>> others.
>
> Understood. (Curse my webmail, btw.)
>
>> This kind of mapping is something I guess you don't want for your
>> tasks.
>
>
>> Currently, the names in XML Schema are defined at
>> http://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-NCName as
>> NCName   ::=  (Letter | '_') (NCNameChar)*
>
>> I guess what you need is a mapping of "Letter" and "NCNameChar" to a
>> subset of these character ranges that fits programming-language
>> requirements. Is that right?
>
> Exactly! Please note we're not expecting to find anything
> definitive here, but would welcome hearing about existing
> work in this area that we could possibly reference.
>
>> Then the next question would be whether your scenario needs to go
>> back to the original XML name. If the answer is "yes", you will have
>> the same ambiguity as with the mapping from "nihon" to "??" (or "??").
>
> That might not be required, since a data binding could hold a map
> for 'decoding' and resolve clashes by adding a prefix or a suffix,
> e.g. nihon1, nihon2, etc.
>
>> If you could give more details on your requirements, the i18n core
>> working group and I will take a closer look at possible solutions.
>
> thanks!
>
> Paul
>
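Paul's suggestion of a decode map with numeric suffixes could be sketched
roughly as follows, again in Python (3.9+ for the dict[str, str]
annotations). NameBinder is a hypothetical class, and its base mapping
(every character outside [A-Za-z0-9_] becomes "_") is a crude, lossy
placeholder for whatever mapping, such as a romanization, a real data
binding would apply. Reversibility comes from the stored map, not from
the generated names themselves.

```python
import re


class NameBinder:
    """Hypothetical helper: assigns identifiers to XML names and keeps a
    decode map so the original names can be recovered."""

    def __init__(self) -> None:
        self._to_ident: dict[str, str] = {}  # XML name -> identifier
        self._to_xml: dict[str, str] = {}    # identifier -> XML name

    @staticmethod
    def _base(xml_name: str) -> str:
        # Lossy placeholder mapping: anything outside [A-Za-z0-9_] becomes '_'.
        ident = re.sub(r"[^A-Za-z0-9_]", "_", xml_name)
        if not ident or ident[0].isdigit():
            ident = "_" + ident
        return ident

    def bind(self, xml_name: str) -> str:
        """Return the identifier for xml_name, creating one if needed."""
        if xml_name in self._to_ident:
            return self._to_ident[xml_name]
        base = self._base(xml_name)
        ident, n = base, 0
        # Resolve clashes by appending 1, 2, ... until the name is unused.
        while ident in self._to_xml:
            n += 1
            ident = f"{base}{n}"
        self._to_ident[xml_name] = ident
        self._to_xml[ident] = xml_name
        return ident

    def decode(self, ident: str) -> str:
        """Recover the original XML name from a bound identifier."""
        return self._to_xml[ident]
```

With this sketch, binding "my-name", "my.name" and "my_name" in that
order yields my_name, my_name1 and my_name2, and decode("my_name1")
returns "my.name". The identifiers depend on binding order, so a binding
that needs stable names across runs would have to process names in a
fixed order.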

Received on Wednesday, 1 February 2006 02:50:08 UTC