Default token for Unicode character

Hi all

I have to implement the following in my Presentation MathML editor: when 
the user inserts a Unicode character, the editor has to guess which 
token element to use, mi or mo. So the question is: how can I find a sane 
default token element for a given Unicode character?

I hoped that the Unicode Character Database would help. However, I have 
not been able to get it right. At the moment I do it like this: for 
characters in the general categories Punctuation and Symbol, I use mo. For 
Punctuation, this might be correct, but for Symbol it certainly is not. 
For example, ∞ (U+221E INFINITY) is in category Sm (Symbol, math), but it 
is an identifier. On the other hand, ⇔ (U+21D4 LEFT RIGHT DOUBLE ARROW) 
is also in category Sm, but it is an operator.
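For illustration, here is a minimal sketch of the heuristic described above, using Python's standard `unicodedata` module to look up the general category. The function name `guess_token_by_category` is hypothetical. It reproduces the problem: ∞ and ⇔ both come out as mo, because both are in category Sm.

```python
import unicodedata


def guess_token_by_category(ch: str) -> str:
    """Guess mi or mo from the Unicode general category alone.

    Categories starting with 'P' (Punctuation) or 'S' (Symbol) map to mo;
    everything else (letters, digits, ...) maps to mi.
    """
    category = unicodedata.category(ch)  # two-letter code, e.g. 'Sm', 'Po', 'Ll'
    if category[0] in ("P", "S"):
        return "mo"
    return "mi"


# U+221E INFINITY is Sm, so it is (wrongly) classified as an operator,
# exactly like U+21D4 LEFT RIGHT DOUBLE ARROW.
for ch in ("+", "x", "\u221E", "\u21D4"):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} -> {guess_token_by_category(ch)}")
```

The general category alone cannot tell these two apart, so some extra per-character information is needed.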

I also tried to get the information from
http://www.unicode.org/Public/math/revision-11/MathClassEx-11.txt
which accompanies Unicode Technical Report #25
http://www.unicode.org/reports/tr25/
Unfortunately, I do not understand what class N (Normal) is supposed to 
mean. It contains, for example, ∞ (U+221E INFINITY) and !, that is, it 
contains both operators and identifiers. So I am no further than before.
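A sketch of how the MathClass data could be read and turned into a default token. It assumes that each data line is semicolon-separated with the code point (or a range written `XXXX..YYYY`) in the first field and the one-letter class in the second, and that lines starting with `#` are comments. The class-to-token mapping (`A` to mi; B, C, F, L, O, P, R, U, V to mo) is a hypothetical choice of mine, not something TR25 prescribes. As the sketch shows, class N gives no answer, which is exactly the problem with ∞ and !.

```python
from typing import Dict, Iterable, Optional

# Hypothetical mapping: TR25 class letters treated as operator-like.
OPERATOR_CLASSES = set("BCFLOPRUV")


def parse_math_classes(lines: Iterable[str]) -> Dict[int, str]:
    """Map code points to class letters, assuming 'codepoint[..end];class;...' lines."""
    classes: Dict[int, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(";")
        if len(fields) < 2:
            continue  # not a data line in the assumed format
        code_field, cls = fields[0].strip(), fields[1].strip()
        if ".." in code_field:
            start_hex, end_hex = code_field.split("..")
            start, end = int(start_hex, 16), int(end_hex, 16)
        else:
            start = end = int(code_field, 16)
        for cp in range(start, end + 1):
            classes[cp] = cls
    return classes


def token_from_class(cls: Optional[str]) -> Optional[str]:
    """Return 'mi', 'mo', or None when the class does not decide."""
    if cls == "A":
        return "mi"
    if cls in OPERATOR_CLASSES:
        return "mo"
    return None  # N (Normal) and the remaining classes are ambiguous here


# Two data lines in the assumed format, for the characters mentioned above.
sample = ["# Code Point;Class;...", "0021;N;!", "221E;N;\u221E"]
table = parse_math_classes(sample)
print(token_from_class(table.get(0x221E)), token_from_class(table.get(0x21)))
```

For characters where `token_from_class` returns None, the editor still needs a fallback, e.g. the general-category heuristic or a hand-maintained exception list.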

Any ideas?

Greetings
Urs

Received on Friday, 24 July 2009 09:42:04 UTC