To emphasize what Deyan (somewhat humorously) said: there are lots of
characters that look alike. Often, the Unicode standard mentions these
alongside the description of the character. For ":", it lists:
armenian full stop U+0589
hebrew punctuation sof pasuq U+05C3
ratio U+2236
modifier letter colon U+A789

Additionally (not sure why they aren't listed), there are:
Presentation Form For Vertical Colon: U+FE13
Small Colon: U+FE55
Fullwidth Colon: U+FF1A

I would be a little surprised if Deyan can find all seven of these colon
variants in arXiv, but maybe three of them are there (certainly U+2236 will
show up).
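
For what it's worth, checking for these mechanically is easy. Below is a
minimal sketch using Python's standard unicodedata module. The names
COLON_LOOKALIKES and find_colon_lookalikes are made up for this example
and do not come from any tool mentioned here. The code points are the ones
listed above. The last field reports whether NFKC normalization folds the
character to the ASCII colon, which is one way software might tell
compatibility variants apart from genuinely distinct characters.

```python
import unicodedata

# Code points copied from the two lists above (hypothetical constant name).
COLON_LOOKALIKES = {0x0589, 0x05C3, 0x2236, 0xA789, 0xFE13, 0xFE55, 0xFF1A}


def find_colon_lookalikes(text):
    """Return (index, 'U+XXXX', name, folds_to_ascii_colon) for each hit.

    folds_to_ascii_colon is True when NFKC normalization turns the
    character into the plain ASCII colon U+003A.
    """
    hits = []
    for i, ch in enumerate(text):
        if ord(ch) in COLON_LOOKALIKES:
            name = unicodedata.name(ch, "<unnamed>")
            folds = unicodedata.normalize("NFKC", ch) == ":"
            hits.append((i, f"U+{ord(ch):04X}", name, folds))
    return hits


if __name__ == "__main__":
    # An ASCII colon is deliberately ignored; only look-alikes are reported.
    sample = "a\u2236b and f\uff1aX and g:Y"
    for hit in find_colon_lookalikes(sample):
        print(hit)
```

A scan like this only says which code point is present. It says nothing
about what the author meant by it, which is the harder problem.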

Although software could detect the difference between these, a human
looking at the characters likely couldn't, because a slightly odd-looking
character might simply be the way a font displays that character. I think
the full spec or some note needs to say something about look-alike
characters, but I'm not sure what. Perhaps a note along the lines of what
is said in the spec in 7.7 Anomalous Mathematical Characters. There are two
audiences for such a note: software that generates MathML and software that
consumes MathML for something other than display. MathPlayer has a list of
about 8,000 characters it knows how to speak, but only the ASCII colon and
'ratio' are in that list. It is not reasonable to expect software to map
every character to something, nor do I think we should make MathML even
more verbose by requiring an 'intent' (in whatever form it ends up taking)
to be on every character (e.g., on <mn>1</mn>). Which brings us back to



