Re: mover vs latin chars with diacriticals from David Carlisle on 2006-05-02 (www-math@w3.org from May 2006)

From: David Carlisle <davidc@nag.co.uk>
Date: Tue, 2 May 2006 14:04:47 +0100
To: neils@dessci.com
CC: www-math@w3.org
Message-Id: <200605021304.k42D4lG4030093@edinburgh.nag.co.uk>
I think it's clear that mover is to be preferred (although chapter 3
doesn't spell it out explicitly). Except in a couple of cases where
no non-combining character appeared to be available in Unicode, the DTD
always uses a non-combining character as the first character in the
definition of an entity. (Defining an entity to expand to a string
starting with a combining character would break the guidelines in the
W3C/ISO character use in XML.)
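
As a rough illustration of the "no leading combining character" rule,
here is a minimal Python sketch. The helper name and the sample
expansions are hypothetical and are not taken from the MathML DTD; the
check assumes that "combining character" means a code point in one of
the Unicode mark categories (Mn, Mc, Me).

```python
import unicodedata


def starts_with_combining(expansion: str) -> bool:
    """Return True if the first code point is a Unicode mark (category M*)."""
    return bool(expansion) and unicodedata.category(expansion[0]).startswith("M")


# Hypothetical entity expansions, for illustration only.
print(starts_with_combining("\u005E"))   # spacing circumflex: False
print(starts_with_combining("\u0302"))   # combining circumflex: True
print(starts_with_combining("x\u0302"))  # base letter comes first: False
```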

The fact that Unicode defines some slots for predefined characters used
in mathematics isn't really relevant as many of those characters are in
Unicode for compatibility with pre-existing character sets, and in any
case, Unicode is primarily a plain text standard. It has many characters
useful in plain text that would not be used when you have additional
markup possibilities. Apart from diacritical marks, Unicode has
characters for (for example) one-half or superscript-2. These are of
course very useful if you are just typing a plain text document in a
non-technical setting and need access to these characters. Clearly,
though, one would not recommend using them in MathML, although it's not
an error if they are used, and the behaviour if they are used is well
defined.
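
The compatibility nature of characters such as one-half and
superscript-2 can be seen from their Unicode compatibility
decompositions. The sketch below uses Python's standard unicodedata
module; the helper name is only illustrative. In MathML one would
normally write the same things with markup such as mfrac and msup
rather than with these characters.

```python
import unicodedata


def compat_decomposition(ch: str) -> list[str]:
    """Code points of the NFKD (compatibility) decomposition of ch."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", ch)]


for ch in ("\u00bd", "\u00b2"):  # one-half, superscript two
    print(unicodedata.name(ch), compat_decomposition(ch))
```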

While 99 times out of 100 I'd say that you should use mover, it's not
always completely clear-cut and is, as Richard Kaye said, a matter of
personal judgement on whether you think of the letter-plus-accent
combination as a mathematical operation applied to an identifier
represented by the base letter (in which case you should use mover)
or whether you think of the accented letter as a single identifier.
I think most mathematical authors, even if writing in natural languages
that make more use of diacritical marks than English, tend to avoid
using such identifiers in mathematical expressions to avoid confusion
with marks used to represent operators, so it is far more common to use
mover than a Unicode accented character.
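
To make the two readings concrete, the following sketch generates both
encodings of "x hat" with Python's xml.etree.ElementTree. The function
names are hypothetical, and the choice of "^" as the accent character
in the mo element is an assumption made for the example; other accent
characters could be used there.

```python
import xml.etree.ElementTree as ET


def hat_as_operation(base: str) -> str:
    """Accent treated as an operation on the base identifier: uses mover."""
    mover = ET.Element("mover", {"accent": "true"})
    ET.SubElement(mover, "mi").text = base
    ET.SubElement(mover, "mo").text = "^"  # assumed accent character
    return ET.tostring(mover, encoding="unicode")


def hat_as_identifier(base: str) -> str:
    """Accented letter treated as one identifier: base plus U+0302 in one mi."""
    mi = ET.Element("mi")
    mi.text = base + "\u0302"  # combining circumflex accent
    return ET.tostring(mi, encoding="unicode")


print(hat_as_operation("x"))
print(hat_as_identifier("x"))
```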

In fact the choice comes up more often with negated operators. Unicode
has a lot of negated operators, and a combining negation slash for
making a lot more. One tends to think of, say, not-equals as a single
boolean operator rather than as a syntax for "not" applied to the equals
expression. The MathML spec, in the DTD and in Chapter 6, clearly
indicates that these negated characters can/should be used. Technically
one could argue that the situations with a combining negation slash and
a combining over-accent are similar, but the way they are used in
practice means that MathML favours using markup in one case and Unicode
character combinations in the other. For example, MathML doesn't have an
moverlay presentation form that would position a negation slash over a
base (one could probably do something with mphantom and explicit
spacing, but it would be hard to generate a good typeset form).
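
The relationship between the precomposed negated operators and the
combining negation slash can be sketched with Unicode normalization.
This assumes the "combining negation slash" is U+0338 (COMBINING LONG
SOLIDUS OVERLAY); the helper name is hypothetical.

```python
import unicodedata

SLASH = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY (assumed negation slash)


def negate(operator: str) -> str:
    """Append the combining slash, then compose with NFC where possible."""
    return unicodedata.normalize("NFC", operator + SLASH)


for op in ("=", "<", "\u221d"):  # equals, less-than, proportional-to
    negated = negate(op)
    # One code point means Unicode has a precomposed negated operator;
    # two means the combining sequence is the only representation.
    print([f"U+{ord(c):04X}" for c in negated])
```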


David

Received on Tuesday, 2 May 2006 13:33:30 UTC