Re: mover vs latin chars with diacriticals from Luca Padovani on 2006-04-29 (www-math@w3.org from April 2006)

From: Luca Padovani <padovani.luca@gmail.com>
Date: Sat, 29 Apr 2006 12:34:31 +0200
To: White Lynx <whitelynx@operamail.com>
Cc: www-math@w3.org
Message-Id: <7FE9D263-107D-4BD9-8913-D0641A2DAA94@gmail.com>

Hello White Lynx,

these are my thoughts on the use of diacritical marks in MathML.

On 29/apr/06, at 11:28, White Lynx wrote:
> Using redundant ad hoc markup instead of more universal Unicode  
> solution that can be used in any XML application and even plain  
> Unicode text is not the best option IMHO.

On the other hand, encoding (pieces of) a mathematical formula using
Unicode shortcuts reduces your opportunities to decorate the document
with information. If the differentiation symbol is in its own <mo>
element, it can have a hyperlink to its definition, it can be
colored differently from the base character, and it can be searched
for independently of the variable it is applied to.

Also, a "diacritical mark" often has to be combined with more than
a single character. Think of a wide tilde, or a wide hat, or a vector
arrow spanning a whole expression such as (x + y). So an mover-based
encoding would always be needed for such cases anyway. Now, if you
favor the "more universal" encoding using solely Unicode characters,
you are forcing MathML-crunching applications to reverse engineer
the text: "Hmmm, was that a real diacritical mark, or was it a
differentiation symbol?" By always using mover, you achieve a more
_uniform_ encoding, and you make the markup less ambiguous when
there is no content MathML around to disambiguate it.
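
To make the uniformity argument concrete, here is a minimal sketch in
Python. The two MathML fragments and the helper name accents() are
illustrative assumptions, not taken from any particular tool. It shows
that a dot over x and a wide hat over (x + y) share the same mover
structure, so one piece of code can pull out the accent in both cases.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"

# "x dot": the dot lives in its own <mo>, so it can carry its own
# attributes (here a colour) independently of the base <mi>.
X_DOT = (
    f'<math xmlns="{MML}"><mover accent="true">'
    '<mi>x</mi><mo mathcolor="red">&#x2D9;</mo>'
    '</mover></math>'
)

# A wide hat over (x + y): same structure, only the base is an <mrow>.
WIDE_HAT = (
    f'<math xmlns="{MML}"><mover accent="true">'
    '<mrow><mo>(</mo><mi>x</mi><mo>+</mo><mi>y</mi><mo>)</mo></mrow>'
    '<mo>^</mo>'
    '</mover></math>'
)

def accents(markup):
    """Return (base text, accent character) for every <mover> in markup."""
    root = ET.fromstring(markup)
    found = []
    for mover in root.iter(f"{{{MML}}}mover"):
        # The first child of <mover> is the base, the second the overscript.
        base, over = list(mover)[:2]
        found.append(("".join(base.itertext()), over.text))
    return found

print(accents(X_DOT))
print(accents(WIDE_HAT))
```

With a combining-character encoding, the first case would be a single
text node and only the second would need mover, so a consumer would
have to handle two different shapes for what is conceptually one
construct.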

> In MathML approach it is unclear how MathML processor should  
> retrieve font metrics that ensure accurate positioning of accents.

The fact that the differentiation is encoded using an mover element
does not prevent the MathML rendering engine from using a single
glyph that combines the base character and the dot.
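
One way a renderer might do this, sketched in Python under my own
assumptions: the COMBINING table and the function precomposed_glyph()
are hypothetical, and a real engine would consult its fonts rather
than rely on Unicode composition alone. The idea is to map the accent
found in the <mo> to a combining mark and ask Unicode normalization
(NFC) whether a precomposed character exists.

```python
import unicodedata

# Hypothetical table: spacing accent found in <mo> -> combining mark.
COMBINING = {
    "\u02d9": "\u0307",  # dot above
    "^": "\u0302",       # circumflex (hat)
    "~": "\u0303",       # tilde
}

def precomposed_glyph(base, accent):
    """Return a single precomposed character for base+accent, or None.

    None means the renderer has to position the accent itself, e.g.
    because the base is a whole expression or no precomposed form exists.
    """
    if len(base) != 1 or accent not in COMBINING:
        return None
    composed = unicodedata.normalize("NFC", base + COMBINING[accent])
    return composed if len(composed) == 1 else None

print(precomposed_glyph("x", "\u02d9"))   # a single character, U+1E8B
print(precomposed_glyph("(x+y)", "^"))    # None: no single glyph possible
```

The markup stays the same in both cases; whether a single glyph is
used becomes a rendering decision and not an encoding decision.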

> In addition note that Unicode standard specifies normalization  
> mechanism that establishes correspondence between composed/ 
> decomposed characters (it is needed for string comparison/search  
> purposes).

Note that the Unicode technical report on encoding mathematics [1]
encourages the use of decomposed characters when these are used as
mathematical operators. In any case, it would seem unfair to me to
use the accuracy of text-based search tools as a measure of the
adequacy of the MathML encoding of formulas.
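
For reference, the composed/decomposed correspondence mentioned in the
quoted text can be sketched with Python's standard unicodedata module.
The helper contains() is an assumption of mine, just one simple way to
make a search insensitive to the two forms.

```python
import unicodedata

composed = "\u1e8b"      # x with dot above as one code point
decomposed = "x\u0307"   # x followed by COMBINING DOT ABOVE

# NFD maps the composed form onto the decomposed one.
assert unicodedata.normalize("NFD", composed) == decomposed

def contains(text, needle):
    """Substring search after normalizing both sides to NFD."""
    text_nfd = unicodedata.normalize("NFD", text)
    needle_nfd = unicodedata.normalize("NFD", needle)
    return needle_nfd in text_nfd

# A naive search for "x" misses the composed form ...
print("x" in composed)
# ... while the normalized search finds it in either form.
print(contains(composed, "x"), contains(decomposed, composed))
```

Note that the normalized search finds "x" but still cannot tell
whether the dot is a diacritical mark or a derivative, which is the
kind of information the markup is supposed to carry.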

Cheers,
--luca

[1] http://www.unicode.org/reports/tr25/

Received on Saturday, 29 April 2006 16:20:57 UTC