Re: mover vs latin chars with diacriticals from juanrgonzaleza@canonicalscience.com on 2006-04-29 (www-math@w3.org from April 2006)

From: <juanrgonzaleza@canonicalscience.com>
Date: Sat, 29 Apr 2006 11:17:32 -0700 (PDT)
To: <www-math@w3.org>
Message-ID: <3124.217.124.88.196.1146334652.squirrel@webmail.canonicalscience.com>
Luca Padovani wrote:

[snip]

> On the other side, encoding (pieces of) a mathematical formula using
> Unicode shortcuts reduces your opportunities to decorate the document
> with information. If the differentiation symbol is in its own <mo>
> element, it can have a hyperlink to its definition,

Would that not be encoded in content MathML?

> it can be
> colored differently from the base character,

I do not know what styling possibilities Unicode implementations offer.

> it can be
> searched independently of the variable it is applied to.

Are Unicode combining diacritics not already designed for that, via
canonical decomposition?
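
To make the question concrete, here is a minimal sketch in Python using
the standard unicodedata module. The choice of "x with dot above"
(U+1E8B) as a stand-in for a differentiation dot and the helper name
contains_base are assumptions made for illustration only; this says
nothing about how any MathML tool actually searches.

```python
import unicodedata

# Hypothetical example: "x with dot above" (U+1E8B), as one might write
# a time derivative with a single precomposed Unicode character.
precomposed = "\u1e8b"

# Canonical decomposition (NFD) splits it into base letter + combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)
base, mark = decomposed[0], decomposed[1]

def contains_base(text, letter):
    """Search for a base letter regardless of attached combining marks."""
    return letter in unicodedata.normalize("NFD", text)

print([f"U+{ord(ch):04X}" for ch in decomposed])  # code points after NFD
print(unicodedata.name(mark))                     # name of the combining mark
print(contains_base("m" + precomposed, "x"))      # base letter is found
```

After decomposition the base letter and the dot are separate code
points, so a plain text search can match either one on its own.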

> Also, it occurs often to combine a "diacritical mark" with more than
> a single character. Think of a wide tilde, or a wide hat, or a vector
> arrow spanning a whole expression such as (x + y). So, it would
> always be necessary to account for an mover-based encoding in such
> cases.

Are there not several wide mathematical symbols in Unicode e.g. tilde-AB?
What is more, Firefox does not support MathML wide tildes and long
arrows...

There are also alternative notations for very wide items (e.g. TeX).

> Now, if you foster the "more universal" encoding using solely
> Unicode characters, you are forcing MathML-crunching applications to
> reverse engineer the text: "Hmmm, was that a real diacritical mark,
> or was it a differentiation symbol?" By always using mover, you
> achieve a more _uniform_ encoding, and you make the markup less
> ambiguous, in case there is no content MathML around to understand it
> better.

Maybe you are referring to the kind of "_uniform_ encoding" I have seen in
real life, where mc^2 is "encoded" in different ways by MathML tools:

<mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>

<mi>m</mi><msup><mrow><mi>c</mi></mrow><mrow><mn>2</mn></mrow></msup>

<msup><mrow><mi>m</mi><mi>c</mi></mrow><mn>2</mn></msup>

<mi>m</mi><mi>c</mi><msup><mrow/><mn>2</mn></msup>

...

I have not seen

<mi>m</mi><mi>c</mi><msup><mo/><mn>2</mn></msup>

or

<mi>m</mi><mi>c</mi><msup><none/><mn>2</mn></msup>

but these are also accepted by the Gecko MathML parser. I suspect one
would expect less variation with the Unicode approach.
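
The variation among the four encodings listed first can be checked
mechanically. The following is a minimal Python sketch using the
standard xml.etree.ElementTree parser; the <math> wrapper and the helper
names shape and flat_text are assumptions added for illustration. Note
that the flattened string drops the superscript structure, so this only
shows how much the markup shape varies, not that plain text is an
adequate substitute.

```python
import xml.etree.ElementTree as ET

# The four encodings of mc^2 quoted above. Each is wrapped in a <math>
# root below (an assumption, so that every fragment parses as one tree).
variants = [
    "<mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>",
    "<mi>m</mi><msup><mrow><mi>c</mi></mrow><mrow><mn>2</mn></mrow></msup>",
    "<msup><mrow><mi>m</mi><mi>c</mi></mrow><mn>2</mn></msup>",
    "<mi>m</mi><mi>c</mi><msup><mrow/><mn>2</mn></msup>",
]

def shape(fragment):
    """Return the element names in document order (the tree 'shape')."""
    root = ET.fromstring(f"<math>{fragment}</math>")
    return tuple(el.tag for el in root.iter())

def flat_text(fragment):
    """Return the character data alone, ignoring all markup."""
    root = ET.fromstring(f"<math>{fragment}</math>")
    return "".join(root.itertext())

shapes = {shape(v) for v in variants}
texts = {flat_text(v) for v in variants}

print(len(shapes))  # number of distinct tree shapes among the variants
print(texts)        # distinct flattened strings
```

Four inputs give four different tree shapes but a single flattened
string, so any tool comparing these formulas has to normalise the
markup first.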

>
> The fact that the differentiation is encoded using an mover element
> does not prevent the MathML rendering engine from using a single
> glyph where the base character and the dot are put together.

And then we add more complexity on the browser side. That is assuming
the MathML rendering engine can do it at all.

>
> Note that the Unicode note about encoding mathematics [1] encourages
> the use of decomposed characters when these are used as mathematical
> operators. In any case, it would seem unfair to me to use the
> accuracy of text-based searching tools to measure the adequacy of
> the MathML encoding of formulas.

Well, it is interesting to see Soiffer reply to me that the "=" legend of
the Unicode Standard has merely informative value and that, therefore, one
is not obliged to use it, whereas you cite a Unicode report encouraging
decomposed characters.

Look at this passage, taken from the very link (UTR) you provided:

<blockquote>A Unicode Technical Report (UTR) contains informative
material. Conformance to the Unicode Standard does not imply conformance
to any UTR. Other specifications, however, are free to make normative
references to a UTR. </blockquote>

> Cheers,
> --luca
>
> [1] http://www.unicode.org/reports/tr25/


Juan R.

Center for CANONICAL |SCIENCE)
Received on Saturday, 29 April 2006 18:17:40 UTC