Re: Technical reasons for some options taken on design of MathML from juanrgonzaleza@canonicalscience.com on 2006-04-20 (www-math@w3.org from April 2006)

From: <juanrgonzaleza@canonicalscience.com>
Date: Thu, 20 Apr 2006 10:13:13 -0700 (PDT)
To: <www-math@w3.org>
Message-ID: <3029.217.124.88.238.1145553193.squirrel@webmail.canonicalscience.com>
Stan Devitt wrote:
>
> While this discussion has brought out that much of the concern lies over the
> verbosity and structures used to represent the presentation of mathematics
> it began with some serious questions about the choices in content MathML.
>
> I have elected at this point to summarize some key points that have emerged
> as answers related to content for ease of future reference.
>
> 1.  Why the use of an apply "container" instead of defining each operator
> such as divide as a "container".
> Answers:
>    a)  easy to locate the operator in the XML structure
>    b)  support for arbitrarily complex operators  (e.g. another apply,
> and/or with elaborate presentations)
>    c)  ease of extending MathML to use other symbols with associations  to
> more formal definitions.
>    d)  support use and discussion about the operators outside of the context
> of applying them to arguments.

Yes, I now understand the apply construct and am using a similar approach.
I consider that this design choice in MathML was right. However, some
doubts remain. I read a paper on the topic but now I cannot find it. Could
you cite a paper or document supporting point c)? I ask because some
people have expressed doubts about the real extensibility of Content
MathML.
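Point a) in the quoted list can be illustrated with a minimal Python sketch using the standard-library `xml.etree.ElementTree`. The sketch also covers point b), where the operator is itself an apply. The helper name `operator_and_args` is hypothetical, and the MathML namespace is omitted for brevity. The only claim here is that the operator is always the first child of `<apply>`, however complex it is.

```python
import xml.etree.ElementTree as ET

def operator_and_args(apply_elem):
    """Return (operator, arguments) for a Content MathML <apply> element.

    The operator is simply the first child; everything after it is an argument.
    """
    children = list(apply_elem)
    if not children:
        raise ValueError("empty <apply>")
    return children[0], children[1:]

# 1/2: the operator is the empty element <divide/>.
simple = ET.fromstring("<apply><divide/><cn>1</cn><cn>2</cn></apply>")
op, args = operator_and_args(simple)
print(op.tag, [a.text for a in args])  # prints: divide ['1', '2']

# f^(-1)(x): the operator is itself an <apply>, yet it is found the same way.
nested = ET.fromstring(
    "<apply><apply><inverse/><ci>f</ci></apply><ci>x</ci></apply>"
)
op2, args2 = operator_and_args(nested)
print(op2.tag, list(op2)[0].tag, [a.text for a in args2])  # prints: apply inverse ['x']
```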

> 2.  Why the introduction of  operators and symbols as elements?
> Answers:
>    a)  clearly identifiable role from the rest of the document content
> (Try searching a long string or document for meaningful occurrences of "E".)
>    b)  elements provide an anchor for definitionURL and attributes
> controlling display.

I would repeat that from the very beginning I could see the reason for
explicitly encoding things such as 2 and =. Many people from the TeX
community appear unable to see that. What is not justified, in my opinion,
is the need to encode everything, because this adds verbosity and
redundancy, and complicates the DOM of the document.

In fact, as Jenny explains in her book on XSLT, the reason for the
introduction of explicit string functions in XPath was that an encoding like

<data><ta>a</ta><te>e</te><ti>i</ti><to>o</to><tu>u</tu></data>

is highly inefficient. Jenny recommends a CSV-like encoding for those
documents where XML markup is not efficient. And if one of the XML folks
openly recognizes that...

Markup begins to become inefficient when one part in every five is markup.
In MathML the situation is even worse: in <cn>2</cn> only one character
out of ten is data.
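The character counts above are easy to check. The following Python sketch (the function name `data_ratio` is hypothetical) strips tags with a naive regular expression. That is adequate only for tiny, comment-free fragments like these. It counts alphanumeric characters as data, so the commas of the CSV-like variant count as overhead too.

```python
import re

def data_ratio(markup: str) -> float:
    """Fraction of all characters in `markup` that are alphanumeric data.

    Tags are removed with a naive regex (no comments, CDATA or entities),
    which is enough for the small fragments below.
    """
    text = re.sub(r"<[^>]*>", "", markup)
    data_chars = sum(ch.isalnum() for ch in text)
    return data_chars / len(markup)

examples = [
    "<cn>2</cn>",
    "<data><ta>a</ta><te>e</te><ti>i</ti><to>o</to><tu>u</tu></data>",
    "a,e,i,o,u",  # CSV-like alternative to the element-per-vowel encoding
]
for s in examples:
    print(f"{data_ratio(s):.2f}  {s}")
```

This prints 0.10 for `<cn>2</cn>` and 0.08 for the vowel markup, against 0.56 for the CSV-like line.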

About point b), I would simply state that those effects can be achieved in
other ways. In fact, I am not following MathML here. As an illustration,
compare the following two HTML fragments:

<p>this is <w class="strong">very</w> important</p>

<p><w class="normal">this</w><w class="normal">is</w><w class="strong">very</w><w class="normal">important</w></p>

The second encoding is redundant.

> 3.  Why not more complete support for semantic specification such as in
> OpenMath?
> Answers:
>    a) Concern about complexity / scope of first release of MathML
>    b) definitionURL already makes the essential leap of allowing an author
> to warn the consumer that a special meaning is in use and provides a
> mechanism to experiment with mechanisms for more complete specifications -
> much in the spirit of namespaces in XML.  This is way more than ever existed
> before.
>    c) A desire for more practical experience in this arena before
> standardizing a more elaborate scheme.

a) Then apparently one is forced to choose among:

i) A complex encoding as in OpenMath, which would be as difficult to use as
SGML was (which is why SGML never became very popular).

ii) A simple encoding as in TeX: useful only for printing, with no
computational utility, no direct usability on the web, no content
semantics, and poor accessibility.

iii) Something more complex than TeX but less complex than OpenMath, as in
current MathML. In practice, no browser I know of supports Content MathML,
most tools support only Presentation MathML, and the specialist Andreas
Strotmann showed that something as simple as “Integral of sin x on x from
0 to x” is incorrectly encoded in Content MathML 2.0 but can be correctly
encoded in OpenMath.

Should we choose (iii)?

c) That is good! But the experimental status of half the specification
should then be clearly stated. Please do not forget that we have heard
critical voices saying that Content MathML makes even the encoding of
elementary mathematics difficult.

> 4.  Why not use infix operators?
> Answers:
>     a) importance of being able to identify the operator programmatically.
>     b) consideration for nullary and n-ary operators.

a) The operator + is perfectly well identified in other systems such as
Fortran, Mathematica, or Maple. If you mean the identification of new
operators that are not listed in a default operator table, then the
operator can be explicitly marked, as in <op>rho</op> versus plain rho.
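The suggestion in a) can be sketched as a tokenizer that consults a default operator table and otherwise relies on explicit `<op>` marking. Everything here is hypothetical and illustrative, not part of MathML: the names `DEFAULT_OPERATORS` and `classify`, and the `<op>` wrapper syntax itself.

```python
import re

# Hypothetical default operator table: symbols recognized without markup.
DEFAULT_OPERATORS = {"+", "-", "*", "/", "="}

# Alternatives, tried in order: an explicitly marked operator <op>name</op>,
# a word (identifier or number), or any other single non-space symbol.
TOKEN = re.compile(r"<op>(\w+)</op>|(\w+)|(\S)")

def classify(expr: str) -> list[tuple[str, str]]:
    """Label each token of an infix expression as 'op', 'operand' or 'other'."""
    out = []
    for m in TOKEN.finditer(expr):
        marked, word, sym = m.groups()
        if marked is not None:          # explicitly marked, e.g. <op>rho</op>
            out.append(("op", marked))
        elif word is not None:          # unmarked words are operands, e.g. rho
            out.append(("operand", word))
        elif sym in DEFAULT_OPERATORS:  # known infix symbol, e.g. +
            out.append(("op", sym))
        else:                           # parentheses and anything unknown
            out.append(("other", sym))
    return out

print(classify("x + 1"))
print(classify("a <op>rho</op> b"))
print(classify("a rho b"))
```

With this scheme `rho` is an operator only in the second call, where it is wrapped in `<op>`. `+` needs no marking because it is in the table.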

b) There are no great practical advantages. Something like

(1 + 2 + 3 + 4 + 5 + 6)

can be written compactly in prefix notation as

(+ 1 2 3 4 5 6)

but something more usual like

(1 + 2 - 3 + 4 - 5 + 6)

or like

(1 - 3 * 4 + 5)

cannot be compacted. On the contrary, since prefix notation has no
precedence rules, one must nest explicitly! For instance, the last
expression in Content MathML is

(+ (- 1 (* 3 4)) 5)

that is

<apply><plus/><apply><minus/><cn>1</cn>
<apply><times/><cn>3</cn><cn>4</cn></apply></apply><cn>5</cn></apply>

On the whole, prefix notation does not introduce real advantages, which is
the reason its use in practice is minimal. Did you know that it was Fortran
that popularized infix notation in programming languages?
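The point about precedence can be checked mechanically. This is a minimal Python sketch of a precedence-climbing parser, a standard technique that MathML itself does not define. It converts the infix examples above into prefix form and into Content MathML `<apply>` trees. The grammar is an assumption: integer literals, binary `+ - * /` with the usual precedence, parentheses, and no unary minus. All function names are hypothetical.

```python
import re

# Minimal grammar (assumption): integer literals, binary + - * /, parentheses;
# no unary minus and no error recovery.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}
MATHML_OP = {"+": "plus", "-": "minus", "*": "times", "/": "divide"}
NARY = {"+", "*"}  # associative operators whose runs may be flattened

def tokenize(s):
    return re.findall(r"\d+|[-+*/()]", s)

def parse(tokens):
    """Precedence climbing; returns an int or a tuple (op, arg1, arg2, ...)."""
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expr(1)
            pos += 1  # skip the matching ")"
            return node
        return int(tok)

    def expr(min_prec):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and tokens[pos] in PREC and PREC[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            right = expr(PREC[op] + 1)  # +1 makes operators left-associative
            if op in NARY and isinstance(left, tuple) and left[0] == op:
                left = left + (right,)  # extend (+ 1 2) to (+ 1 2 3)
            else:
                left = (op, left, right)
        return left

    return expr(1)

def to_prefix(node):
    if isinstance(node, int):
        return str(node)
    op, *args = node
    return "(" + " ".join([op] + [to_prefix(a) for a in args]) + ")"

def to_mathml(node):
    if isinstance(node, int):
        return f"<cn>{node}</cn>"
    op, *args = node
    return f"<apply><{MATHML_OP[op]}/>" + "".join(to_mathml(a) for a in args) + "</apply>"

for src in ["1 + 2 + 3 + 4 + 5 + 6", "1 + 2 - 3 + 4 - 5 + 6", "1 - 3 * 4 + 5"]:
    print(src, "=>", to_prefix(parse(tokenize(src))))
print(to_mathml(parse(tokenize("1 - 3 * 4 + 5"))))
```

The prefix forms printed are as follows:

* `(+ 1 2 3 4 5 6)`
* `(+ (- (+ (- (+ 1 2) 3) 4) 5) 6)`
* `(+ (- 1 (* 3 4)) 5)`

Only a run of one associative operator flattens into a single n-ary application.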

And still nobody on this list has answered me: why not postfix notation?

If a) and b) above were the reasons the MathML WG chose prefix notation
for Content MathML, then one should note that *postfix notation* has those
advantages plus a third one: the computer obtains the arguments first and
then calls the function/application, which is often more efficient for
computers.
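The efficiency argument for postfix can be made concrete with the classic stack evaluator. Each operator finds its arguments already on the stack, so evaluation is a single left-to-right pass with no lookahead. This is a minimal Python sketch assuming binary operators only; the name `eval_postfix` is hypothetical.

```python
import operator

# Binary operators only (assumption): in postfix, an n-ary operator would need
# an explicit argument count, which this sketch does not model.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_postfix(tokens):
    """Evaluate a postfix (RPN) token list with a single operand stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()  # the arguments are already on the stack...
            left = stack.pop()
            stack.append(OPS[tok](left, right))  # ...so the operator applies at once
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

print(eval_postfix("1 3 4 * - 5 +".split()))
```

For example, `eval_postfix("1 3 4 * - 5 +".split())` returns `-6.0`, the value of 1 - 3 * 4 + 5. N-ary forms such as `(+ 1 2 3 4 5 6)` do not carry over directly, because in postfix the evaluator must be told how many arguments to pop, for instance through an explicit count.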

> This list is not intended to be all inclusive.  Nor does it address the
> issues regarding presentation, but they are all points to keep in mind when
> attempting to consume mathematical markup by machine.
>
> Stan Devitt
>


Juan R.

Center for CANONICAL |SCIENCE)
Received on Thursday, 20 April 2006 17:13:28 UTC