RE: Technical reasons for some options taken on design of MathML from juanrgonzaleza@canonicalscience.com on 2006-04-18 (www-math@w3.org from April 2006)

From: <juanrgonzaleza@canonicalscience.com>
Date: Tue, 18 Apr 2006 03:00:47 -0700 (PDT)
To: <www-math@w3.org>
Message-ID: <3374.217.124.88.179.1145354447.squirrel@webmail.canonicalscience.com>
Basic requirements for mathematical-scientific language

I have posted an updated list of basic requirements for a generic
mathematical markup language for scientific use at the following link:

[http://canonicalscience.blogspot.com/2006/04/scientific-language-canonml-is.html]

Some of these requirements fit into the XML model and could be considered
for discussion in future MathML specifications. Others do not fit, and the
Center for CANONICAL |SCIENCE) will develop them in mathematical approaches
that are alternatives to those of the W3C.

Some of these requirements were presented earlier at

[http://canonicalscience.blogspot.com/2006/02/choosing-notationsyntax-for-canonmath.html]

but that document will be updated.


=== Basic requirements =============


Data optimisation
-----------------

MathML is unnecessarily verbose and redundant. In practice this is not a
serious problem when encoding simple formulae such as E=mc^2, but it is a
problem for scientific databases and for the computation or interchange of
information.

In shorthand notation, the Redfield equation reads

\frac{\partial \rho}{\partial t} = (L + R) \rho

where R is the Redfield tensor. However, storing this equation for a small
physicochemical system of current interest requires on the order of 7 GB
of memory.

Assuming a verbosity factor of 10, the same equation would need about
70 GB in MathML.
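The 7 GB figure is not broken down in the text. As a hypothetical illustration of how quickly such storage grows, the sketch below assumes the Redfield tensor is held as a dense array of complex double-precision numbers (16 bytes each). For an n-level system the density matrix rho has n^2 entries, so R, which acts on rho, has (n^2)^2 = n^4 entries. The function names and the element size are assumptions; the factor of 10 is the verbosity factor used above.

```python
# Hypothetical back-of-the-envelope estimate of dense Redfield tensor storage.
# Assumes one complex double (16 bytes) per element; not the author's code.

BYTES_PER_COMPLEX_DOUBLE = 16  # two 8-byte IEEE 754 doubles
VERBOSITY_FACTOR = 10          # the x10 MathML verbosity factor assumed in the text


def dense_redfield_bytes(n_levels: int) -> int:
    """Bytes needed for a dense Redfield tensor R_{ab,cd} of an n-level system."""
    # rho has n^2 entries; R maps rho to d(rho)/dt, so it has (n^2)^2 = n^4 entries.
    return n_levels ** 4 * BYTES_PER_COMPLEX_DOUBLE


def mathml_bytes(n_levels: int) -> int:
    """The same tensor after applying the assumed verbosity factor."""
    return dense_redfield_bytes(n_levels) * VERBOSITY_FACTOR


if __name__ == "__main__":
    for n in (50, 100, 150):
        binary_gb = dense_redfield_bytes(n) / 1e9
        mathml_gb = mathml_bytes(n) / 1e9
        print(f"n = {n:4d}: binary ~{binary_gb:6.2f} GB, MathML ~{mathml_gb:7.2f} GB")
```

Under these assumptions about 145 levels already give roughly 7 GB (145^4 x 16 bytes is about 7.07 x 10^9 bytes), and the n^4 growth is what makes any constant verbosity factor expensive.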

Moreover, the Redfield equation is itself a highly simplified version of
more general equations.

As another example, following the MathML 2.0 specification, the matrix

0 1 0
0 0 1
1 0 0

is encoded as

<matrix>
  <matrixrow>
    <cn>0</cn><cn>1</cn><cn>0</cn>
  </matrixrow>
  <matrixrow>
    <cn>0</cn><cn>0</cn><cn>1</cn>
  </matrixrow>
  <matrixrow>
    <cn>1</cn><cn>0</cn><cn>0</cn>
  </matrixrow>
</matrix>

But can this ultraverbose encoding be used for Detour matrices of
scientific interest? Detour matrices are N x N, and in mathematical
chemistry N is the number of vertices (atoms) of the molecular graph, so
it grows with the size of the chemical compound.

I do not consider encoding big (N = 1000) Detour matrices in MathML
elegant or coherent. Is it?
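The size penalty can be estimated with a short sketch. It is an illustration under stated assumptions, not a benchmark: it uses the path graph on N vertices as a stand-in molecule (a tree, so the longest path between two vertices is the unique path and D[i][j] = |i - j|), serializes the matrix in compact MathML 2.0 Content markup with no indentation, and compares it with plain whitespace-separated text. The helper names are hypothetical.

```python
# Hypothetical sketch: size of an N x N Detour matrix in MathML 2.0 Content
# markup versus plain text. Uses a path graph, whose Detour matrix is |i - j|.

from typing import List

Matrix = List[List[int]]


def path_graph_detour_matrix(n: int) -> Matrix:
    """Detour matrix of the path graph on n vertices (a tree, so detour = distance)."""
    return [[abs(i - j) for j in range(n)] for i in range(n)]


def to_mathml_content(m: Matrix) -> str:
    """Compact MathML 2.0 Content encoding: <matrix>/<matrixrow>/<cn>, no whitespace."""
    rows = (
        "<matrixrow>" + "".join(f"<cn>{x}</cn>" for x in row) + "</matrixrow>"
        for row in m
    )
    return "<matrix>" + "".join(rows) + "</matrix>"


def to_plain_text(m: Matrix) -> str:
    """One row per line, entries separated by single spaces."""
    return "\n".join(" ".join(str(x) for x in row) for row in m)


if __name__ == "__main__":
    example = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # the 3 x 3 matrix from the text
    print(to_mathml_content(example))

    d = path_graph_detour_matrix(1000)
    mathml, plain = to_mathml_content(d), to_plain_text(d)
    print(f"MathML: {len(mathml):,} chars, plain: {len(plain):,} chars, "
          f"ratio {len(mathml) / len(plain):.1f}")
```

For a 0/1 matrix such as the 3 x 3 example, each entry costs 10 characters in this compact MathML against 2 in plain text; the indented layout shown above costs more still.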


Encoding of non-hierarchical structures
---------------------------------------

This may be useful for quantum-mechanical models.


Extensibility
-------------

Currently, Presentation MathML is not extensible, and not everyone agrees
that Content MathML is.


Backward compatibility
----------------------

The language should be as close as possible to popular existing systems,
namely TeX, LaTeX, Mathematica, Maple, Fortran, Lisp, C, ISO 12083, AAP
Math, some scientific DTDs (e.g. Elsevier's), etc.

This also includes compatibility with CSS, HTML and others.


Formal language
---------------

For example, SXML is directly based on S-expressions and lets us exploit
their formal structure to build abstraction layers.

I agree with the mathematician Chaitin about the possibilities of
computerized versions of set theory.
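To make the S-expression idea concrete, here is a minimal sketch in which Python nested tuples stand in for S-expressions (this is not SXML itself, which is a Scheme notation). Each tuple is (tag, child, ...); a string child becomes escaped text content and a tuple with no children becomes an empty element. The serializer name sexpr_to_xml is hypothetical. The tree below is the Content MathML for \int_1^t dx/x, written compactly and expanded mechanically into XML.

```python
# Minimal sketch: nested Python tuples as an S-expression stand-in, serialized
# to XML. Not SXML; just the same idea (markup as a tree of lists) in Python.

from typing import Tuple, Union
from xml.sax.saxutils import escape

SExpr = Union[str, Tuple]


def sexpr_to_xml(node: SExpr) -> str:
    """Serialize (tag, child, ...) tuples to XML; strings become escaped text."""
    if isinstance(node, str):
        return escape(node)
    tag, *children = node
    if not children:
        return f"<{tag}/>"  # e.g. ("int",) -> <int/>
    inner = "".join(sexpr_to_xml(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"


# Content MathML for \int_1^t dx/x written as an S-expression-like tree.
integral = (
    "apply", ("int",),
    ("bvar", ("ci", "x")),
    ("lowlimit", ("cn", "1")),
    ("uplimit", ("ci", "t")),
    ("apply", ("divide",), ("cn", "1"), ("ci", "x")),
)

if __name__ == "__main__":
    print(sexpr_to_xml(integral))
```

The design choice is that the tree is the primary form and XML is only one possible serialization of it, so other layers (a TeX-like surface syntax, for instance) could be generated from the same structure.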


Simplicity
----------

Good things, when brief, are twice as good!

Humans should be able to write and manipulate the language directly.

Another “advanced” site where (ds)^2 is incorrectly served as 2s ds (that
is, as d(s^2)) is Distler’s blog Musings.

If you rely on tools and are trained never to see the underlying code
(MathML is popularly presented as a kind of hidden mathematical
PostScript), you do not know what you are encoding.

The ultraverbose MathML code

<mrow>
<semantics>
  <mrow>
    <msubsup>
      <mo>&int;</mo>
      <mn>1</mn>
      <mi>t</mi>
    </msubsup>
    <mfrac>
      <mrow>
        <mo>&dd;</mo>
        <mi>x</mi>
      </mrow>
      <mi>x</mi>
    </mfrac>
  </mrow>
  <annotation-xml encoding="MathML-Content">
    <apply>
      <int/>
      <bvar><ci>x</ci></bvar>
      <lowlimit><cn>1</cn></lowlimit>
      <uplimit><ci>t</ci></uplimit>
      <apply>
        <divide/>
        <cn>1</cn>
        <ci>x</ci>
      </apply>
    </apply>
  </annotation-xml>
</semantics>
</mrow>


for \int_1^t \frac{dx}{x} should be avoided. _The difficulty of encoding
should be of the same order as in TeX._
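A crude way to quantify the difference is to count characters. The sketch below compares the TeX source with only the Content MathML branch of the example above, written without indentation; the presentation branch and the semantics wrapper would add more. Character count is only a rough proxy for how hard an encoding is to write by hand, and the helper name size_ratio is hypothetical.

```python
# Crude comparison of source length: TeX versus compact Content MathML
# for the integral \int_1^t dx/x (content branch only, no indentation).

TEX = r"\int_1^t \frac{dx}{x}"

CONTENT_MATHML = (
    "<apply><int/>"
    "<bvar><ci>x</ci></bvar>"
    "<lowlimit><cn>1</cn></lowlimit>"
    "<uplimit><ci>t</ci></uplimit>"
    "<apply><divide/><cn>1</cn><ci>x</ci></apply>"
    "</apply>"
)


def size_ratio(verbose: str, concise: str) -> float:
    """How many times longer the verbose encoding is than the concise one."""
    return len(verbose) / len(concise)


if __name__ == "__main__":
    print(f"TeX: {len(TEX)} chars, Content MathML: {len(CONTENT_MATHML)} chars, "
          f"ratio {size_ratio(CONTENT_MATHML, TEX):.1f}")
```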



Juan R.

Center for CANONICAL |SCIENCE)
Received on Tuesday, 18 April 2006 10:00:59 UTC