Re: Re: Using content-MathML for computation and analysis in Science and Engineering from Peter Murray-Rust on 2012-03-27 (www-math@w3.org from March 2012)

From: Peter Murray-Rust <pm286@cam.ac.uk>
Date: Tue, 27 Mar 2012 09:25:21 +0100
To: www-math@w3.org
Cc: davidc@nag.co.uk, "Andreas.strotmann" <andreas.strotmann@gmail.com>
Message-ID: <CAD2k14P4jO7-SgRZ6Y4fhUPQBT3iyNKGYnoyZNoaLJoSRzqbmg@mail.gmail.com>

Very many thanks to all contributors. I have made extremely good progress
with my project to implement a declarative approach to numerical physical
science (this derives from a symposium I ran in January, "Semantic Physical
Science", http://www-pmr.ch.cam.ac.uk/wiki/Semantic_Physical_Science) and
have implemented the core in Java, both for the chemistry/physics and for
the maths.

My current concern is about the implementation of the
declarative-imperative MathML engine. Ideally it shouldn't be (just) me
doing this (and if there are volunteers I'd be delighted). But actually it
isn't a huge amount of effort to code it for physical science. I have added
a differentiation engine which is good enough for what I need and added all
the major operators and functions that I currently need (things like the 20
"unary elementary classical functions" get added while watching cricket).

As an example of the problem I am tackling, see the first equation in
http://en.wikipedia.org/wiki/AMBER ("Functional form"). This is an
empirical description of the relationship between the geometry of molecules
and their energies. It is the basis of a huge amount of current
supercomputing for predicting the geometry and interaction of proteins,
materials, minerals, etc. The current codes are archaic and arcane; many
are 25 years old and run to hundreds of KLOC. With declarative programming
in MathML and CML it can be reduced to 2-3 pages of readable material (and
some parameter files).
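
For readers without the page to hand, an AMBER-style functional form is
commonly written roughly as below. This is a sketch from the general
literature form, not a quotation of the Wikipedia page; the symbol names
and prefactors are assumptions and vary between presentations.

```latex
V(r^N) = \sum_{\text{bonds}} k_b (l - l_0)^2
       + \sum_{\text{angles}} k_a (\theta - \theta_0)^2
       + \sum_{\text{torsions}} \sum_n \tfrac{1}{2} V_n \left[ 1 + \cos(n\omega - \gamma) \right]
       + \sum_{j=1}^{N-1} \sum_{i=j+1}^{N} f_{ij} \left\{
           \epsilon_{ij} \left[ \left( \frac{r_{0ij}}{r_{ij}} \right)^{12}
                              - 2 \left( \frac{r_{0ij}}{r_{ij}} \right)^{6} \right]
           + \frac{q_i q_j}{4 \pi \epsilon_0 r_{ij}} \right\}
```

Each summation ranges over a chemically defined set (bonds, angles,
torsions, atom pairs), which is where the chemistry has to be coupled to
the maths.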


My overriding problem is semantics. The chemistry is my problem, and I have
done that - e.g. "bonds" in the first summation is now a
declarative-imperative statement. I now need to couple that to the MathML.

My current impression is that MathML takes one of these approaches:

   - everyone agrees on the semantics so we don't need to spell them out
   - OR no one agrees on the semantics so you can do what you like
   - OR the semantics are irrelevant

As an example, summations are often indicated with a bvar:

<bvar><ci>x</ci></bvar>

but what are the constraints on implementing this? Is "x" necessarily an
integer? The spec doesn't say so. Is there therefore an agreed definition
of "sum index" that constrains it to be an integer? What happens if the end
index is less than the start? An error? A no-op? Or iteration with a
negative (decreasing) step?
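
To make the ambiguity concrete, here is a minimal Java sketch of how three
engines could each make a defensible but different choice. It is not taken
from my engine or from the MathML spec: the class SumSemantics, the enum
EmptyRangePolicy and the method sum are hypothetical names, and the sketch
simply assumes an integer index with unit step.

```java
import java.util.function.IntToDoubleFunction;

public class SumSemantics {

    /** Hypothetical policies for a sum whose upper limit is below its lower limit. */
    public enum EmptyRangePolicy { ERROR, ZERO, COUNT_DOWN }

    /**
     * Sums body(i) for an integer index i between low and high inclusive.
     * The policy decides what happens when high < low.
     */
    public static double sum(int low, int high, IntToDoubleFunction body,
                             EmptyRangePolicy policy) {
        if (low > high) {
            switch (policy) {
                case ERROR:
                    // Treat a reversed range as a malformed expression.
                    throw new IllegalArgumentException(
                            "upper limit " + high + " is below lower limit " + low);
                case ZERO:
                    // Treat a reversed range as an empty sum.
                    return 0.0;
                case COUNT_DOWN:
                    // Treat a reversed range as the same terms visited downwards;
                    // addition order aside, this equals the swapped forward sum.
                    int tmp = low;
                    low = high;
                    high = tmp;
                    break;
            }
        }
        double total = 0.0;
        for (int i = low; i <= high; i++) {
            total += body.applyAsDouble(i);
        }
        return total;
    }

    public static void main(String[] args) {
        IntToDoubleFunction square = i -> i * i;
        // Same expression, same limits, different answers depending on policy.
        System.out.println(sum(3, 1, square, EmptyRangePolicy.ZERO));
        System.out.println(sum(3, 1, square, EmptyRangePolicy.COUNT_DOWN));
    }
}
```

The point is only that the same content-MathML input gives different
results (or an exception) under each policy, and nothing in the markup
itself says which one is meant.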

Without strict, explicit constraints, people will build engines that behave
in different ways. My own recommendation is that there should be a
reference declarative-imperative MathML engine (with unit tests) that
defines the semantics operationally. This is what I did for CML - the JUMBO
engine has thousands of unit tests defining the explicit semantics of CML.
Of course we also try to explain this in words, but words are often fragile.

My current MathML engine is at
https://bitbucket.org/petermr/mathml/overview
It's all Open Source and I can add developers.

P.

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Received on Tuesday, 27 March 2012 08:25:55 UTC