Re: Using content-MathML for computation and analysis in Science and Engineering from Peter Murray-Rust on 2012-03-15 (www-math@w3.org from March 2012)

From: Peter Murray-Rust <pm286@cam.ac.uk>
Date: Thu, 15 Mar 2012 18:37:38 +0000
To: Roger Martin <mathmldashx@yahoo.com>
Cc: "www-math@w3.org" <www-math@w3.org>
Message-ID: <CAD2k14NhDLHTueUNuP2kbm7YwEvQfyew6KR7ZADV4OJw6=qTug@mail.gmail.com>

On Thu, Mar 15, 2012 at 5:59 PM, Roger Martin <mathmldashx@yahoo.com> wrote:

> Hi Peter,
>
> What Daniel suggests is very much what I do while applying xslt transforms
> as just-in-time coding of math from mathml documents.
> The xslt would recognize any
> <m:apply>
>                 <m:eq />
>
>                 <m:ci>k</m:ci>
>                 <m:cn>1.0</m:cn>
> </m:apply>
> as a field with getters and setters in the generated class.  Arrays
> treated the same way.  Also the final results could be gotten back the same
> way where a mathml document needs to return more than one argument to the
> engine's scope.
>
> I've explored many different avenues and purposes of just-in-time
> coding.  Went so far as exploring specialized xslt engine combined with
> direct byte-code manipulation (http://asm.ow2.org/) or producing output
> in OpenCL (http://www.khronos.org/opencl/) format running them via JavaCL
> (http://code.google.com/p/javacl/) or the nvidia tool kit.
>
> This approach you are looking at can actually simplify use of gpu's etc.
> because it is solving the immediate runtime needs rather than try to be
> generalized, solving everything for everybody like conventional precompiled
> code attempts to do.
>
>
I've probably gone round the same roundabouts. I've used XSLT 1.0 (I don't
use 2.0 because of portability) and have written one XSL per element,
including some fairly hairy things for drawing molecules using SVG. I've
come to the conclusion that XSLT doesn't scale well for large systems,
especially where other libraries are involved.

So I use a DOM, specifically XOM from xom.nu, which is much simpler and
better than W3C DOM. It allows subclassing, and in CML every element has
its own subclass (about 110). I'm doing the same with MathML - so far it's
going very well.

Each element has a class such as CIElement or APPLYElement which is
populated by recursive descent when the MathML is parsed. (Since I don't
know MathML well I can't validate on the fly). I also don't need (at
present) to build the MathML programmatically as we are using given
functional forms. (In CML much of the work is programmatic building of
chemistry).
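
A minimal sketch of the recursive-descent population described above. To
stay self-contained it uses the JDK's built-in W3C DOM parser as a stand-in
for XOM, and it assumes unprefixed element names. CIElement and APPLYElement
are the names mentioned above; MathElement, PLUSElement, CNElement and
MathTreeBuilder are hypothetical names introduced only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical base class: one subclass per content-MathML element.
abstract class MathElement {
    final List<MathElement> children = new ArrayList<>();
    String text = "";  // character content of token elements such as ci and cn
}

class APPLYElement extends MathElement {}
class PLUSElement extends MathElement {}   // hypothetical name
class CIElement extends MathElement {}
class CNElement extends MathElement {}     // hypothetical name

final class MathTreeBuilder {
    // Map an element name to its class; unsupported names fail loudly.
    static MathElement create(String name) {
        switch (name) {
            case "apply": return new APPLYElement();
            case "plus":  return new PLUSElement();
            case "ci":    return new CIElement();
            case "cn":    return new CNElement();
            default:
                throw new IllegalArgumentException("unsupported element: " + name);
        }
    }

    // Recursive descent: create the object for this element, then its child elements.
    static MathElement build(Element dom) {
        MathElement node = create(dom.getTagName());
        NodeList kids = dom.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid instanceof Element) {
                node.children.add(build((Element) kid));
            }
        }
        if (node.children.isEmpty()) {
            node.text = dom.getTextContent().trim();
        }
        return node;
    }

    // Parse a MathML fragment (no validation, as in the message) and build the tree.
    static MathElement parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return build(root);
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse MathML", ex);
        }
    }
}
```

For example, MathTreeBuilder.parse("<apply><plus/><ci>x</ci><cn>1.2</cn></apply>")
returns an APPLYElement with three children.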

Each class has a function eval() which evaluates the MathML where possible.
For example, numbers can be added and multiplied. Variables (ci) can be set
programmatically, and this means that expressions can often be evaluated to
a single double (I support integers and doubles). The classes can also have
other generic functionality and I am experimenting with differentiate().
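
A sketch of what "evaluates where possible" could look like, under stated
assumptions: the names Expr, Cn, Ci and Apply are hypothetical, only plus
and times are handled, and only doubles are modelled (not integers). An
empty result stands for "cannot be evaluated yet" because a variable is
unset. This is not the code in the repository, just an illustration of the
idea.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical expression node; eval() yields a value only when every variable is set.
interface Expr {
    OptionalDouble eval(Map<String, Double> vars);
}

// <cn>: a numeric literal always evaluates.
final class Cn implements Expr {
    private final double value;
    Cn(double value) { this.value = value; }
    public OptionalDouble eval(Map<String, Double> vars) {
        return OptionalDouble.of(value);
    }
}

// <ci>: a variable evaluates only once it has been set programmatically.
final class Ci implements Expr {
    private final String name;
    Ci(String name) { this.name = name; }
    public OptionalDouble eval(Map<String, Double> vars) {
        Double v = vars.get(name);
        return v == null ? OptionalDouble.empty() : OptionalDouble.of(v);
    }
}

// <apply> with <plus/> or <times/> as its operator.
final class Apply implements Expr {
    enum Op { PLUS, TIMES }
    private final Op op;
    private final List<Expr> args;
    Apply(Op op, Expr... args) {
        this.op = op;
        this.args = Arrays.asList(args);
    }
    public OptionalDouble eval(Map<String, Double> vars) {
        double acc = (op == Op.PLUS) ? 0.0 : 1.0;  // identity element of the operator
        for (Expr arg : args) {
            OptionalDouble v = arg.eval(vars);
            if (!v.isPresent()) {
                return OptionalDouble.empty();     // an unset variable blocks evaluation
            }
            acc = (op == Op.PLUS) ? acc + v.getAsDouble() : acc * v.getAsDouble();
        }
        return OptionalDouble.of(acc);
    }
}
```

With this sketch, 2 + x*3 yields no value while x is unset and 6.5 once x
is set to 1.5.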

The main challenge - as we have discussed - is the scope and exactly how we
assign variables. I am probably skipping over important semantics but I use:

<apply><eq/><ci>x</ci><cn>1.2</cn></apply>

to populate a variable in the scope of the containing <math> element. An x
in a subsequent expression is then replaced by 1.2. This may be naive
mathematically but it works for me :-)
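
The scoping rule above can be sketched as follows. This is an illustration
under assumptions, not the repository code: MathScope and its methods are
hypothetical names, the JDK's W3C DOM parser stands in for XOM, element
names are unprefixed, and only plus and times are evaluated. An
<apply><eq/><ci>..</ci><cn>..</cn></apply> child of <math> binds a variable;
every other child is evaluated against the bindings made so far.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class MathScope {
    // Child elements of a node, skipping text and whitespace.
    static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) out.add((Element) n);
        }
        return out;
    }

    // True for <apply><eq/><ci>name</ci><cn>value</cn></apply>.
    static boolean isAssignment(Element e) {
        if (!e.getTagName().equals("apply")) return false;
        List<Element> kids = childElements(e);
        return kids.size() == 3
            && kids.get(0).getTagName().equals("eq")
            && kids.get(1).getTagName().equals("ci")
            && kids.get(2).getTagName().equals("cn");
    }

    // Evaluate cn, ci and apply (plus/times) against the current scope.
    static double eval(Element e, Map<String, Double> scope) {
        switch (e.getTagName()) {
            case "cn":
                return Double.parseDouble(e.getTextContent().trim());
            case "ci": {
                String name = e.getTextContent().trim();
                Double v = scope.get(name);
                if (v == null) throw new IllegalStateException("unbound variable: " + name);
                return v;  // the variable is replaced by its bound value
            }
            case "apply": {
                List<Element> kids = childElements(e);
                String op = kids.get(0).getTagName();
                if (!op.equals("plus") && !op.equals("times")) {
                    throw new IllegalArgumentException("unsupported operator: " + op);
                }
                double acc = op.equals("plus") ? 0.0 : 1.0;
                for (Element arg : kids.subList(1, kids.size())) {
                    double v = eval(arg, scope);
                    acc = op.equals("plus") ? acc + v : acc * v;
                }
                return acc;
            }
            default:
                throw new IllegalArgumentException("unsupported element: " + e.getTagName());
        }
    }

    // Walk the children of <math> in document order: assignments populate the
    // scope, all other expressions are evaluated and their values collected.
    static List<Double> evaluate(String mathXml) {
        Element math;
        try {
            math = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(mathXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse MathML", ex);
        }
        Map<String, Double> scope = new HashMap<>();
        List<Double> results = new ArrayList<>();
        for (Element child : childElements(math)) {
            if (isAssignment(child)) {
                List<Element> kids = childElements(child);
                scope.put(kids.get(1).getTextContent().trim(),
                          Double.parseDouble(kids.get(2).getTextContent().trim()));
            } else {
                results.add(eval(child, scope));
            }
        }
        return results;
    }
}
```

Treating every top-level eq as an assignment is exactly the naive reading
described above; an eq meant as an equation to be tested would need a
different convention.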

> I'd enjoy more discussion of applying content mathml in this area.
>

Excellent - as long as the list members don't mind I am happy to continue.
My code is at

http://www.bitbucket.org/petermr/mathml
and is deployed in

http://www.bitbucket.org/petermr/semantic-forcefield

It's only 3 days old. It runs under Java/Maven and I'd be delighted if
anyone wants to play.

Ultimately I can see that undergraduate chemistry textbooks and many
research papers could be written in this way. It's a very good discipline
for understanding the semantics of the domain.

> Roger (We've talked in the past about ANTLR parsers for quixote-qcdb)
>

Indeed! We use ANTLR for postprocessing natural language processing output
(ANTLR cannot scale to the complete problem, which requires heuristics).

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Received on Thursday, 15 March 2012 18:38:07 UTC