Re: Content MathML editing language: Binary => n-ary syntax from Andrew Miller on 2006-11-20 (www-math@w3.org from November 2006)

From: Andrew Miller <ak.miller@auckland.ac.nz>
Date: Mon, 20 Nov 2006 17:02:39 +1300
To: "For those interested in contributing to the development of CellML." <cellml-discussion@cellml.org>, www-math@w3.org
Message-ID: <456128DF.9050702@auckland.ac.nz>
Peter Jipsen wrote:
> Hi Andrew,
>
> For an interesting comparison you could look at ASCIIMathML 
> http://www1.chapman.edu/~jipsen/mathml/asciimath.html and the syntax 
> that was chosen for the plain text to Presentation MathML translation. 
> Since this is done on the fly in JavaScript, it had to be very simple 
> and fast, so there are some compromises in the quality of the PMathML 
> that is generated, but the input syntax is very natural (even for 
> undergraduate students).
As would be expected for a Presentation MathML input language, it seems 
much of your language is orientated towards specifying layout.

However, there are a few features in your language which would probably 
be useful in a content MathML input language:
1) I note you use ^ for superscripts. This could be used in content 
MathML input languages for exponentiation (the only problem with this, 
of course, is that ^ is used in-order in many programming languages to 
mean exclusive-or, as well as exponentiation in others).
2) I note that you treat certain strings, like sqrt, as being functions, 
even without intervening whitespace, e.g. to allow sqrtx. I'm not sure I 
like this, because it makes a large block of variable names ambiguous 
with a function, which would require some sort of escape syntax.
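To make the ambiguity concrete, here is a minimal Python sketch of a greedy reading in which any identifier that starts with a known function name is split into a function application. The helper names `split_identifier` and `readings` and the short `FUNCTIONS` list are hypothetical illustrations, not ASCIIMathML's actual implementation or symbol table.

```python
# Hypothetical list of names treated as built-in functions
# (an assumption for illustration, not ASCIIMathML's real symbol table).
FUNCTIONS = ("sqrt", "sin", "cos", "exp")


def split_identifier(word, functions=FUNCTIONS):
    """Greedy reading: an identifier that starts with a known function name
    is split into that name plus the remainder, e.g. 'sqrtx' -> sqrt(x)."""
    for f in functions:
        if word.startswith(f) and len(word) > len(f):
            return [f, word[len(f):]]
    return [word]


def readings(word, variables, functions=FUNCTIONS):
    """List every plausible reading of `word`: as a declared variable (ci)
    and/or as a function applied to the remainder. More than one reading
    means the input language needs some escape syntax."""
    result = []
    if word in variables:
        result.append(("ci", word))
    parts = split_identifier(word, functions)
    if len(parts) == 2:
        result.append(("apply", parts[0], parts[1]))
    return result


# A model variable called 'cost' collides with cos applied to t.
print(readings("cost", {"cost", "t"}))
# [('ci', 'cost'), ('apply', 'cos', 't')]
```

Any variable whose name happens to begin with a function name (cost, sinh_rate, expected, ...) would hit the same problem, which is why some escape syntax would be needed.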
>
> My experience with ease-of-use for typing math is to make the ascii 
> look as much as possible like the math that it represents. This is 
> what users would guess without reading a manual (and who does that 
> nowadays?:-).
I agree to some extent (hence why I am proposing in-order operators 
where the mathematics used in written publications commonly uses them, 
and pre-order operators everywhere else). For more complex constructs 
like derivatives, which have several common forms, I'm not sure how easy 
it is for users to guess straight off anyway. I think simplicity (so 
that if they do look up the rules, they will remember them for next 
time, and so they can quickly learn from examples), and similarity to 
existing languages (where possible) are probably more important goals here.
>
> For Content MathML you are asking some important questions. I believe 
> the bracketing structure should be preserved, so a+b+c should not be 
> nested, but a+(b+c) should be nested in the CMathML. Not all 
> interpretations of + necessarily insist that these two formulas 
> evaluate to the same value (in fact in most programming languages with 
> a maxint limit they can differ because of overflow).
The way you have bracketed your example changes the order of evaluation, 
so it strays slightly from my original question (after all, if you 
replaced + with - in your example, you would change the result even for 
real-valued variables).
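Peter's overflow remark can be illustrated with a small Python sketch that assumes 32-bit signed integers whose addition raises an error on overflow (some languages trap like this, others behave differently). `checked_add`, `left_grouped`, `right_grouped` and the constants are hypothetical names. The same three values give a result under one grouping and an error under the other:

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def checked_add(x, y):
    """Add two 32-bit signed integers, raising OverflowError instead of
    wrapping (an assumed semantics; real languages differ here)."""
    s = x + y
    if not INT32_MIN <= s <= INT32_MAX:
        raise OverflowError(f"{x} + {y} does not fit in 32 bits")
    return s


def left_grouped(a, b, c):
    return checked_add(checked_add(a, b), c)   # (a+b)+c


def right_grouped(a, b, c):
    return checked_add(a, checked_add(b, c))   # a+(b+c)


a, b, c = INT32_MAX, 1, -1
print(right_grouped(a, b, c))  # 2147483647
try:
    left_grouped(a, b, c)
except OverflowError as err:
    print("overflow:", err)
```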

Comparing a+b+c with (a+b)+c is more interesting, because the order of 
operations is the same. This is actually a slight discrepancy between 
content MathML (where plus is an n-ary operator) and the standard 
approach for studying groups in mathematics (because the usual 
definition of a group makes the operator a binary operator). I am not 
aware of any document which defines how the n-ary operators in content 
MathML, which take a variable number of arguments, map onto the binary 
operator +. However, my interpretation is that if the Xi are members of 
a group under addition (for all i), then
  plus(X1, X2, ..., Xn) = plus(X1, X2, ..., X(n-1)) + Xn    (n > 2)
  plus(X1, X2)          = X1 + X2
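The recursive definition above can be read directly as code. The following is a minimal Python sketch; `nary_plus` and its `add` parameter are hypothetical names, `add` stands for the binary group operation, and restricting the sketch to n >= 2 is an assumption that simply mirrors the two equations.

```python
import operator


def nary_plus(*xs, add=operator.add):
    """Evaluate plus(X1, ..., Xn) using only a binary operation `add`:
         plus(X1, ..., Xn) = plus(X1, ..., X(n-1)) + Xn   (n > 2)
         plus(X1, X2)      = X1 + X2
    """
    if len(xs) < 2:
        raise ValueError("this sketch only defines plus for n >= 2")
    if len(xs) == 2:
        return add(xs[0], xs[1])
    return add(nary_plus(*xs[:-1], add=add), xs[-1])


def show(x, y):
    """A string-building `add` that exposes the bracketing."""
    return f"({x} + {y})"


print(nary_plus("a", "b", "c", "d", add=show))  # (((a + b) + c) + d)
print(nary_plus(1, 2, 3, 4))                    # 10
```

Because the definition always nests to the left, it coincides with `functools.reduce(add, xs)`; for a genuine group, associativity means any other bracketing gives the same value.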

In this case, the value of the expression is the same for a+b+c and 
(a+b)+c, and therefore the meaning shouldn't change. The only case to 
worry about is that the semantics of n-ary plus could be changed by 
definitionURL, in which case the assumption that the two MathML trees 
are equivalent would not hold. However, firstly, CellML is intended for 
the interchange of mathematical models, so using definitionURL would 
harm this goal and be bad practice anyway; and secondly, it is unclear 
whether the a+b+c case should use the ternary form, or whether the 
precedence rules should fold it into two binary operations.
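One way to make the folding question concrete is to parse a+b+c into two binary plus applications and then optionally collapse them into the n-ary form. The Python sketch below does this with the standard `xml.etree.ElementTree` module; the helper names are hypothetical, and it assumes (following the left-nested reading of n-ary plus) that only a plus in the first operand position is merged, so an explicitly bracketed a+(b+c) stays nested.

```python
import xml.etree.ElementTree as ET


def ci(name):
    """Build <ci>name</ci>."""
    e = ET.Element("ci")
    e.text = name
    return e


def apply_plus(*operands):
    """Build <apply><plus/>operand...</apply>."""
    a = ET.Element("apply")
    ET.SubElement(a, "plus")
    a.extend(operands)
    return a


def is_plus(e):
    return e.tag == "apply" and len(e) > 0 and e[0].tag == "plus"


def flatten_plus(e):
    """Collapse left-nested binary plus applications into one n-ary plus.
    Only the first operand is merged, matching
    plus(X1, ..., Xn) = plus(X1, ..., X(n-1)) + Xn.
    Non-plus elements are returned unchanged (this sketch does not
    descend into them)."""
    if not is_plus(e):
        return e
    operands = [flatten_plus(child) for child in list(e)[1:]]
    if operands and is_plus(operands[0]):
        operands = list(operands[0])[1:] + operands[1:]
    return apply_plus(*operands)


# a+b+c as parsed by left-associative precedence rules: (a+b)+c
binary = apply_plus(apply_plus(ci("a"), ci("b")), ci("c"))
print(ET.tostring(flatten_plus(binary), encoding="unicode"))
# <apply><plus /><ci>a</ci><ci>b</ci><ci>c</ci></apply>
```

Whether an editor should emit the flattened or the nested tree is the open question; the sketch only shows that converting in the left-nested direction is mechanical.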
>
> For differentiation, how about d/(dx)(f(x)) or d/dx(f(x)) and 
> del/delx(f(x)) for partial derivatives?
There are some minor issues here, especially if there are variables 
called d and dx in the CellML component. Then there could be ambiguity 
between a derivative with respect to x and a division such as
<apply><divide/>
  <ci>d</ci>
  <ci>dx</ci>
</apply>

Pushing the d together with the bound variable name could potentially be 
confusing (variable names in CellML are frequently more than one 
character long, so I would find it easier to understand an equation if 
there was a bracket after the d).
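A small Python sketch of the problem, assuming a hypothetical input syntax in which `d/dNAME` means the derivative with respect to NAME and `d/d(NAME)` is the bracketed alternative, and assuming the language has no implicit multiplication or user-defined functions. `derivative_readings` is a hypothetical helper that lists every interpretation of the input given the variables declared in the component:

```python
import re

UNBRACKETED = re.compile(r"d/d(\w+)")      # e.g. d/dx
BRACKETED = re.compile(r"d/d\((\w+)\)")    # e.g. d/d(x)


def derivative_readings(text, variables):
    """Return every interpretation of `text` as ('diff', bound_variable) or
    ('divide', numerator, denominator), given the declared variable names."""
    m = BRACKETED.fullmatch(text)
    if m:
        # The bracket separates 'd' from the bound variable, so (under the
        # assumptions above) only the derivative reading is possible.
        return [("diff", m.group(1))]
    m = UNBRACKETED.fullmatch(text)
    if not m:
        return []
    name = m.group(1)
    result = [("diff", name)]
    if "d" in variables and "d" + name in variables:
        result.append(("divide", "d", "d" + name))
    return result


declared = {"d", "dx", "x"}
print(derivative_readings("d/dx", declared))
# [('diff', 'x'), ('divide', 'd', 'dx')]
print(derivative_readings("d/d(x)", declared))
# [('diff', 'x')]
```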

The other issue (this might be a somewhat CellML specific issue, and not 
generalise to other users of content MathML) is that most CellML models 
that use the diff operator do so to define systems of ordinary 
differential equations, and so they will almost always take the 
derivative of one variable with respect to another. It is rare to see 
papers on these types of models use the differentiation operator, so the 
d(x)/d(time) notation makes more sense than d/d(time)(x) (unless we are 
going to have both forms).
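As a sketch of how the d(x)/d(time) form could map onto content MathML, the Python function below (a hypothetical helper, handling only this single pattern with plain identifiers) builds the `<apply><diff/><bvar>...</bvar>...</apply>` structure with the standard `xml.etree.ElementTree` module:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical syntax: d(y)/d(x) means the derivative of y with respect to x.
DERIVATIVE = re.compile(r"d\((\w+)\)/d\((\w+)\)")


def parse_derivative(text):
    """Translate 'd(y)/d(x)' into content MathML for dy/dx."""
    m = DERIVATIVE.fullmatch(text.replace(" ", ""))
    if m is None:
        raise ValueError(f"not of the form d(y)/d(x): {text!r}")
    y, x = m.groups()
    node = ET.Element("apply")
    ET.SubElement(node, "diff")
    bvar = ET.SubElement(node, "bvar")
    ET.SubElement(bvar, "ci").text = x   # bound variable: differentiate w.r.t. x
    ET.SubElement(node, "ci").text = y   # the variable being differentiated
    return node


print(ET.tostring(parse_derivative("d(V)/d(time)"), encoding="unicode"))
# <apply><diff /><bvar><ci>time</ci></bvar><ci>V</ci></apply>
```

Supporting the d/d(time)(x) form as well would only need a second pattern producing the same tree.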

Best regards,
Andrew
Received on Monday, 20 November 2006 04:02:54 UTC