Re: Technical reasons for some options taken on design of MathML

I received a private reply to one of my questions about the technical
design of MathML. I post it below (with the sender's name removed).

[snip]

> juanrgonzaleza@canonicalscience.com wrote:
>
>> It is more, I repeat again, what is the MathML representation for the
>> structure
>>
>> Over
>>      sup
>> Base
>>      sub
>> under
>>
>> ?
>
> Is this a trick question?

It is just a technical question that should be easily answered from the
specification, no?

>
> <math xmlns="http://www.w3.org/1998/Math/MathML">
>    <munderover>
>      <msubsup>
>        <mi>Base</mi>
>        <mi>sub</mi>
>        <mi>sup</mi>
>      </msubsup>
>      <mi>under</mi>
>      <mi>Over</mi>
>    </munderover>
> </math>
>
> Something wrong with this? The above could be generated, for
> example, by a simple parser from following input:
>    (Base_sub^sup)__under^^Over

Nothing wrong (I understand that your use of mi tokens was purely
illustrative) except I was asking for the encoding of

Over
     sup
Base
     sub
under

The base for the over and under scripts in your example is incorrect: it
is the whole <msubsup> construct rather than Base itself.

The point is that the MathML designers made a couple of errors in the
specification. This is a clear example.

In mathematical SGML, the base for scripts is well-defined and the
technical design for scripts is rather solid. MathML breaks with tradition
and puts the base inside the script tags instead of reusing earlier models.

Instead of something like <basis>basis</basis><sup>sup</sup> à la SGML, we
find <msup>basis sup</msup>. It is claimed that this gives a better
implementation model for parsing MathML, but this argument *forgets*
that, since MathML is not authored by hand, you need an alternative
syntax or tool for generating MathML documents.

Those syntaxes and tools (Mathematica, TeX, specialized converters, Fortran
code...) work the SGML way (indeed, SGML math was designed to be backward
compatible with other models, including the HTML <sup> way of encoding
superscripts in text).

Therefore, you need additional software layers that understand BOTH
the model basis^index and the MathML model <tag>basis index</tag>. That
is *double* the complexity of a single model (e.g. the traditional one,
basis^index). Any initial simplicity is lost.

But it gets worse than that. Because of the computational difficulty of
translating from the usual input syntax to the MathML one, I find
real-world code on the Internet containing completely incorrect MathML:

<mi>a</mi><msup><mrow/><mi>b</mi></msup>

which simulates TeX- or Mathematica-like input a^b, or Fortran-like input
a**b. Such abnormal MathML is being generated by tools that are listed on
the official W3C site for MathML.
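
For comparison, the encoding the specification intends for a^b puts the
base inside the element as its first child. A minimal sketch, with mi
tokens used purely for illustration:

```xml
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <!-- first child is the base, second child is the superscript -->
  <msup>
    <mi>a</mi>
    <mi>b</mi>
  </msup>
</math>
```

A converter reading a^b therefore has to go back and capture the
already-read "a" once it meets the "^", which is the step the tools above
skip.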

But the situation is worse still!!

A <basis>basis</basis><sup>sup</sup> model is easily extensible;

the MathML two-argument model <msup>basis sup</msup> is not.

Therefore, if you want both a superscript and a subscript, the above model
does not work, because the MathML “version” of TeX a_b^c

<msup><msub>basis sub</msub>sup</msup>

IS DIFFERENT. The base for the superscript in the above MathML is
incorrectly encoded, and the MathML folks were obliged to introduce a new
tag and a new parsing model (now with three arguments)

<msubsup> basis sub sup</msubsup>

Now the base is correctly encoded, but at the price of a new tag (alongside
the <msup> and <msub> ones), a new three-argument model for the MathML DTD,
and more complexity for both MathML browsers and tools.

If you want to encode some other kind of sub or superscript, for example
prescripts, then the above model fails again and MathML needs to
introduce the new <mmultiscripts>, <mprescripts/> and <none/> tags and a
new processing model. More complexity.
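
A sketch of that processing model, using the element names as spelled in
the specification. The tokens X, sub, sup and presup are placeholders of
mine, chosen only for illustration:

```xml
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mmultiscripts>
    <mi>X</mi>          <!-- base -->
    <mi>sub</mi>        <!-- postsubscript -->
    <mi>sup</mi>        <!-- postsuperscript -->
    <mprescripts/>      <!-- scripts after this marker go before the base -->
    <none/>             <!-- empty presubscript slot -->
    <mi>presup</mi>     <!-- presuperscript -->
  </mmultiscripts>
</math>
```

Scripts come in (sub, sup) pairs, so a missing member of a pair has to be
filled with <none/>.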

The same goes for over and under in MathML. You need a <mover>, a <munder>
and then a <munderover> with three arguments.

But all of the above is still not sufficient: things such as under and
over scripts combined with sub and superscripts are not contemplated in
the specification, nor are under and over with prescripts, etc. Future
specifications would then need a soup of new tags for the different
combinations:

<munderoversup> with four arguments

<munderoversubsup> with five

<mundermultiscript> with an arbitrary number

and whatever other combinations you can imagine, which may imply unusual
complexity for computers. In SGML math, the model for scripts is more
powerful while being simpler: just four basic tags for under, over, sub,
and sup are combined with <subform> in different ways to encode any
"imaginable" multiscript mathematics, including prescripts.

MathML does worse, adding more complexity: its 7 different tags and
processing models still cannot deal with /all alternatives needed/.

> Granted, something that doesn't look like XML would be much simpler
> to write -- at least for simple structures.
>

[snip]


Juan R.

Center for CANONICAL |SCIENCE)

Received on Tuesday, 4 April 2006 14:10:24 UTC