Re: Technical reasons for some options taken on design of MathML from juanrgonzaleza@canonicalscience.com on 2006-04-05 (www-math@w3.org from April 2006)

From: <juanrgonzaleza@canonicalscience.com>
Date: Wed, 5 Apr 2006 03:26:45 -0700 (PDT)
To: <www-math@w3.org>
Message-ID: <3256.217.124.69.215.1144232805.squirrel@webmail.canonicalscience.com>
David Carlisle wrote:
>
>> The basis for over and under scripts in your example is incorrect.
>
> I didn't reply to your over under question as I had no idea what you
> meant by the ascii art
> Over
>      sup
> Base
>      sub
> under
>

You could have asked me. Do not worry, I know you are a busy man, and I
can understand that in your last three or four replies you could not spare
five lines of text to ask me "what does this ascii art mean?".

It is really interesting that I took the above ascii art from the ANSI/ISO
math standard of 1995, and nobody there asked about ambiguities. Since
there is no grouping of any kind, it is easy to understand that there are
four different scripts acting on a single base. What is more, the original
standard document uses "subformula", but I deliberately wrote *basis* to
make reading easier on this list. Still, _some_ people did not
understand.

> I would have guessed that you meant an over-under construction with base
> having a sub and superscript, in which case  the MathML that you
> said that you received is correct.

Maybe, but I wrote a basis with four "embellishments", and the MathML I
received does not encode that, you know.

> If you mean that you want the over and under to be positioned over the
> Base without being affected by the presence of the sub and sup then
> a) why? and b) what possible markup would you suggest that could do that
> (You can't do it in TeX either).

I am not sure about LaTeX, because I do not know how the base for

\sideset{}{_*^*}\symbol_*^*

is computed in the AMS extension (I suspect that the package does not
treat all scripts on an equal footing, but I do not know for sure).

Markups for general script structures are available. They are out there!
Some of them, such as ISO 12083, are even standards that anyone can
consult for good ideas.

It is also really interesting that you deleted several proposals/examples
from my previous message for encoding some script structures in an
extension of the ugly MathML 2.0 model for scripts.

Admirably, the SGML folks developing the script model did not waste their
time in committees asking "why". They just presented a general and
powerful model for scripts for scientific and mathematical documents,
which MathML folks ignored years later. Personally, I do not need to ask
why; I need a more general example

       Over
presup      sup
       Base
presub      sub
       under

at the level of *elementary* chemistry textbooks. The above structure is
not covered by the oversold MathML:

"Encode mathematical material suitable for teaching and scientific
communication at all levels."

>
>> The question is that MathML designers have done a couple of errors in the
>> specification. This is a clear example.
>
> It's not clear at all. As your example is just 5 words and some white
> space, we have to guess what mathematical relationships you mean before
> anyone can suggest any markup.

Hum, I believe you have incorrectly computed the size of the "base" ;-)
here. My previous "This is a clear example" referred to the structure that
cannot be encoded in MathML and also to the rest of the discussion about
the design of the MathML scripting model. That "rest" you ignored contains
more than just 5 words and some white space...

>>. In mathematical SGML,
> "mathematical SGML" is far more general than you mean. XML is a profile
> of SGML and as such MathML is a mathematical SGML dtd (and has been
> processed with core SGML tools such as nsgml and the jade dsssl engine).
> You presumably have some other DTD in mind, perhaps you should say how
> your example would be encoded in that dtd.

Oops! I know the SGML/XML relation, and I suspect you know perfectly well
what I was talking about. In fact, the MathML 2.0 specification itself
uses the concept of "mathematical SGML" in the same natural way I did:

"MathML has benefited from the participation of a number of working group
members involved in other mathematical encoding efforts in the SGML and
computer-algebra communities."

"Extensive work on encoding mathematics has also been done in the SGML
community, and SGML-based encoding schemes are widely used by commercial
publishers."

And nobody worried about it... because all of us know what is being said
there.

>
>> That abnormal MathML code is being generated by tools that are listed on
>> the official W3C site for MathML.
>
> and your point is?

A simple example is better than 1000 words:

---
<mi>a</mi><msup><mrow/><mi>b</mi></msup>
---
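
The pattern in the fragment above (a script element whose base is an empty
<mrow/>, so the superscript is not attached to the "a") can be detected
mechanically. What follows is only a rough sketch: the function name
empty_base_scripts is hypothetical, the fragment is assumed to carry no XML
namespace, and it is wrapped in a <math> element just to make it parseable.

```python
import xml.etree.ElementTree as ET
from typing import List

# Script elements whose first child is the base.
SCRIPT_TAGS = {"msub", "msup", "msubsup"}

def empty_base_scripts(fragment: str) -> List[str]:
    """Return the tag names of script elements whose base is an empty <mrow/>."""
    # Wrap the fragment so that several sibling elements parse as one document.
    root = ET.fromstring(f"<math>{fragment}</math>")
    hits = []
    for el in root.iter():
        if el.tag in SCRIPT_TAGS and len(el) > 0:
            base = el[0]
            is_empty_mrow = (
                base.tag == "mrow"
                and len(base) == 0
                and not (base.text or "").strip()
            )
            if is_empty_mrow:
                hits.append(el.tag)
    return hits

print(empty_base_scripts("<mi>a</mi><msup><mrow/><mi>b</mi></msup>"))
print(empty_base_scripts("<msup><mi>a</mi><mi>b</mi></msup>"))
```

The first call reports the msup with the empty base; the second, where "a"
is the real base, reports nothing.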

> Yesterday I was sent a message in german, I passed it through google's
> german to english translator. the "english" that resulted was a little
> "strange" but that doesn't mean that I should deduce that because
> structural differences between german and english make translation non
> trivial that there is an error in the design of either language.

Applause: Clap, Clap, Clap! Do I really need to reply to this? I think
(maybe I am wrong :-) nobody has "designed" English to be so verbose that
one cannot enter it by hand, so that everyone on the planet needs to learn
German as an *input syntax*. Do you see the difference?

>
>
>> <msup><msub>basis sub</msub>sup</msup>
>>
>> IS DIFFERENT. The basis for the superscript in above MathML is incorrectly
>> encoded, and MathML folks were obliged to introduce a new tag and a new
>> parsing model (now with three arguments)
>>
>> <msubsup> basis sub sup</msubsup>
>
> You need both of those concepts whatever the markup you use. Mathml
> encodes them the way you show, in TeX the first is {a_b}^c and the
> second is  a_b^c

Apparently you do not understand. It is clear I was not criticizing the
existence of the two concepts, just the *ineffective MathML design* for
dealing with both. Take your TeX examples à la MathML

<msup><msub>a b</msub>c</msup>

<msubsup>a b c</msubsup>

and compare with

{a_b}^c

a_b^c

and with SGML (not MathML :-)

<subform>a<sub>b</sub></subform><sup>c</sup>

a<sub>b</sub><sup>c</sup>

Now replace the SGML subforms with curly brackets and the SGML sub and sup
tags with shorthands. What do you obtain?

{a_b_}^c^

a_b_^c^

Yes!! Something very close to HTML Math and very close to TeX. Moreover,
due to the similarity with TeX/LaTeX, it is very easy to transform one
into the other. MathML does not achieve that.
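
The substitution just described is purely textual, so it can be sketched in
a few lines of Python. This is only an illustration: the function name
sgml_to_shorthand is made up here, and it assumes the six tags appear
exactly as written in the examples above (no attributes, no whitespace
inside the tags).

```python
# Tag-to-shorthand table taken from the examples in this message:
# <subform>...</subform> becomes {...}, <sub>...</sub> becomes _..._,
# and <sup>...</sup> becomes ^...^.
REPLACEMENTS = [
    ("<subform>", "{"),
    ("</subform>", "}"),
    ("<sub>", "_"),
    ("</sub>", "_"),
    ("<sup>", "^"),
    ("</sup>", "^"),
]

def sgml_to_shorthand(markup: str) -> str:
    """Rewrite SGML-style script markup into the shorthand form."""
    for tag, short in REPLACEMENTS:
        markup = markup.replace(tag, short)
    return markup

print(sgml_to_shorthand("<subform>a<sub>b</sub></subform><sup>c</sup>"))  # {a_b_}^c^
print(sgml_to_shorthand("a<sub>b</sub><sup>c</sup>"))                     # a_b_^c^
```

No parsing is needed because the base stays outside the script tags; each
tag maps to one delimiter character.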

There is absolutely no need to introduce new collections of script tags
in SGML, whereas, as I already explained in the past, the ugly MathML
design needs to introduce 7 different tags to offer us _less_ power than
SGML.

What is more, different structures are encoded via different combinations
of the same tags in SGML (the non-MathML part :-), whereas you will be
forced to introduce new collections of tags in future extensions of
MathML, greatly complicating the DTD and the whole processing model, to
cover structures that can be encoded in SGML math (I refer to the
non-MathML part :-) but that you currently cannot encode with MathML 2.0.

>
>> If you want encode some other kind of sub or superscripts for example
>> prescripts one, then above model is not good again and MathML needs
>> introduce new <multiscript>, <prescript/> and <none/> tags and a new
>> processing model. More complexity.
>
> on the contrary multiscripts and prescripts are one place where mathml
> markup (and underlying layout model) is vastly superior to TeX's.
> TeX has no model of prescripts at all, and none of multiscripts other
> than a single sub-super pair. To do pre-scripts or multi-scripts
> in TeX you have to do a lot of explicit spacing. It's possible to write
> macros that try to do the spacing automatically but it's not easy. Have
> a look at the multiscript code in amslatex, it's horrendously
> complicated.
>

That is just free propaganda.

1) MathML is not vastly superior to TeX. Only the script/multiscript
layout encoding is better, and the structural markup for math is better. I
already wrote about that in Canonical Science Today:

[http://canonicalscience.blogspot.com/2006/02/choosing-notationsyntax-for-canonmath.html]

For instance, I already said what you are saying now about the absence of
a prescript model in TeX-oriented systems, but you have apparently
forgotten it.

With that clarified, one may add that TeX markup is vastly easier than
MathML. That is, MathML is better from a structural point of view;
TeX/LaTeX is better for authoring. Hence the MathML WG itself encourages
the use of TeX, TeX-like syntaxes, or TeX dialects such as IteX as input
syntax for authoring. You can find them in the official W3C MathML
Software list.

2) I was referring to other SGML/HTML markups, where dealing with scripts
is better.

3) You have just ignored the main criticism of the MathML model, which
puts bases inside the script markup.

In SGML one can encode very complex script structures with combinations of
5 basic tags. MathML uses 10 tags to provide us LESS structural power,
and if you want to encode more general script structures in MathML, you
will need to add an impressive number of new tags in future specifications
of MathML.

Since you have failed to understand this, I will explain again.

In ISO 12083 SGML you have tags for sup, sub, over, and under scripts, and
you can combine them. The basis is outside, as usual in almost any computer
model: Fortran, TeX, IteX, ASCIIMath, Maple, Mathematica...

In MathML, you find the base inside the markup

<msub>base script</msub>

<msup>base script</msup>

so you *cannot* combine msup and msub to obtain subsup, because both
contain the base in a redundant way (in SGML, TeX, IteX... however, you can
do that because the basis is encoded outside). MathML folks are therefore
obliged to introduce a new tag and a new three-parameter processing
model (complicating the DTD and the design of browsers):

<msubsup>base script1 script2</msubsup>

The same criticism applies to "under" and "over".

However, all that soup of tags in MathML is not sufficient, and you need
to introduce another 3 tags for other scripting possibilities and, again,
you find that the whole 2.0 specification is not sufficient for encoding
arbitrary mathematical structures.

My previous example cannot be encoded in MathML and, due to the ineffective
MathML design, you will be forced to introduce a new tag and a new content
model in some future MathML specification. In the same way that the new
tags "msubsup", "munderover", and "mmultiscripts" had to be invented for
the current MathML, you will need to invent a new tag like
<munderoversubsup> with a five-argument content/processing model

<munderoversubsup>base script1 script2 script3 script4</munderoversubsup>

and so on!

Since MathML designers decided to encode the basis in the script markup,
you need a new composed tag, and a new content/processing model, for each
new mathematical structure that was not covered. Just compare

<msub>base script</msub> --MathML 2.0

base<sub>script</sub> --SGML way

<msup>base script</msup> --MathML 2.0

base<sup>script</sup> --SGML way

<msubsup>base script1 script2</msubsup> --MathML 2.0

base<sub>script1</sub><sup>script2</sup> --SGML way

<munderoversubsup>base script1 script2 script3 script4</munderoversubsup>
--MathML 3.0?

base<under>script1</under><over>script2</over><sub>script3</sub><sup>script4</sup>
--SGML way

In SGML other combinations of tags are possible for encoding arbitrarily
complex script structures. Note how you can combine sub and sup in SGML but
you cannot combine msub and msup because the basis is inside and,
therefore, you are forced to develop a new (redundant) msubsup tag in
MathML.
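
The comparison above can be restated as a small Python sketch. The names
sgml_way, single_mathml_element and MATHML_ELEMENT are made up for this
illustration. The table lists only the single-base script elements of
MathML 2.0; nesting one element inside another is deliberately left out,
because nesting changes what the base is, which is exactly the point under
discussion.

```python
from typing import Iterable, Optional, Tuple

# Single MathML 2.0 script elements, keyed by the set of scripts they take.
MATHML_ELEMENT = {
    frozenset({"sub"}): "msub",
    frozenset({"sup"}): "msup",
    frozenset({"sub", "sup"}): "msubsup",
    frozenset({"under"}): "munder",
    frozenset({"over"}): "mover",
    frozenset({"under", "over"}): "munderover",
}

def sgml_way(base: str, scripts: Iterable[Tuple[str, str]]) -> str:
    """Base outside, then one tag per script, in the order given."""
    return base + "".join(f"<{kind}>{body}</{kind}>" for kind, body in scripts)

def single_mathml_element(kinds: Iterable[str]) -> Optional[str]:
    """The one MathML 2.0 element taking exactly these scripts, or None."""
    return MATHML_ELEMENT.get(frozenset(kinds))

four = [("under", "script1"), ("over", "script2"),
        ("sub", "script3"), ("sup", "script4")]
print(sgml_way("base", four))
print(single_mathml_element(kind for kind, _ in four))  # None
print(single_mathml_element(["sub", "sup"]))            # msubsup
```

In the base-outside form a new combination costs nothing; in the
base-inside form every new combination needs a new row in the table.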

And what about prescripts with over, and about sub-sup with under? They
are not covered by MathML 2.0. Will we see a new soup of tags for MathML
3.0?

<multioverscript>?

<msubsupunder>?

any other combination?

The whole result is an inefficient/complex/redundant MathML specification
that, moreover, is _not_ unified with the rest of the XML world (e.g.
XHTML).

>> In SGML math, the model for scripts is more
>> powerful being more simple, just four basic tags for under, over, sub, and
>> sup are combined with <subform>
> Ah so by SGML you mean ISO 12083
> So how would you mark up your first example in  ISO 12083 ?
>
> David
>

Combining the tags for under, over, sub, and sup scripts.

Note that the encoding of something like

 1 3
H
  2

is also simple and intuitive in SGML math, reusing the basic solid model.
However, MathML needs a new content/processing model and novel additional
tags because of an incorrect design.

The above example is easily coded in SGML math using just the *available*
sub and sup tags. In MathML (due to many design errors) you cannot reuse
<msub>, nor <msup>, nor even <msubsup>! You need to introduce a new
content/processing model

<mmultiscripts>
    base
    ( subscript superscript )*
    [ <mprescripts/> ( presubscript presuperscript )* ]
</mmultiscripts>

and three new tags: <mmultiscripts>, <mprescripts/>, and <none/>.

Moreover, MathML is more verbose for the above example because you need to
introduce redundant code to fill out the pairs. The above example in
MathML needs three additional <none/> tags.

<mmultiscripts>
H
<none/> 1
2 <none/>
<none/> 3
</mmultiscripts>
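
The padding can be made explicit with a short Python sketch. The names
mmultiscripts and count_padding are made up for this illustration, and the
tokens are left unmarked ("H", "1"...) as in the abbreviated example above;
real MathML would also wrap each token in <mi>, <mn> and so on.

```python
from typing import List, Optional, Tuple

# One (subscript, superscript) pair per script column; None = empty slot.
Pair = Tuple[Optional[str], Optional[str]]

def mmultiscripts(base: str, pairs: List[Pair]) -> str:
    """Emit the abbreviated mmultiscripts form, padding empty slots."""
    lines = ["<mmultiscripts>", base]
    for sub, sup in pairs:
        sub_text = "<none/>" if sub is None else sub
        sup_text = "<none/>" if sup is None else sup
        lines.append(f"{sub_text} {sup_text}")
    lines.append("</mmultiscripts>")
    return "\n".join(lines)

def count_padding(pairs: List[Pair]) -> int:
    """Number of <none/> placeholders the pairing forces on the author."""
    return sum(slot is None for pair in pairs for slot in pair)

# Superscript 1, then subscript 2, then superscript 3, as in the H example.
h_pairs: List[Pair] = [(None, "1"), ("2", None), (None, "3")]
print(mmultiscripts("H", h_pairs))
print(count_padding(h_pairs))  # 3
```

Three real scripts need three placeholders here, because every column must
be written as a full pair.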

None of this garbage is needed in SGML or in other markup models.

Really, the situation in MathML is poorer still because you need markup
for each token in MathML code, whereas that is not needed in SGML math
(this is another design error of MathML).

The encoding of something like

ij kl
  H

is also less verbose and more intuitive in SGML (due to the bad design of
MathML). Another point worth remarking is that prescripts are introduced
_before_ superscripts in SGML, but MathML folks, claiming

"The prescripts are optional, and when present are given after the
postscripts, because prescripts are relatively rare compared to tensor
notation."

oblige us to introduce prescripts after!!!

Are prescripts relatively rare? Guys, do you know what elementary physics
or chemistry is? You can find elementary chemistry textbooks with lots of
prescripts and zero tensors...

To introduce prescripts in MathML, you need another new tag,
<mprescripts/>. That is not needed in the rather solid SGML design.


******************

And we could talk for days and days about many other limitations and
errors of the MathML design (and why MathML is far from popular). For
example, the following under-overline structure (which I obtained from the
ISO standard)

___________
U V W X Y Z
    -------

cannot be encoded in MathML 2.0 (W3C Recommendation 21 October 2003).

Yet it can be easily encoded in the mathematical SGML markup (the ISO
standard of 1995, containing parts reused from the earlier 1988 one).

Whow!


Juan R.

Center for CANONICAL |SCIENCE)
Received on Wednesday, 5 April 2006 10:27:10 UTC