Re: Technical reasons for some options taken on design of MathML from juanrgonzaleza@canonicalscience.com on 2006-04-07 (www-math@w3.org from April 2006)

From: <juanrgonzaleza@canonicalscience.com>
Date: Fri, 7 Apr 2006 08:07:26 -0700 (PDT)
To: <www-math@w3.org>
Message-ID: <3891.217.124.69.246.1144422446.squirrel@webmail.canonicalscience.com>
David Carlisle wrote:
>
>
>> Maybe, but I wrote a basis with four "embellishments" and the MathML
>> received is not encoding that, you know.
>
> This is true, now you've explained what you mean but I don't think that
> is too serious a limitation.

Maybe this is the whole problem with the MathML WG. They believe that the
limitations of their work are not too serious, and they also believe that
the design they provided is good enough. Maybe the reason they are
perplexed about why MathML is not very popular (note the astonishing lack
of browser support) is that the WG is not aware of its limitations and
design errors...

Moreover, the MathML WG promised support for almost all
mathematical/scientific needs, but even elementary school math is very
difficult to deal with.

As I already said, the above kind of script structure (which is correctly
treated in SGML math but cannot be encoded in MathML) is already useful
in *elementary* chemistry textbooks. The SGML WG did a better job there.

> If you are concerned about the layout
> rules for such a construct, then you need to specify whether if the
> over-script and super-script would collide, which one moves. In
> Presentation MathML you would control that by nesting the munderover
> inside the msubsup or the other way round. If you are interested in the
> mathematical meaning that this is a single base with four embellishments
> then it is possibly a function applied to 5 arguments that just happen
> to be laid out in that way so you could supply content mathml markup
> that reflects that along with either a stylesheet or parallel
> presentation markup that gives the layout.

No! The structures

<msubsup><munderover>

and

<munderover><msubsup>

you are claiming and the hypothetical (MathML 3.0?)

<munderoversubsup>

are all _different_ (SGML folks knew that years ago!). It is quite amazing
to see Michael Kohlhase (one of the authors of MathML 2.0) explaining why
the structures

<msup>
<msub>
<mi>x</mi>
<mn>1</mn>
</msub>
<mi>&alpha;</mi>
</msup>

and

<msubsup>
<mi>x</mi>
<mn>1</mn>
<mi>&alpha;</mi>
</msubsup>

are different, whereas you suggest encoding the structure I wrote by
nesting the available MathML 2.0 structures. Would you not also suggest
that Michael Kohlhase encode the last structure above just by nesting
<msup> and <msub>?

But I know the difference between the two markups, Michael Kohlhase knows
the difference, and the SGML authors know the difference. That is why the
markup the latter designed is able to encode the mathematical structure I
wrote here in the past, which *cannot* be encoded in MathML due to a
design error.
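
The structural difference between the nested <msup>/<msub> form and the
single <msubsup> form can be checked mechanically. What follows is only a
minimal Python sketch: the helper names shape and base_of are made up for
illustration, and &alpha; is written as a numeric character reference so
that a plain XML parser accepts it without the MathML DTD.

```python
import xml.etree.ElementTree as ET

# The two encodings discussed above (alpha given as a character reference).
NESTED = "<msup><msub><mi>x</mi><mn>1</mn></msub><mi>&#x3B1;</mi></msup>"
FLAT = "<msubsup><mi>x</mi><mn>1</mn><mi>&#x3B1;</mi></msubsup>"


def shape(element):
    """Return the tree as nested tuples: (tag, children) or (tag, text)."""
    children = tuple(shape(child) for child in element)
    return (element.tag, children if children else (element.text or ""))


def base_of(element):
    """Return the first child, which MathML script elements use as the base."""
    return shape(element[0])


nested = ET.fromstring(NESTED)
flat = ET.fromstring(FLAT)

print(shape(nested) == shape(flat))  # the two trees are not the same
print(base_of(nested))               # the base is the whole <msub> element
print(base_of(flat))                 # the base is just <mi>x</mi>
```

In the nested form the superscript is attached to the already subscripted
x, while in <msubsup> both scripts share the single base x.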

>> I do not know exactly about LaTeX, because I do not know how the base for
>>
>> \sideset{}{_*^*}\symbol_*^*
>
> Yes exactly what I was refering to. TeX's layout model has no support
> for prescripts. It is of course possible to write macros such as
> \sidescript that measure the base and attempt to position the prescripts
> by hand but it's not easy.

I already said, and repeated, that TeX lacks adequate support for
prescripts. I said that here several times and I said it in Canonical
Science Today in February!!! What is the point of repeating what I already
said?

> But this is in the user-level macro layer above the TeX layout engine so
> is more or less equivalent to saying that MathML has the same feature
> because you can write <any>markup</any><you>want</you> and transform it
> to presentation mathml via XSLT. When you compare the two cases it is in
> principle easier in MathML as there is underlying support for prescripts
> in the layout model. Of course a typical TeX user doesn't necessarily
> notice the difficulty as the amslatex macros already exist.
>
>
>> That is just free propaganda.
>>
>> 1) MathML is not vastly superior to TeX.
>
> Er that isn't what I said. I said that its support for pre and
> multiscripts is vastly superior  which is clearly true as TeX has no
> support for pre or multi scripts. I'm hardly likely to engage in
> propaganda against latex am I? (In case you were not aware, have a look at
> the list of authors of the latex system).
> ftp://tug.ctan.org/pub/tex-archive/macros/latex/base/legal.txt
>

You should not quote just that part; you should quote the entire point 1)
and the rest of the points I wrote. By taking just that part you force a
misunderstanding among our readers here, and that is not good...

Just after the above partial quote I wrote:

Just the script/multiscript layout encoding is better. The structural
markup for math is better. I already wrote about that in Canonical Science
Today in the past

[http://canonicalscience.blogspot.com/2006/02/choosing-notationsyntax-for-canonmath.html]

For instance, I already said you are saying now about the absence of
prescript model in TeX oriented system but you, apparently, have forgotten
it.

Once clarified that, one may add that TeX markup is vastly easier than
MathML. That is, MathML is better from a structural point of view;
TeX/LaTeX is better for authoring. Therein that the own MathML WG
encourages the use of TeX, TeX-like, or TeX dialects as IteX, as input
syntax for authoring. You can find it in the w3c official MathML Software
list.

2) I was referring to other SGML/HTML markups, where dealing with scripts
is better.

3) You have just ignored the main criticism to the MathML model with bases
inside the script markup.

[END QUOTE]

Your recent "I said that its support for pre and multiscripts is vastly
superior  which is clearly true as TeX has no support for pre or multi
scripts." needs clarification.

I repeat that only the *structural model* for pre and multiscripts is
better in MathML (but it is much better in other markup designs than in
MathML 2.0). I repeat that the *input syntax* of MathML is not better, and
that is the reason people are using other syntaxes.

You also said
> I'm hardly likely to engage in
> propaganda against latex am I?

I just said

>> That is just free propaganda.

And next (in the whole part of my message that you have not cited now) I
explained that I was speaking about propaganda in favor of MathML 2.0 (of
which you are an author too). Moreover, saying that the TeX/LaTeX
treatment of prescripts is based on tricks cannot be called "propaganda
against latex", since it is just a reflection of reality.

The SGML Math group also recognizes that the TeX model for simulating
prescripts is unsustainable in a good design of mathematical markup. That
is not "propaganda against", it is just reality. When Wolfram claims that
TeX/LaTeX syntax is not suitable for computational mathematics he is not
making "propaganda against"; he is just describing how TeX is.

>> In SGML 12083 you have tags for sup, sub, over, and under script and you
>> can combine them. The basis is outside as usual in almost any computer
>> model: Fortran, Tex, IteX, ASCIIMAth, Maple, Mathematica...
>
> Any system has advantages and disadvantages.

And the point apparently ignored by the MathML WG is that progress means
offering us new models with at least the same advantages but fewer
disadvantages than previous models.

The MathML 2.0 specification has corrected some disadvantages of previous
systems, but at the cost of introducing a couple of new disadvantages that
were not present in previous systems. The net result is a really complex,
redundant model that cannot be used in many practical problems. The design
errors of MathML 2.0 are the reason for its poor implementation in
browsers. The limitations of MathML 2.0 are the reason that people in the
real world who want to solve real-world problems are designing their own
markup systems.

It is really interesting that I know of nobody who is using MathML for all
of science. Something as simple as T = 300 K, which is expressed in
mathematical language, is being encoded in science-oriented markups as
<item name="T" units="K" >300</item> or similar. The soup of
<ca><ce><ci><co><co> of Content MathML 2.0 is in effect ignored (FAPP). It
is interesting that the mathematical equation T = 300 K is not being
encoded via Content MathML 2.0 in any of the scientific cases I know.
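
To make concrete how little machinery the attribute-style encoding needs,
here is a small Python sketch that reads the <item> element quoted above.
The element and attribute names come from that example; the function name
read_quantity is hypothetical and not part of any real scientific markup
library.

```python
import xml.etree.ElementTree as ET

# The attribute-style encoding of T = 300 K quoted in the text.
ITEM = '<item name="T" units="K" >300</item>'


def read_quantity(markup):
    """Return (name, value, units) from a single <item> element."""
    element = ET.fromstring(markup)
    return element.get("name"), float(element.text), element.get("units")


name, value, units = read_quantity(ITEM)
print(f"{name} = {value:g} {units}")
```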

The use of MathML 2.0 appears to be either null or minimal.

> MathML is one of the few
> I've seen with good support for multiple superscripts for example.

There exist markup models _previous_ to MathML (versions 1 and 2.0) with
still better support for multiple scripts. Therefore, not reusing them
was a design mistake.

> These are not supported by TeX
> $ tex \\relax "\$a^1^2\$"
> ! Double superscript.
> <*> \relax $a^1^
>                 2$
> ?
>
> or maple
>> a^2^3;
> on line 4, syntax error, `^` unexpected:
>

I am just curious: do you know what you are encoding in the above Maple
example?

> In both of those systems, to use the shorthand ^ syntax you can't put
> two superscripts on the same base. Of course in maple you could define a
> function of three arguments (a,2,3), but that is exactly analogous to
> the mathml example I gave earlier, you can encode the mathematical
> meaning using content mathml and the layout using presentation.
>

Which is a complete misunderstanding by the MathML folks of what the
concept of "splitting presentation from content" means. I wrote a detailed
criticism of that in

[http://canonicalscience.blogspot.com/2006/02/choosing-notationsyntax-for-canonmath.html]

where I also explained why the syntax for CanonMath could be used as an
input syntax for both presentation and Content MathML.

Moreover, the MathML-presentation design violates the basic philosophy of
other w3c technologies. You may be familiar with that, precisely because
members of the MathML WG were very skeptical of the presentation MathML
model. Unfortunately the rest of the committee ignored those members, as
it also ignored other w3c folks...

> They are supported by ISO 12083 but of course that's just an input
> syntax, as far as I'm aware it's often transformed to something like
> TeX (or in house proprietary typesetting codes) for printing so it
> depends on how accurately any such translation captures complicated
> arrangements of multiple scripts. (Maybe it works out in practice, i
> don't know...)
>

ISO 12083 is not an input syntax!!!! Its _structural_ encoding of
mathematics -more than 10 years ago- was better than that of the more
recent presentation-MathML specification.

Moreover, you may be overlooking that, due to design errors in MathML,
people in the real world have many difficulties printing MathML with a
minimum of quality. It is precisely MathML that is being translated to
TeX/LaTeX first and then printed using the "old" TeX typesetting engine...

It is interesting that almost all academic publishers are ignoring the
promises of MathML and using other alternatives (to my current knowledge
only the publisher _Blackwell_ is using MathML). For instance, the
renowned _Nature_ is working with ISO 12083.

>
>> For instance, I already said you are saying now about the absence of
>> prescript model in TeX oriented system but you, apparently, have forgotten
>> it.
>
> No. I hadn't forgotten it I refered to that in my original reply!
> there is a big difference with a user-level macro and support in the
> basic system.

Ok, then you just repeated what I said about the absence of a prescript
model in TeX, as if you had forgotten that I had already noted that
limitation of TeX (and others).

>> My previous example cannot be encoded in MathML
>
> see above.
>

See reply.

>> base<sup>script</sup> --SGML way
> Where you mean ISO 12083 you really should say so, it is just confusing
> to use SGML to mean one particular DTD.
>

No, it is your confusion! When I wrote "SGML way" I meant just "SGML
way", because I was not referring only to ISO 12083. In fact, some
examples I wrote in past communications were not taken from ISO 12083.

In fact the code base<sup>script</sup> is present in ISO 12083 but also in
other alternatives (e.g. the Elsevier scientific DTD), including recent
XML formats inspired by specific SGML DTDs; hence my emphasis on labeling
it the "SGML way".

>
>> That is, MathML is better from a structural point of view;
>> TeX/LaTeX is better for authoring.
>
> If writing (as I do) in a text editor rather than some GUI that is
> certainly true.
>

It is also true even when comparing with certain GUI-oriented MathML tools.

>> Therein that the own MathML WG encourages the use of TeX, TeX-like, or
>> TeX dialects as IteX, as input  syntax for authoring. You can find it
>> in the w3c official MathML Software list.
>
> You've refered to that list several times, but you should note that that
> list is not a list of software endorsed or recommended by the W3C. If
> you have software that supports MathML you just need to write in and ask
> that it be listed. So long as you can point to some public documentation
> or announcement that confirms that it does have MathMl support it will
> be listed. It does not imply that the software is recommended by anyone,
> the list is provided as a service to the community to help people find
> software.
>

Thanks for that long and redundant explanation. If I understand you
correctly, you are explaining the difference between something like

"You can find it in the w3c officially endorsed or recommended MathML
Software list."

and what I actually wrote:

"You can find it in the w3c official MathML Software list."

It would be nice to cite the official MathML FAQ with the aim of providing
complementary information:

"MathML is verbose - I edit HTML by hand with emacs, and MathML looks
tedious to read and edit in text form.

True! The verbosity of MathML is largely a consequence of the WG decision
to use XML as the base syntax. The reasons for doing this include
standardisation, availability of tools and the general tendency of
web-based applications to use XML as a carrier format. The downside is
that the encodings become verbose for examples of any complexity, with
resulting requirements on tools for MathML generation and manipulation.
See MathML Tools, Products and Content"

That is, the MathML group recognizes that unusual verbosity and adds a
link to a list containing GUI oriented tools and input syntaxes -such as
TeX or ASCIIMath, for instance- for authoring MathML.

>
>> And we could talk days and days about many other limitations and errors of
>> the MathML design (and why MathML is far from popular). For example, the
>> next under-overline structure (i obtained from the ISO standard)
>
> MathML is of course not perfect, nothing is (and incidentally I wasn't
> on the first WG that designed MathML 1.0 where all the script markup
> comes from) But I do believe that experience shows that putting the base
> as a child of the script elements has proved to be a good design.

Curiously, people with experience in generating code for different markup
systems say just the contrary. I have gained some experience thanks to the
CanonMath program, and putting the base inside the markup content model
adds no advantage over the standard base-outside model.

What is more, it is easily proven that the ugly MathML model adds
difficulties that are not present in other markup models for mathematics.

Moreover, I will list examples of MathML code obtained from the real
world (code generated by tools listed at the w3c). You will see how that
incorrect output is related to MathML design errors.

It is precisely practice (more than technical analysis of the MathML DTD)
that proves the "fiasco" of MathML among browsers, users, publishers, and
other "...ers".

Real-world difficulties obliged me to launch CanonMath. It is also the
real world that lets me say now that some script structures I need even
at the level of elementary chemistry textbooks cannot be encoded in MathML
but can be encoded in models from ten years ago (e.g. ISO 12083).

> As for popular, it's clear that MathML has been vastly more popular than
> any previous SGML or XML markup for mathematics.  ISO 12083 and other
> SGML DTD for mathematics (eg Elsevier's) were pretty much only used by
> large publishing houses. many of them are switching or thinking of
> switching to MathML and MathML is used in so many more contexts.
> If it's so clear that ISO 12083 is superior, why was it not picked up to
> be used in Computer algebra systems (mathematica and maple both support
> mathml) or word processors (Word+MathPlayer,  OpenOffice, AbiWord and
> SciWriter for example all support MathML) Plus of course support in web
> browsers.
>

Several remarks should be made here:

1) String theory is also very popular, but in practice string theory
predicts/explains nothing after 40 years of popularity, and people are
returning to old alternatives that should never have been abandoned.
MathML is "very popular" because average users know little about the real
limitations of the markup and because it has been oversold to users thanks
to w3c "intensive marketing techniques".

Phlogiston theory was also very popular in its time. Today, it is just
of interest to historians...

2) Publishers are not massively changing to the new MathML.

3) I *never* said that the whole of ISO 12083 was superior in a net sense;
otherwise I would be using it, which is, obviously, not the case. What I
said -and again you fail to appreciate it- is that with a standard
published 10 years ago I am able to encode mathematical structures that
cannot be encoded using the modern MathML 2.0 due to design errors in the
latter.

4) That ISO 12083 -a presentation-oriented markup- cannot be used for
content math is a trivial fact. The reason it is not used as *input
syntax* by Mathematica was _partially_ explained by Wolfram himself.
Moreover, there are probably technical details. The point is that the
good parts of the earlier ISO 12083 model could have been copied instead
of being poorly replicated by the MathML specification.

There are three more points to be made. First, the failure of MathML to
encode some structures that can be encoded by ISO 12083 continues to be a
fact. Second, Mathematica does not provide support for other things
either, and that does not mean that they are inferior. For instance,
Mathematica offers no support for OpenMath. Third, members of Wolfram
Research, Inc. were included in the MathML WG but not in the ISO 12083 WG,
therefore...

5) I have used the tools you cite for authoring MathML, and I abandoned
them for a series of reasons: difficulties, lack of support for advanced
needs, etc.

I do not know about now, but one of the difficulties I had with OpenOffice
was that it supported MathML 1.0 (not 2.0). SciWriter gave me some
incorrect output for certain canonical science formulae; other additional
points obliged me to set aside SciWriter, Formulator, etc. I worked with
MathCast for the current version of the canonicalscience website, but I am
abandoning it due to its limitations. I also worked with ASCIIMathjs, with
TeX converters, and, of course, with Itex-to-MathML, which is a very, very
deficient tool generating completely redundant and/or wrong code...

6) Instead of reusing the good ideas one can read in ISO 12083 (and in
other systems) and adding a general, elegant, and powerful markup system,
the w3c has offered us a new system that does many things more poorly than
previous systems!

7) I have read many times the common mantra that ISO 12083 is presentation
oriented and, therefore, could not encode meaning. That is partially true,
but then a more practical way would have been to create a kind of "ISO
12083 content markup", or to improve the known ISO 12083, instead of
offering us a new limited model. It is really very interesting to hear
arguments that ISO 12083 was presentational only from those who offered a
poor-quality copy called "MathML 2.0 presentation".

8) Moreover, the design of content MathML is also very debatable. I
already wrote a bit about that in the past.

Recently, I obtained a publication by Andreas Strotmann. Yes, the same
guy acknowledged by the MathML WG: "In particular, MathML has been
influenced by the OpenMath project [...] The working group has benefited
from the help of many people. We would like to particularly name [...]
Andreas Strotmann, and other contributors to the www-math mailing list for
their careful proofreading and constructive criticisms."

Strotmann explains why something as simple as "integral of sin(x) on x
from 0 to x" cannot be correctly encoded in content MathML 2.0 but can be
encoded in OpenMath due to the better design of the latter format. The
explanation offered of why the content MathML encoding is *wrong* (he uses
that harsh word) is rather technical. In another part of the document one
finds "serious language flaws" when he refers to content MathML and other
markup languages.

9) After so many attempts by the w3c to offer mathematical support for
the web, the result is a disappointment. Your "Plus of course support in
web browsers" should be read as "Plus of course inefficient, partial,
minimal support in web browsers." Some developers just rejected support
for MathML 2.0 because of technical issues with THAT specification you are
offering us.

For example, Mozilla Firefox offers native support for MathML but:

a) It introduces an external module rather than true native support, due
to the incorrect design of the MathML specification.

b) It supports just a part of the presentation markup. The engine does not
pass several elementary presentation tests extracted from the official
MathML test suite. Content-MathML markup is just ignored.

c) Due to the built-in external MathML module and some other
technicalities, one needs to wait minutes before some complex MathML can
be displayed in my browser.

d) Thanks to his XML-MAIDEN design, George is able to render, with a
normal standard Opera browser (and zero plugins, zero additional fonts,
etc.), mathematical content that I am unable to render with my specialized
Mozilla Firefox.

That is impressive, especially when one notes that George is using
standard technology that was not specially designed for that task!!
Whereas my specialized MathML-oriented browser is unable to do it!!!

> We are currently looking for requirements for improvements to MathML for
> a possible MathML3, but clearly MathML is not going to make a backward
> incompatible change to its script markup.

Why not? W3c has good experience in the design of backward incompatible
specifications ;-) The most recent example I know is the future XHTML 2.0.
It is not backward compatible with XHTML 1.0/1.1 by explicit decision of
the WG.

Frankly, I do not see the need for a MathML 3 when the current MathML 2.0
is being mostly ignored. The MathML WG should focus on the implementation
of the current MathML 2.0.

> So I'm not sure what you are
> aiming to achieve.

It is especially simple to understand: to solve everyday problems at the
Center for CANONICAL |SCIENCE), problems that are not covered by
XHTML+MathML.

> I suspect that what you want to do is design your own
> XML DTD and then have stylesheets that translate this to MathML for
> public use. That way you can have convenient short forms for constructs
> that you use often, and can make different design choices in element
> markup, according to taste. There is nothing wrong with that (It's what
> we do here at NAG for example, where documents are authored to a private
> DTD but converted to XHTML+MathML (and pdf) for publication.
>
> Speaking of personal taste in design of XML markup, in an earlier
> message you mentioned
>
>
>>  I would prefer DTDs for something like
>>
>>  <h1>This is my favourite heading</h1>
>>
>>  <p>This is a paragraph</p>
>>
>>  over
>>
>>  <apply><h1/><cnt>This is my favourite heading</cnt></apply>
>>
>>  <apply><p/><cnt>This is a paragraph</cnt></apply>
>
>
> Most commentators that I have seen have suggested that the heading
> markup is one of the more problematic areas of the HTMl design.
> Most document oriented DTDs (DocBook, TEI, etc) have a structure more like
> <section>
>   <head>This is my favourite heading</head>
>    <p>....</p>
> </section>
>
> this makes it easy to have outlining support in editors, or to extract
> the second section in Xpath etc
> select="section[2]"
> being rather easier
> than
> select="h2[2]|h2[2]/following-sibling::node()[count(following-sibling::h2)=count(current()/h2[2]/following-sibling::h2)]"
>
>
> this is finally being addressed in XHTML 2
> http://www.w3.org/TR/xhtml2/mod-structural.html#sec_8.8.
> <section>
>   <h>This is my favourite heading</h>
>    <p>....</p>
> </section>
>
> Note that this XHTML2 markup, like Docbook's, is structually identical to
> the Content MathML <apply> markup.....
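
The contrast quoted above between a nested <section> structure and flat
<h2> headings can be tried out with a short Python sketch. The sample
documents and the two function names are invented for illustration;
ElementTree supports the positional step section[2], but it has no
following-sibling axis, so the flat case is rebuilt by scanning siblings
rather than by the XPath expression quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: the same content, nested versus flat.
NESTED = (
    "<body>"
    "<section><head>One</head><p>a</p></section>"
    "<section><head>Two</head><p>b</p><p>c</p></section>"
    "</body>"
)
FLAT = "<body><h2>One</h2><p>a</p><h2>Two</h2><p>b</p><p>c</p></body>"


def second_section_nested(root):
    """With real containers, a single positional step is enough."""
    return root.find("section[2]")


def second_section_flat(root):
    """With flat headings, collect the second h2 and what follows it."""
    seen, collected = 0, []
    for node in root:
        if node.tag == "h2":
            seen += 1
        if seen == 2:
            collected.append(node)
    return collected


print([child.tag for child in second_section_nested(ET.fromstring(NESTED))])
print([node.tag for node in second_section_flat(ET.fromstring(FLAT))])
```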

CanonTexT is better than the future XHTML 2.0 or the current Docbook from
a structural point of view as well. I already said months ago:

“XHTML (including the future XHTML 2.0) and MathML or specific languages
as Docbook do not fit all our requirements -for example, we need specific
scientific requirements for <chemistry> are not fulfilled even by the
specialized CML-, therein the need for the CanonML language.”

[http://canonicalscience.blogspot.com/2006/02/choosing-notationsyntax-for-canonmath.html]

It is precisely the flaws in the previous HTML heading design that are the
reason the new XHTML 2.0 is not backward compatible by design ;-)

There is no “structural identity” between XHTML2 and Content MathML.
The XHTML2 model

<section>
<head>This is my favorite heading</head>
<p>....</p>
</section>

would be translated to Content MathML markup like

<apply><section/>
<apply><head/>
<tk>This</tk>
<tk>is</tk>
<tk>my</tk>
<tk>favorite</tk>
<tk>heading</tk>
</apply>
<apply><p/>
<tk>....</tk>
</apply>
</apply>

or maybe

<apply>
<apply><head/>
<tk>This</tk>
<tk>is</tk>
<tk>my</tk>
<tk>favorite</tk>
<tk>heading</tk>
</apply>
<apply><p/>
<tk>....</tk>
</apply>
</apply>

if one builds the semantics of <section/> into the <apply> construct. I
personally consider the option with the explicit <section/> "operator"
closer to the Content MathML 2.0 design model, but that is, of course,
debatable.
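
The first translation above (the one with the explicit "operator"
element) is mechanical enough to sketch in a few lines of Python. This is
only an illustration under stated assumptions: the function name to_apply
is made up, <tk> is the token element used in the example above rather
than real Content MathML, and only text that precedes the child elements
is tokenized.

```python
import xml.etree.ElementTree as ET


def to_apply(element):
    """Rewrite <tag>...</tag> as <apply><tag/> ...arguments... </apply>."""
    apply = ET.Element("apply")
    ET.SubElement(apply, element.tag)  # the element name becomes the operator
    for word in (element.text or "").split():
        ET.SubElement(apply, "tk").text = word  # one <tk> per word of text
    for child in element:
        apply.append(to_apply(child))  # nested elements become nested applies
    return apply


SOURCE = (
    "<section>"
    "<head>This is my favorite heading</head>"
    "<p>....</p>"
    "</section>"
)

result = to_apply(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```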


> David
>

Informally, I would add that in a previous communication you said

"[...] in which case the MathML that you said that you recieved [...]"

Now that Mikko Rantalainen has acknowledged in public

[http://lists.w3.org/Archives/Public/www-math/2006Apr/0009.html]

that he sent the markup, the above phrase should become

"[...] in which case the MathML that you recieved [...]"



Juan R.

Center for CANONICAL |SCIENCE)
Received on Friday, 7 April 2006 15:07:42 UTC