Re: Technical reasons for some options taken on design of MathML from juanrgonzaleza@canonicalscience.com on 2006-04-14 (www-math@w3.org from April 2006)

From: <juanrgonzaleza@canonicalscience.com>
Date: Fri, 14 Apr 2006 05:12:47 -0700 (PDT)
To: <www-math@w3.org>
Message-ID: <3197.217.124.69.227.1145016767.squirrel@webmail.canonicalscience.com>
Bruce Miller wrote:

[snip]

> As you know, MathML also has a goal of representing the "meaning"
> of math, or at the least its structure, beyond mere presentation.
> You might reasonably debate whether it _should_ have that goal,
> or whether it meets it, but it's there. Thus, for example,
> encapsulating the base of sub & superscripts is important;
> a simple <sub> tag doesn't do this.

In ISO 12083, bases can also be properly encapsulated:

<subform>Base</subform><sup>script</sup>

> As you also know, I share your frustration that MathML can't be
> presented solely by CSS.  I wasn't part of the group at
> the time, but the first MathML recommendation came only
> a year (early 1998) after the first CSS recommendation (late 1996).
> I doubt that it could have been foreseen at that time that CSS
> would become the basis for rendering engines, rather than
> icing on top of it.  If it had, might MathML have come out
> differently? Perhaps...  Maybe CSS would have, too.

From the beginning, some members wanted only “content” MathML. Some
rendering engine or language would still have been needed, and CSS was
already available.

> In any case, the languages of MathML & CSS don't match so
> well.  There are some `accidental' choices made in MathML
> that are hard to address with CSS's Selectors.  On the other
> hand, while CSS's box model comes surprisingly close, it
> really isn't yet up to the task of Math either (Math in general,
> not specifically MathML).
>
> So, where do we go from here?  Redesign MathML from the ground
> up?  Redesign CSS from the ground up?  Lobby for enhancements,
> clarifications and even deprecations in both MathML and CSS?

If the number of users of each technology matters, then I believe it may
be more logical to adapt MathML to CSS than the other way around.

Robert Miner wrote:

[snip]
>
> As for the "math community" embracing any XML/SGML standard, I think you
> are overly optimistic there too. I think the large majority of members
> in the "math community" don't have a problem with authoring.  Among
> researchers, they spent years learning TeX, TeX is a nearly optimal
> authoring solution for research articles, the TeX esthetic is deeply
> embedded in the scholarly psyche, and PDF is an easy, pervasive way of
> sharing output.  Not to mention, it's all free.  In short, they don't
> feel like they have a problem.  Among non-TeX users, the situation is
> similar.  They mostly use Word with the free Equation Editor or
> sometimes MathType if they are hardcore.  They learned how to use it
> years ago, all their colleagues use it, and it's free to them.  They
> don't feel like they have a problem either.  There are certainly authors
> who aren't in these categories, but they are a small minority.

Both the TeX and the Word-like communities lack information. How many
Word users believe that a document heading is made by selecting an
ordinary piece of text and then formatting it as centered, bold, 16pt
type?

[snip]


White Lynx wrote:

[snip]

> Consider for
> instance:
> <p>
> This is paragraph that contains famous formula
> <formula>
> E = mc<sup>2</sup>
> </formula>
> </p>
> It fits much better in the general scope of HTML document than
> <p>
> This is paragraph that contains famous formula
> <math mode="display" xmlns="http://www.w3.org/1998/Math/MathML">
> <mi>E</mi><mo>=</mo><mrow><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup></mrow>
> </math>
> </p>
> Judge yourself, which one would have larger user community?
> Simple, easy to learn and easy to use ISO standard or bloated,
> controversial W3C recommendation.

A more correct representation of the above simple formula (E=mc^2) in
presentation MathML 2.0 is even more verbose than what you wrote. The
expression would be

<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mrow><mi>E</mi><mo>=</mo><mrow><mi>m</mi><mo>&InvisibleTimes;</mo>
<msup><mi>c</mi><mn>2</mn></msup></mrow></mrow>
</math>

And if we want to add mathematical content, the formula becomes more
complex still; one can expect something like

<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow><mi>E</mi><mo>=</mo><mrow><mi>m</mi><mo>&InvisibleTimes;</mo>
<msup><mi>c</mi><mn>2</mn></msup></mrow></mrow>
<annotation-xml encoding="MathML-Content">
<apply><eq/><ci>E</ci><apply><times/><ci>m</ci>
<apply><power/><ci>c</ci><cn>2</cn></apply></apply></apply>
</annotation-xml>
</semantics>
</math>

This is known as “top-level parallel MathML markup”. But if the application
receiving the encoding needs to handle sub-expressions of mathematical
objects, top-level pairing is not sufficient. In that case, one can expect
even more complex and verbose code, via “fine-grained parallel MathML
markup”!

<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<semantics><mrow><mi>E</mi><mo>=</mo><semantics>
<mrow><mi>m</mi><mo>&InvisibleTimes;</mo><semantics>
<msup><mi>c</mi><mn>2</mn></msup>
<annotation-xml encoding="MathML-Content">
<apply><power/><ci>c</ci><cn>2</cn></apply></annotation-xml></semantics>
</mrow><annotation-xml encoding="MathML-Content">
<apply><times/><ci>m</ci>
<apply><power/><ci>c</ci><cn>2</cn></apply></apply></annotation-xml>
</semantics></mrow><annotation-xml encoding="MathML-Content">
<apply><eq/><ci>E</ci><apply><times/><ci>m</ci>
<apply><power/><ci>c</ci><cn>2</cn></apply></apply></apply>
</annotation-xml></semantics>
</math>

Since this is so verbose, the MathML designers provided an alternative
encoding via id and xref attributes but, of course, the output is in any
case more verbose than top-level parallel markup, and it adds the
complexity of working with attributes (the much-criticized dual mode of
encoding data in XML, as element content or as attributes).

And all this terrible code just for the trivial E=mc^2! Can you
imagine the full MathML 2.0 code needed for the complex equations used in
advanced scientific research? My god!
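To make concrete what the parallel markup buys, here is a minimal Python sketch (standard library only) of how a consuming application might read the Content MathML annotation back out of the top-level parallel markup shown earlier. The helper names `content_of` and `to_prefix` are hypothetical, not part of any MathML tool, and `&InvisibleTimes;` is written as the equivalent character reference `&#x2062;` because a plain XML parser without the MathML DTD does not know the entity name.

```python
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"

# Top-level parallel markup for E=mc^2, as given above, except that
# &InvisibleTimes; is written as &#x2062; (the same character).
TOP_LEVEL = """<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<semantics>
<mrow><mi>E</mi><mo>=</mo><mrow><mi>m</mi><mo>&#x2062;</mo>
<msup><mi>c</mi><mn>2</mn></msup></mrow></mrow>
<annotation-xml encoding="MathML-Content">
<apply><eq/><ci>E</ci><apply><times/><ci>m</ci>
<apply><power/><ci>c</ci><cn>2</cn></apply></apply></apply>
</annotation-xml>
</semantics>
</math>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to qualified tags."""
    return tag.split("}", 1)[-1]


def to_prefix(node):
    """Render a Content MathML node as operator(arg, ...) text."""
    name = local_name(node.tag)
    if name in ("ci", "cn"):
        return node.text.strip()
    if name == "apply":
        head, *args = list(node)  # first child of <apply> is the operator
        return f"{local_name(head.tag)}({', '.join(to_prefix(a) for a in args)})"
    raise ValueError(f"unsupported element: {name}")


def content_of(source):
    """Find the MathML-Content annotation and render its expression."""
    root = ET.fromstring(source)
    path = f".//{{{MML_NS}}}annotation-xml[@encoding='MathML-Content']"
    annotation = root.find(path)
    return to_prefix(annotation[0])


print(content_of(TOP_LEVEL))  # eq(E, times(m, power(c, 2)))
```

What the program recovers is only the operator tree: that E is equated to a product of m and a power of c. Nothing in it says what E, m or c are.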

However, even after that last ultra-verbose MathML 2.0 code, the computer
only begins to understand the mathematics behind the equation. Now the
computer knows that 2 is a number and E an identifier... but it has no
idea of the physics. You can give it negative masses or imaginary values
for c and all will be fine for the computer. You would need even more
complex XML code to encode the physics: E is energy, c is the speed of
light, m is mass; mass cannot be negative, c is a real number whose value
is given in scientific tables, units, etc.

And you will still find rendering problems with the MathML formula, and
you will need to modify or extend the code further. And you will be forced
to modify the code when preparing your manuscript for different journals
(maybe people on this list are unaware, but E=mc^2 does not follow the
latest chemical conventions).

I will repeat the last code above (MathML 2.0 ***without physics
semantics***):

<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
  <semantics>
    <mrow>
      <mi>E</mi><mo>=</mo>
      <semantics>
        <mrow>
          <mi>m</mi><mo>&InvisibleTimes;</mo>
          <semantics>
            <msup><mi>c</mi><mn>2</mn></msup>
            <annotation-xml encoding="MathML-Content">
              <apply><power/>
                <ci>c</ci><cn>2</cn>
              </apply>
            </annotation-xml>
          </semantics>
        </mrow>
        <annotation-xml encoding="MathML-Content">
          <apply><times/>
            <ci>m</ci>
            <apply><power/>
              <ci>c</ci><cn>2</cn>
            </apply>
          </apply>
        </annotation-xml>
      </semantics>
    </mrow>
    <annotation-xml encoding="MathML-Content">
      <apply><eq/>
        <ci>E</ci>
        <apply><times/>
          <ci>m</ci>
          <apply><power/>
            <ci>c</ci><cn>2</cn>
          </apply>
        </apply>
      </apply>
    </annotation-xml>
  </semantics>
</math>

so that people can contrast this nightmare with White Lynx's proposal
*with physics meaning* (I believe partially inspired by ISO 12083, which
the MathML WG ignored...)

<formula>
<group role="Energy">E</group> =
<group role="mass">m</group><group role="speed of light">c</group>
<sup role="power">2</sup>
</formula>

Moreover, the last XML formula is rendered by any old browser with minimal
CSS support, whereas the promoted MathML code can be rendered only by the
latest versions of Mozilla (which ask you to install additional fonts) and
by MSIE with a special plug-in.

Moreover, I will be forced to edit the ultra-verbose MathML each time I
prepare a different version of a manuscript containing E=mc^2, since
MathML does not separate content from presentation for scientific purposes.
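Element counts are only a crude measure of verbosity, but they make the contrast concrete. The Python sketch below (standard library only; the helper names are hypothetical) counts the elements in the fine-grained MathML given above and in White Lynx's `<formula>` markup, and reads back the role labels. The roles are plain strings: nothing here checks the physics (sign of m, value of c), which would still need separate code.

```python
import xml.etree.ElementTree as ET

# The fine-grained parallel MathML 2.0 markup for E=mc^2 given above
# (&InvisibleTimes; written as the equivalent reference &#x2062;).
FINE_GRAINED = """<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<semantics><mrow><mi>E</mi><mo>=</mo><semantics>
<mrow><mi>m</mi><mo>&#x2062;</mo><semantics>
<msup><mi>c</mi><mn>2</mn></msup>
<annotation-xml encoding="MathML-Content">
<apply><power/><ci>c</ci><cn>2</cn></apply></annotation-xml></semantics>
</mrow><annotation-xml encoding="MathML-Content">
<apply><times/><ci>m</ci>
<apply><power/><ci>c</ci><cn>2</cn></apply></apply></annotation-xml>
</semantics></mrow><annotation-xml encoding="MathML-Content">
<apply><eq/><ci>E</ci><apply><times/><ci>m</ci>
<apply><power/><ci>c</ci><cn>2</cn></apply></apply></apply>
</annotation-xml></semantics>
</math>"""

# White Lynx's role-annotated <formula> proposal given above.
ROLE_MARKUP = """<formula>
<group role="Energy">E</group> =
<group role="mass">m</group><group role="speed of light">c</group><sup role="power">2</sup>
</formula>"""


def element_count(source):
    """Count every element in the document, root included."""
    return sum(1 for _ in ET.fromstring(source).iter())


def roles(source):
    """Map the text of each element carrying a role attribute to that role."""
    return {el.text: el.get("role")
            for el in ET.fromstring(source).iter() if el.get("role")}


print(element_count(FINE_GRAINED), element_count(ROLE_MARKUP))  # 37 5
print(roles(ROLE_MARKUP))
# {'E': 'Energy', 'm': 'mass', 'c': 'speed of light', '2': 'power'}
```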

[snip]


Robert Miner wrote:
>
> Hi.
>
>> If someone could tell me where these millions of pages using MathML
>> reside, it would simplify testing process a lot.
>>
>> I don't know where they all are, but there are several hundred
>> documents here
>> http://www.nag.com/numeric/CL/CLdocumentation.asp
>
> Most of the MathML content I know of is not currently published on the
> Web.  It exists in backend production processes, which is why I
> clarified I really meant pages and not documents.  For example, I think
> four or five journals of the American Physical Society have been
> produced using MathML for the last 2 or 3 years.  As a very rough
> estimate, in one issue of one of those journals in a month, I see about
> 50 articles with a length of around 15 pages on average.  That would
> amount to something like 90,000 pages after 3 years.
>
> As David pointed out, the US Patent office has been churning out 1000
> equations a week in patent applications for 6 or 7 years.  If there are
> 10 equations per page on average (a wild guess, hopefully on the
> conservative side since I dimly remember from Karleen's talk he was only
> talking about display equations) that is 100 pages of MathML /wk for a
> total of something like another 30,000 pages. Similarly with the very
> substantial enterprise publishing operations run internally by companies
> like Airbus and others.
>
> I won't run through my whole back-of-the-envelope calculation that led me
> to conclude there are millions of pages of MathML in existence, but the above examples
> ought to suffice to show that even if I'm wrong, I'm not wildly wrong.

To be useful, the real figure may be the ratio (people who use MathML /
people who do not).
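Redoing the quoted back-of-the-envelope arithmetic in a few lines of Python shows the totals are of the same order as the quoted 90,000 and 30,000 pages. The assumptions are taken from the quote itself: the lower bound where a range was given ("four or five" journals, "6 or 7" years), one issue per journal per month, and three years of APS output. What the arithmetic cannot give is the ratio above, since no figure for pages produced *without* MathML is available.

```python
# Figures taken from the estimate quoted above. Where a range was given
# ("four or five" journals, "6 or 7" years) the lower bound is the default.
# One issue per journal per month is an assumption read from the quote.
ARTICLES_PER_ISSUE = 50
PAGES_PER_ARTICLE = 15
MONTHS = 3 * 12               # the quote totals APS output over 3 years

PATENT_EQUATIONS_PER_WEEK = 1000
EQUATIONS_PER_PAGE = 10       # the quote's own "wild guess"
PATENT_WEEKS = 6 * 52


def aps_pages(journals=4):
    """Pages from APS journals, one issue per journal per month."""
    return journals * ARTICLES_PER_ISSUE * PAGES_PER_ARTICLE * MONTHS


def patent_pages(weeks=PATENT_WEEKS):
    """Pages of patent equations at a fixed number of equations per page."""
    return PATENT_EQUATIONS_PER_WEEK // EQUATIONS_PER_PAGE * weeks


print(aps_pages(), aps_pages(5))  # 108000 135000
print(patent_pages())             # 31200
```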

[snip]


David Carlisle wrote:

[snip]

> The extensions from mathml used internally mainly relate to content
> mathml rather than presentation, as the set of empty elements designed
> for common "K-12 functions" doesn't really apply to the functions in our
> library, and it's just more convenient to use <apply><Ai/> than
> <apply><csymbol>Ai</csymbol> but this shorthand is easily expanded as
> part of the general transformation from our in-house DTD to
> XHTML+MathML.

Interesting!
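The expansion Carlisle mentions is a purely mechanical tree rewrite. A minimal Python sketch of it, assuming the shorthand elements are empty, unnamespaced, and listed in a hypothetical `SHORTHAND` set (this is not NAG's actual transformation, which is not shown in the thread):

```python
import xml.etree.ElementTree as ET

# Hypothetical set of in-house shorthand function elements (Ai is the Airy
# function); the real in-house DTD is not shown in this thread.
SHORTHAND = {"Ai"}


def expand_shorthand(source):
    """Rewrite <apply><Ai/>...</apply> as <apply><csymbol>Ai</csymbol>...</apply>."""
    root = ET.fromstring(source)
    for apply_el in root.iter("apply"):
        if len(apply_el) == 0:
            continue
        head = apply_el[0]  # the operator is the first child of <apply>
        if head.tag in SHORTHAND and len(head) == 0 and not (head.text or "").strip():
            head.text = head.tag   # the function name becomes the csymbol text
            head.tag = "csymbol"   # attributes and tail text are kept as-is
    return ET.tostring(root, encoding="unicode")


print(expand_shorthand("<apply><Ai/><ci>x</ci></apply>"))
# <apply><csymbol>Ai</csymbol><ci>x</ci></apply>
```

Because the rewrite only touches `<apply>` heads whose tag is in the set, ordinary Content MathML passes through unchanged.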

> Converting our in-house documents from SGML-with-TeX-math-fragments to
> XML-with-MathML-math-fragments was of course a lot of work, but has
> shown a lot of benefit, the mathematics is far more consistently marked
> up now (TeX is so forgiving to authors:-) and the documents can be far
> more easily re-purposed. Mathematical expressions originally just
> intended for documentation are now used in code generation. Rather than
> just documenting constraints on some parameter, we can generate the code
> that checks the constraint. Note this is far easier in MathML where
> every operator is explicitly tagged than in some suggested alternatives
> that make far more use of inline untagged text.

TeX's goal is not encoding structure; therefore, one should not compare it
with MathML. It would be interesting to see whether there is any benefit in
translating ISO 12083 or some other mathematical markup to MathML.
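Carlisle's point about generating constraint checks from markup where every operator is an explicit element can be sketched in a few lines. The sketch below is hypothetical (the `OPERATORS` table and helper names are not from NAG); it handles only n-ary `<apply>` with the listed operators, and it uses `eval` only because the generated text comes from trusted in-house markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Content MathML operator elements to Python
# operators; only n-ary operators are listed (no unary minus, etc.).
OPERATORS = {"geq": ">=", "leq": "<=", "gt": ">", "lt": "<", "eq": "==",
             "neq": "!=", "and": "and", "or": "or", "plus": "+", "times": "*"}


def to_python(node):
    """Translate a small Content MathML expression into Python source text."""
    if node.tag in ("ci", "cn"):
        return node.text.strip()
    if node.tag == "apply":
        head, *args = list(node)
        op = OPERATORS[head.tag]  # raises KeyError for unsupported operators
        return "(" + f" {op} ".join(to_python(a) for a in args) + ")"
    raise ValueError(f"unsupported element: {node.tag}")


def make_check(source, params):
    """Compile the constraint into a function of the named parameters.

    eval is acceptable here only because the input is trusted in-house markup.
    """
    expr = to_python(ET.fromstring(source))
    return eval(f"lambda {', '.join(params)}: {expr}")


constraint = ("<apply><and/><apply><geq/><ci>n</ci><cn>0</cn></apply>"
              "<apply><leq/><ci>n</ci><ci>m</ci></apply></apply>")
print(to_python(ET.fromstring(constraint)))  # ((n >= 0) and (n <= m))
check = make_check(constraint, ["n", "m"])
print(check(3, 5), check(-1, 5))  # True False
```

With inline untagged text such as "n >= 0" the same step would first require parsing the text back into an expression tree.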

[snip]


Juan R.

Center for CANONICAL |SCIENCE)
Received on Friday, 14 April 2006 12:13:04 UTC