[whatwg] Mathematics in HTML5 from juanrgonzaleza@canonicalscience.com on 2006-06-05 (public-whatwg-archive@w3.org from June 2006)

From: <juanrgonzaleza@canonicalscience.com>
Date: Mon, 5 Jun 2006 06:36:04 -0700 (PDT)
Message-ID: <3078.217.124.88.197.1149514564.squirrel@webmail.canonicalscience.com>
Ian Hickson wrote:
>
> On Fri, 2 Jun 2006, White Lynx wrote:
>>
>> To summarize discussion on mathematics in HTML5, I would like to ask
>> several questions. 1) Which markup do you think fits better in the
>> scope of HTML5?
>> 	a)
>> 		<div>
>> 		(X)HTML document may contain math formulae, like
>> 		<formula>
>> 		ax<sup>2</sup> + bx + c = 0
>> 		</formula>
>>		</div>
>
> This markup is completely inadequate to represent mathematics. For
> example, it doesn't say whether "ax" is one variable or two.

Markup of that kind is standard in academic publishing. It has never
been considered completely "inadequate" in the way you are claiming. In
the Elsevier SGML DTD for mathematical articles, the above equation
would be written in a way very close to George's proposal.

Look at the following example, taken from the Elsevier technical
documentation:

G(&phi;) = 2&pi;r exp(i &psi;)

<f><rm><ssf>G</ssf></rm>(&phi;)=2&pi;r<hsp sp="0.2">
<rm>exp</rm>(<rm>i</rm>&psi;)</f>

The default mode for formulae (<f>) is italic, and <rm> introduces roman
tokens. <hsp> introduces extra space and <ssf> introduces sans-serif
fonts. Subscripts and superscripts are both introduced in a way similar
to HTML. For instance, a^i b^j is encoded as

a<sup>i</sup>b<sup>j</sup>.

There are many possibilities: you can define a token mode (as in TeX,
Elsevier Math, or XL-MAIDEN), introduce spaces (as in Mathematica), or
reuse <var>, as was already noted here:

<var>a</var><var>x</var>
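
To make the point about <var> concrete, here is a minimal sketch in
Python (standard library only). The class VarCollector and the function
variables are hypothetical names of mine, not part of any proposal; the
sketch only assumes that each variable is wrapped in its own <var>
element, as in the line above.

```python
from html.parser import HTMLParser


class VarCollector(HTMLParser):
    """Collect the text of every <var> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_var = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "var":
            self.in_var = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "var":
            self.in_var = False

    def handle_data(self, data):
        # Only text inside <var> counts as part of a variable name.
        if self.in_var:
            self.names[-1] += data


def variables(markup):
    """Return the list of variable names found in the markup."""
    collector = VarCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


print(variables("<var>a</var><var>x</var><sup>2</sup>"))  # two variables
print(variables("<var>ax</var><sup>2</sup>"))             # one variable
```

With this convention a consumer can tell "a times x" from a single
variable named "ax" without any special parsing mode.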

> In HTML5 we have other options, too. For example, we could define a
> special parsing mode

An interesting approach, but I believe it unnecessarily complicates both
the design of the spec and the implementation in browsers, because it
would oblige them to parse data in three different, incompatible ways:
HTML Math, HTML/XML, and MathML.

>
>    <math>
>      <mrow>a &#x2062; <msup>x 2</msup></mrow> +
>        <mrow>b &#x2062; x</mrow> + c = 0
>    </math>

Why do you introduce <mrow> instead of reusing <span>? There is no risk
of confusion, because it is a child of a <math> element, i.e. it is in
math mode.

You may not know it, but in April 2006 Robert Miner (from the MathML IG)
asked on a W3C mailing list what should be changed in a future MathML to
make it CSS friendly. Many changes to the current 2.0 specification
were proposed. I do not understand why we would now reuse the same
MathML specification that is causing so many headaches for both
developers and authors.

We can learn from past errors and try to do better; I am especially
interested in browser compatibility. Officially, both the Mozilla
Foundation and Opera Software are also interested in backward
compatibility with CSS, DOM, and HTML, as explained in their position
paper (linked at the bottom of http://www.whatwg.org/). Therefore, I do
not understand why the manifesto emphasizes CSS, HTML, and DOM
compatibility whereas you propose W3C code that violates all three.

For instance, you are calling for the reuse of the <msup> element. Let
me summarize the main difficulties with the MathML script model (debated
at length this year on the MathML mailing list):

1) The MathML model is not directly extensible because of basis
interference.

2) The MathML model introduces a different content model for each
different script structure.

3) The MathML model, while being more complex (more content models and
more tags) than the old script model of the ISO 12083 standard, encodes
fewer structures because tags cannot be combined.

4) The MathML model is not CSS friendly (some people assure me it is not
XSL-FO friendly either) and is not DOM friendly.

5) The MathML model is not backward compatible with the widespread
encodings that people know very well and use, such as HTML, ISO 12083,
Mathematica, Maple, TeX, LaTeX, and others.

Point 5) is also related to the difficulty of writing good TeX -->
MathML translators.

a^b can easily be parsed to a<sup>b</sup>, but not so easily to
<msup>a b</msup>. In fact, many available parsers still give wrong
results on this point, 10 years after the birth of MathML!

> ...with the DOM being the full MathML representation (namespaces, DOM,
> and  everything),

Also compatible with MathML's weaknesses, or is there room for
improvement?

> compatibility with an existing language,

This should read "compatibility with an ugly language that is
incompatible with CSS+XML+HTML+SGML+ISO12083 and is being largely
rejected by both authors and developers even after 10 years of
promises".

The first mathematical language developed by the W3C was HTML-Math, in
the draft of HTML 3. It was so full of errors and inconsistencies that
it was completely rejected by the community. Should we copy the W3C's
HTML-Math? Surely not.

The next attempt was MathML 1.0, also with a lot of errors (fortunately
corrected in the following 2.0 version). The current MathML 2.0 still
contains several flaws (especially in the presentational code). So what
should we do: try to develop a new language that is more concise,
powerful, and browser compatible, or copy an unfortunate design?

Luca Padovani, in his 2003 PhD thesis on mathematical formatting,
studied the rendering of a simple (2 x 2) matrix equation and wrote

<blockquote>
By the MathML stretching rules of operators, which were briefly
summarized on page 23 [...] depending on the vertical extent of the
sub-expressions x_ij , y_i, and z_i the parentheses may be stretched to
different sizes, and the nice-looking outcome of rendering equation 1.1
is just a fortunate fact.

A quick analysis of the MathML markup reveals that there is no way to
preserve the structure of the formula and still have a "correct"
rendering at the same time.
</blockquote>


> its renderers,

That is, we inherit all the difficulties of rendering math in both
online and offline systems, including the failures to implement MathML
in FO renderers.

Using *current* CSS rendering we can display a lot of math in almost all
current browsers, without special fonts or plugins (this could be
improved with better support for CSS or with future math-specific CSS
enhancements).

If we choose MathML, we can render _some_ math in Firefox and friends,
and in MSIE when using a third party plugin (which is far from perfect).
Interesting perspective!

> and
> its content, unambiguous interpretation,

Curiously, in recent months we discussed many examples of ambiguous
MathML code (extracted from real sites) on the MathML mailing list. For
example, what do you mean by this <mi>d</mi><mi>x</mi> or this
<mo>d</mo><mi>x</mi>?
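
A small Python sketch of how I read that ambiguity (the helper names
tokens and text are hypothetical, and the <mrow> wrapper is added only
so that the fragment parses as XML): both fragments carry exactly the
same characters and differ only in the token class of "d". Nothing in
either one states whether a product d times x or a differential dx is
meant.

```python
import xml.etree.ElementTree as ET


def tokens(fragment):
    """Return (element name, text) pairs for a presentation fragment."""
    root = ET.fromstring(f"<mrow>{fragment}</mrow>")
    return [(child.tag, child.text) for child in root]


def text(fragment):
    """Return the plain character content of the fragment."""
    root = ET.fromstring(f"<mrow>{fragment}</mrow>")
    return "".join(root.itertext())


first = "<mi>d</mi><mi>x</mi>"
second = "<mo>d</mo><mi>x</mi>"

print(tokens(first), text(first))
print(tokens(second), text(second))
```

Any software consuming such markup has to guess the meaning from the
token classes, and authors on real sites do not use them consistently.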

> Currently this thread seems mostly to be
> along  the lines of "we should add maths, but we shouldn't make it
> hard".

I think the main idea is "we should add maths in a way compatible with
the rest of the satisfactory technologies available (i.e. without
unneeded breaks), and we should not make it as unnecessarily hard as
MathML does".



White Lynx wrote:
>
> Thus price that browser developers have to pay for fractions is very
> close to zero, so why not to make some  mathematicians happy and
> include fractions in HTML5? The same applies to nearly each and every
> mathematical expression, so it is funny to have opportunity and not to
> use it just because seven years ago someone at W3C decided to
> "reinvent wheel, make it square and put the horse behind the cart".

Good point! The W3C has been harshly criticized for several of the
specifications it has developed. MathML is in the top five. Robert Miner
(W3C MathML IG) was obliged to acknowledge that

<blockquote>
However, as I have observed again and again during the decade I've
devoted myself to the issues of electronic mathematical communication,
the principle challenges are not technical, but political. MathML is not
the way it is exclusively because of language design considerations --
it is the way it is because it was the politically feasible compromise
between the many conflicting interests of the consortium members that
had a stake in standardizing a markup for math notation.
</blockquote>


Juan R.

Center for CANONICAL |SCIENCE)
Received on Monday, 5 June 2006 06:36:04 UTC