Re: MathML-in-HTML5

Roger B. Sidje wrote:
> There is a current discussion in the Mozilla MathML and Layout groups
> toward supporting MathML in plain HTML (as opposed to just XHTML). It is
> intended that this will happen in the framework of HTML5 being
> shepherded by the WHATWG (the Web Hypertext Application Technology
> Working Group). HTML5 is a down-to-earth alternative to XHTML in that it
>  alleviates the requirements of XML and thus is backward compatible
> with the whole wide web, and will therefore allow authors, among other
> things, to copy-paste <math>...</math> in their existing HTML documents,
> thereby greatly facilitating the implantation and adoption of MathML by
> individual users.

There was a basic discussion about mathematics in the WHATWG community.
Almost everyone on the list agreed that MathML was not the best option
(even Ian’s initial proposal was to support a special syntax via a new
parsing mode, to avoid MathML’s verbosity and redundancy).

I remember many folks on the list calling for _no_ MathML and for the
implementation of an alternative fitting the HTML 5 framework and the
Mozilla/Opera manifesto. I also remember some W3C folks and some people
close to Opera calling for an alternative that was CSS-, HTML-, and
DOM-friendly.

After some technical discussion, a minimal subset of the original draft
was broadly accepted. Even a member of the Mozilla community agreed that
the proposal was cheap and could be implemented on the browser side
without real difficulty.

Then silence... and now the Mozilla community and Ian launch this
approach. This sounds really strange, especially seeing that:

1) The proposal is contrary to the original Mozilla manifesto

2) The proposal is contrary to web evolution and takes steps backward

3) All pleas on the WHATWG list were ignored

4) On the one hand, Mozilla is implementing alternatives to W3C specs:
HTML 5 vs XHTML 2, canvas vs SVG, WGforms vs XForms, etc. The natural
way would then be HTML-Math vs MathML, with HTML-Math fitting into the
rest of the WHATWG philosophy. Now that is broken.

5) The proposal rejects the previous draft because it “was not standard”
(not exactly true, since some parts were directly derived from ISO-12083)
and because it “was inventing a new syntax”, yet the current proposal is
to invent a new syntax and to implement something that is not a standard.

> RBS - Mozilla/Firefox/Gecko's MathML project owner.

I will avoid the political part and will focus on what I think are the
sound technical points and on the misunderstandings I have heard from
some people, both formally and privately.

Claim]
MathML is not popular because it is locked into an XHTML framework.

Reply]
MathML can be used in a plain XML framework and still lacks popularity.

MathML is not popular because it does not solve many real-life needs.
Take a look at the development and spread of *alternatives* such as
Elsevier's Math modification plus add-ons, XML-MAIDEN, OpenMath, CanonML,
and now the new OMML language launched by Microsoft, which will probably
displace p-MathML in the academic world. That project is directed by one
of the early MathML folks, so he knows MathML very well.

Claim]
MathML cannot be used in text/html and people want text/html.

Reply]
One has been able to embed MathML islands in HTML documents for years.
This has been possible in Internet Explorer since the time when Explorer
was practically the only major browser on the net.

One can also embed MathML islands in HTML docs using a trick in Mozilla
that turns off XML validation and therefore lets one build MathML islands
into a DOM that is _not_ XHTML. I.e. there is no need to write <meta/>,
since the <meta> of HTML works fine.

Therefore, people can serve mathematical content via text/html, in .html
files with an HTML doctype and HTML legacy. The main premise, that
individual authors cannot use MathML because of HTML legacy, is thus not
correct.

The main reason individuals do not use MathML is that MathML was
developed on an incorrect basis. MathML is an XML application. XML (an
SGML subset) was really designed for documents, not data. XML is optimal
when markup is minimal. The recent usage of XML for data is more an abuse
than a use. LISP was designed for data, not documents. LISP is optimal
when text content is minimal.

You can see people on this list promoting MathML much as you can see
people on LISP lists promoting LISP for documentation. I think that both
communities lack the realism to see the limits of their respective
approaches.

Claim]
Mozilla wants to implement MathML in HTML 5

Reply]
What I see in the drafts and on the lists is not MathML. It is a strange
hybrid that takes some parts of MathML and rejects others while modifying
parsing and syntax.

I see people discussing stuff such as

<math xmlns="http://www.w3.org/1998/Math/MathML">
  <b>bold</b>
</math>

I see Ian claiming that he wants to see stuff like <none> in HTML 5
instead of the valid <none/> of MathML. He wants no MathML entities
except two or three he chose, he changes the syntax from &InvisibleTimes;
to &InvisibleTimes, and he makes further changes. That, of course, looks
somewhat like MathML but is not MathML.

He has proposed from early on that the MathML

<math>
  <mrow>
    <mi>a</mi>
    <mo>+</mo>
    <mn>2</mn>
  </mrow>
</math>

would be encoded as <math><mrow>a + 2</mrow></math> in the HTML 5 ‘version’.

This also makes some sense, because it is the way taken by several markup
languages, including ISO-12083, the HTML5-Math draft, XML-MAIDEN, and even
TeX and LISP to some extent. The new markup language for mathematics
developed by Microsoft also goes that way; there I do not need to mark up
the “a”, the “+”, and the “2” as separate tokens.

However, Ian’s proposal is more an input syntax than a real markup design.
Therefore, in the end, one is left with the unnecessary redundancy of the
data-oriented format built into MathML, which is absent from the other
approaches.

Claim]
MathML in HTML 5 is simpler for developers to implement and would spread
MathML to other browsers: Safari, Opera...

Reply]
It simply is not. The initial draft was simple and cheap to implement and
needed no new additions to the DOM, WS parsing, or CSS layers already
implemented in current browsers. This was recognized on the WHATWG list.

The current proposal for MathML in HTML 5 keeps all the troublesome
characteristics of MathML (including the WS parsing, DOM, and CSS mess)
but adds a bunch of new complexity, because the styling and CSS layers
need to be modified and the DOM engine probably needs to be fixed at some
points. A different entity treatment is introduced, and Ian wants a
different WS parser. I am not sure exactly what changes will be needed at
the namespace layer.

As a general rule (somewhat imprecise but tolerable), any browser that
wants to implement MathML in HTML 5 would have to add each basic feature
twice: once for the standard way of doing things and once for the special
WHATWG way.

Claim]
MathML in HTML 5 will simplify MathML authoring.

Reply]
I think the proposal points in just the opposite direction.

Ian defines HTML 5 as “anything sent as text/html”. Therefore MathML in
HTML 5 will be “anything sent as text/html”. I think that this will
generate all kinds of problems that we cannot even imagine now.


### Some problems ###

Empty tags]

The proposal uses the special HTML syntax for empty tags and SGML options
for closing. For instance, in

<foo><bar><baz></foo><quux>

the open tags are closed automatically when the parent tag is closed. But
if you do not want that behavior then you may use

<foo><bar></bar><baz></baz></foo><quux></quux>

1)
This will generate parsing problems and copy-and-paste problems. The page
from author X is

<A><B></A>

and uses automatic closing. Now author Y copies fragment B for reuse in
another doc

<D><B><C></C></D>

and expects

<D><B></B><C></C></D>

but that is not what [s]he obtains, because <B> stays open and swallows
<C>. However, if the code is reorganized to

<D><C></C><B></D>

then direct copy and paste works!
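
The copy-and-paste trap above can be reproduced with a toy parser. This is
only a sketch under one assumption: the auto-closing rule is taken to be
"an end tag also closes every element still open inside it". The names
build_tree and show are hypothetical helpers invented for this
illustration; this is not the parsing algorithm of any HTML 5 draft.

```python
import re

def build_tree(markup):
    """Tags only; an end tag implicitly closes any still-open descendants."""
    root = ("#root", [])
    stack = [root]
    for closing, name in re.findall(r"<(/?)(\w+)>", markup):
        if not closing:
            node = (name, [])
            stack[-1][1].append(node)   # attach to the innermost open element
            stack.append(node)
        else:
            # Pop until the named element itself has been closed.
            while len(stack) > 1:
                if stack.pop()[0] == name:
                    break
    return root[1]

def show(nodes):
    """Serialize the tree with every element closed explicitly."""
    return "".join(f"<{name}>{show(kids)}</{name}>" for name, kids in nodes)

print(show(build_tree("<A><B></A>")))         # <A><B></B></A>
print(show(build_tree("<D><B><C></C></D>")))  # <D><B><C></C></B></D>
print(show(build_tree("<D><C></C><B></D>")))  # <D><C></C><B></B></D>
```

Under that assumed rule the pasted <B> ends up as the parent of <C> in the
second case, while the reordered third case gives the tree author Y wanted.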

2)
MathML in HTML complicates the authoring of math: should one expect <baz>
or <baz></baz>, and maybe <baz /> in HTML 5 but <baz/> in XHTML docs?
Pages generated by multiple authors (e.g. a wiki) will contain a whole
mess of options.

3)
Internet Explorer can already render MathML in text/html, so no interest
in this new way is to be expected from it.

MathML in HTML 5 offers us a return to the old days when web authors were
forced to generate two versions of the same site: one for Netscape and
another for Explorer. In fact, this proposal would oblige authors to
develop one HTML with one MathML for MSIE and another HTML with another
MathML for Mozilla/Netscape.

4)
This blocks efficient communication and reuse of tools. Note that if you
obtain an HTML 5 doc you cannot copy and paste it into a current MathML
application, because the input will not be well-formed. Forget, then,
about reusing all the MathML tools listed in the W3C MathML software list.
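
The well-formedness point can be checked with any XML parser. Here is a
small sketch using Python's standard xml.etree.ElementTree as a stand-in
for "a current MathML application"; is_well_formed is a hypothetical
helper name, and real MathML tools may of course fail in other ways.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if an XML parser accepts the fragment as it stands."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Explicitly closed tags: accepted by an XML parser.
print(is_well_formed("<foo><bar></bar><baz></baz></foo>"))
print(is_well_formed("<mmultiscripts><mi>x</mi><none/><mn>2</mn></mmultiscripts>"))

# HTML-style implied closing and a bare <none>: rejected.
print(is_well_formed("<foo><bar><baz></foo>"))
print(is_well_formed("<mmultiscripts><mi>x</mi><none><mn>2</mn></mmultiscripts>"))
```

The first two calls return True and the last two return False, so a tool
built on an XML parser stops before it ever looks at the mathematics.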

MathML that is not MathML]

In the long run Ian Hickson is promoting a non-MathML syntax, as he
already told us on the WHATWG list. He argues for supporting something
like his
<mrow>a + 5</mrow>

which is then parsed to

<mrow><mi>a</mi><mo>+</mo><mn>5</mn></mrow>

and copied to the DOM.

Apart from anything else, browser developers would need to implement two
different parsing modes: one for MathML, where spaces are collapsed, and
one (a pre-parser mode specific to HTML 5) where spaces are used for
tokenization.

<mrow>a + 5</mrow> is _not_ MathML. Therefore current tools do not work
and new tools would have to be developed. That is not what I call reusing
a standard.

Now imagine that 5 is not a number but a symbol. What would I write?

<mrow><mi>a</mi><mo>+</mo><mi>5</mi></mrow>

or this

<mrow>a +<mi>5</mi></mrow>?

or this

<mrow>a + <mi>5</mrow>?

And what if it is (a+5+2)? Is <mrow>a + <mi>5 + 2</mrow> not incorrect? Or
does the general rule for parent auto-closing not apply to mi elements? Or
will the DOM be <mrow><mi>a</mi><mo>+</mo><mi>5+2</mi></mrow>?
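
To make the ambiguity concrete, here is a rough sketch of what a
whitespace-driven pre-parser might look like. Everything in it is an
assumption made for illustration: the function name tokenize and the
classification rules (digits become mn, letters become mi, anything else
becomes mo) are my guesses and are not taken from any draft.

```python
import re
from html import escape

def tokenize(text):
    """Split on whitespace and guess a MathML token element for each piece."""
    out = []
    for tok in text.split():
        if re.fullmatch(r"\d+(\.\d+)?", tok):
            out.append(f"<mn>{tok}</mn>")          # looks like a number
        elif re.fullmatch(r"[A-Za-z]+", tok):
            out.append(f"<mi>{tok}</mi>")          # looks like an identifier
        else:
            out.append(f"<mo>{escape(tok)}</mo>")  # everything else: operator
    return "<mrow>" + "".join(out) + "</mrow>"

print(tokenize("a + 5"))  # <mrow><mi>a</mi><mo>+</mo><mn>5</mn></mrow>
print(tokenize("a+5"))    # <mrow><mo>a+5</mo></mrow>
```

With rules like these a "5" can only ever come out as a number, and
removing the spaces changes the result completely, so the author has to
fall back on explicit <mi> markup anyway.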

A standard that is not standard]

One of the advantages of a standard is that one can interchange
information. If I take a MathML fragment generated by Mathematica, I can
paste it into a DocBook document and then render everything in an
XHTML+MathML way or in an XML+CSS way using an XSLT.

The MathML-in-HTML5 proposal avoids prefixes. DocBook requires the mml
prefix. I cannot copy MathML from an HTML 5 web page and paste it into a
DocBook document without expecting tag name conflicts. The ‘standard’
becomes no more useful than a new ad hoc approach in which <my-foo/>
<my-bar/> are converted to DocBook MathML on the fly.
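
A small sketch of what a namespace-aware XML tool sees, using Python's
standard xml.etree.ElementTree. The assumption here is that the fragment
copied from the HTML 5 page carries neither a prefix nor an xmlns
declaration; the three sample fragments are made up for the illustration.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Prefixed form, as a DocBook document would carry it.
prefixed = ET.fromstring(
    f'<mml:math xmlns:mml="{MATHML_NS}"><mml:mi>a</mml:mi></mml:math>')
# Unprefixed, but with the MathML namespace declared as the default.
default_ns = ET.fromstring(f'<math xmlns="{MATHML_NS}"><mi>a</mi></math>')
# Bare fragment, as assumed to be copied out of a text/html page.
bare = ET.fromstring('<math><mi>a</mi></math>')

print(prefixed.tag)    # {http://www.w3.org/1998/Math/MathML}math
print(default_ns.tag)  # {http://www.w3.org/1998/Math/MathML}math
print(bare.tag)        # math
```

The first two are the same element to such a tool; the bare one is just an
element called "math" in no namespace, free to collide with anything else
that uses the same tag names.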


### Some solutions to mathematical markup ###

1) The current proposal for MathML in HTML 5 should be rejected, since it
basically returns the web to the old (forgotten?) days. It is not
compatible with the current MathML standard and will probably be of no
interest to any browser except Mozilla.

2) Minimal mathematical capabilities could be introduced in HTML 5 with
almost no effort for both developers and authors, since the initial draft
reused the current DOM, parsing, and CSS layers already implemented in
browsers. Authors would learn three or four new tags while reusing their
current knowledge of DOM, CSS, and JavaScript.

Development of basic tools for conversion to MathML would be cheap and
could be useful for both, somewhat as the current SVG-canvas interaction
shows, since SVG and canvas are somewhat complementary.

3) MathML should be further revised to be CSS- and DOM-friendly. This
would increase a bit the presence of MathML 3 on the web when compared
with the fiasco of its predecessors.

4) Even so, MathML should be seen as a first step toward serious
mathematical markup. An OpenMath-like binary encoding is not the solution.

The next logical step would be the development of a new non-XML language
really addressing real-life needs.

XML is not good for everything. SVG is a typical example of how XML is
limited. SVG introduces a non-XML language because of the obvious
inefficiency of XML when there is more markup than content.

Something like this

<mml:math>
  <mml:apply>
    <mml:times/>
    <mml:ci>a</mml:ci>
    <mml:apply>
      <mml:plus/>
      <mml:ci>b</mml:ci>
      <mml:ci>c</mml:ci>
    </mml:apply>
  </mml:apply>
</mml:math>

is an abuse of the concept of a markup language, somewhat as the usage of
Scribe (LISP) for documentation is also an abuse (and the reason it was
superseded by SGML).
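
For comparison, a sketch that reads the fragment above and prints the same
data in a LISP-like prefix form. Two assumptions: an xmlns:mml declaration
is added so that an XML parser accepts the fragment, and to_sexpr is a
hypothetical helper that handles only the apply, ci, and cn elements plus
empty operator elements.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1998/Math/MathML}"

doc = """
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <mml:apply>
    <mml:times/>
    <mml:ci>a</mml:ci>
    <mml:apply>
      <mml:plus/>
      <mml:ci>b</mml:ci>
      <mml:ci>c</mml:ci>
    </mml:apply>
  </mml:apply>
</mml:math>
""".strip()

def to_sexpr(el):
    """Render a content-MathML element as a prefix expression."""
    name = el.tag.replace(NS, "")
    if name == "apply":
        return "(" + " ".join(to_sexpr(child) for child in el) + ")"
    if name in ("ci", "cn"):
        return el.text.strip()
    return name  # empty operator element such as times or plus

root = ET.fromstring(doc)
print(to_sexpr(root[0]))  # (times a (plus b c))
```

Three identifiers and two operators fit in twenty characters of prefix
notation, against eleven lines of markup.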


Juan R.

Center for CANONICAL |SCIENCE)

Received on Sunday, 1 October 2006 13:36:19 UTC