[whatwg] Mathematics in HTML5

About the markup model: a specific markup language has been proposed (Whyte
Lynx), and now there are variations from Michel Fortin and some others.

I agree with most of Lynx's proposal, but I think we should not propose
semantic markup. Even Content MathML and OpenMath are not yet sufficiently
thought out for broad adoption. In fact, both markups contained several
serious errors in the past, which have been corrected in recent versions.
OpenMath is more solid than Content MathML, and the latter was probably
never adopted in practice. Today, implementation of both is close to zero.

I think the markup should be as easy as possible, with the possibility of
extension, so that expert authors can do a better job while average users
can still obtain results cheaply and quickly.

I would discourage any semantic markup and focus only on structure
(markup) and presentation (CSS, with XSL-FO as second choice).

In that case something like

<frac>
  <num>b</num>
  <den>2</den>
</frac>

is structural. Much as an HTML document is composed of a head and a body,
here I am encoding that a fraction (structure) is divided into a numerator
(substructure) and a denominator (substructure). This structure can then be
styled with CSS (or XSL-FO) by applying different styles to <num>, <den>,
and <frac>. There would be a default stylesheet (as in LaTeX), but
fine-tuning of a particular fraction can be achieved using CSS rules
directly, much as one can use style attributes to fine-tune some part of a
text while in general relying on the default stylesheet for HTML.
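
A minimal sketch of what such a default stylesheet could look like,
assuming the browser treats the unknown elements <frac>, <num>, and <den>
as ordinary inline boxes that CSS can restyle. The property values and the
"tight" class are illustrative assumptions, not part of any proposal in
this thread.

```html
<style>
  /* Default style: stack the numerator over the denominator. */
  frac { display: inline-block; vertical-align: middle; text-align: center; }
  num, den { display: block; padding: 0 0.2em; }
  num { border-bottom: 1px solid; }  /* the bottom border acts as the fraction bar */

  /* Fine-tuning one special fraction via a (hypothetical) class. */
  frac.tight { font-size: 85%; }
</style>

<p>Default: <frac><num>b</num><den>2</den></frac>,
fine-tuned: <frac class="tight"><num>b</num><den>2</den></frac>.</p>
```

The markup stays the same in both cases; only the CSS rule selected by the
class differs.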

MathML is only presentational, not structural: there are no explicit
numerators or denominators, just an <mfrac> with two children. It is not
CSS friendly, and to fine-tune a particular fraction you must use special
MathML attributes, a new styling language, etc. This is not rational.

I would not encourage the use of a type attribute. A simple class would be
sufficient, and then we can reuse available CSS and HTML engines. 
Implementing full semantics in browsers would be very, very complex, and
nobody has shown that semantics can be unambiguously encoded using
type="matrix" or type="vector".

Forcing a special type on each token would be as tedious as if HTML text
used a type attribute on span, p, and div for every possible semantic one
can imagine: lemma, chapter, manifesto, group, affiliation, booksection,
etc. For that level of detail, markups such as ISO 12083 or DocBook are
better for text; OpenMath or some other approach will be better for math.

I also see rendering difficulties. For example, in the physicochemical
community

<var type="matrix">X</var> would be rendered in bold *X*, but

<var type="matrix">X</var><sub>2,2</sub>

I would use normal rendering for this, because it is a matrix element.
Moreover, <sub>2,2</sub> has no clear semantic meaning and mixes semantics
with presentation; at a minimum you would use <sub type="index">2,2</sub>,
but ideally each index would be encoded independently and the comma typed
as a separator of indices. All of that is complex.

<var class="matrix">X</var> and a CSS rule would be a cheap solution, with

<var>X</var><sub>2,2</sub> in non-bold face.
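
A minimal sketch of the CSS rule this would need. The convention assumed
here (matrices in upright bold, other variables in plain italic) is an
illustration only; a community with a different convention would swap the
rule without touching the markup.

```html
<style>
  /* Ordinary variables: italic, normal weight. */
  var { font-style: italic; font-weight: normal; }
  /* Whole matrices: upright bold (assumed house style). */
  var.matrix { font-style: normal; font-weight: bold; }
</style>

<p>The matrix: <var class="matrix">X</var></p>
<p>One of its elements: <var>X</var><sub>2,2</sub></p>
```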

I also find tedious the double markup in Content MathML 2.0, with
type="matrix" being used in some places but <matrix> in others. In MathML
the above example would be rewritten via the <selector> operator

<apply><selector/><ci type="matrix">X</ci><cn>2</cn><cn>2</cn></apply>

For any array structure (matrix, vector, determinant, or any other), why
not simply reuse the available HTML elements, <table>, <td>..., instead of
proposing new ones, <mr>, <md>, that do the same? CSS rules could use
different selectors

body > table: CSS rules for text tables

body > formula > table: CSS rules for math tables
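
A minimal sketch of the two selector groups, assuming the <formula>
container named above is a direct child of <body> and that the browser
lets CSS style it. All property values are illustrative assumptions.

```html
<style>
  /* Text tables: direct children of body, with visible cell borders. */
  body > table td { border: 1px solid; padding: 0.2em 0.5em; }

  /* Math tables: set inline within the formula, no borders, centered cells. */
  body > formula > table { display: inline-table; vertical-align: middle;
                           border-collapse: collapse; }
  body > formula > table td { border: none; padding: 0.1em 0.4em;
                              text-align: center; }
</style>

<table>
  <tr><td>Year</td><td>2006</td></tr>
</table>

<formula>
  <var class="matrix">A</var> =
  <table>
    <tr><td>1</td><td>0</td></tr>
    <tr><td>0</td><td>1</td></tr>
  </table>
</formula>
```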

Fences could be done as

<fence left="round" right="square">expression</fence>
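
A minimal sketch of how CSS alone could draw such fences, using attribute
selectors and generated content. Only the values "round" and "square" from
the example are covered; any further values would be assumptions. Note that
delimiters produced this way do not stretch to the height of the enclosed
expression.

```html
<style>
  /* Map each attribute value to a delimiter glyph. */
  fence[left="round"]::before  { content: "("; }
  fence[left="square"]::before { content: "["; }
  fence[right="round"]::after  { content: ")"; }
  fence[right="square"]::after { content: "]"; }
</style>

<p>Half-open interval: <fence left="round" right="square">0, 1</fence></p>
```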

I would discourage the use of <bounds>, <integral>, <product>, and the
like. They are semantic, whereas we should focus on structure plus
presentation. Some constructs proposed here are CSS unfriendly and, from a
semantic point of view, not correct at all.

Look at the following code submitted to this list.

<integral>
  <bounds>
    <sub>0</sub>
    <sup>100</sup>
  </bounds>
  3<var>x</var> d<var>x</var>
</integral>

This mixes presentation and semantics. External containers such as
<integral>, <sum>, and <product> were used in previous versions of MathML
but abandoned in recent proposals. Math on the web began with insane code
such as <root>2<of>x</root> and ended up with the current MathML 2.0
proposal: the presentational part being CSS and XSL-FO unfriendly, and the
content part also rejected.

In fact, the encoding of something as simple as the integral of sin(x) has
changed three or four times due to the weakness of the markups proposed,
with the latest, Content MathML 2.0, being harshly criticized by some
communities, for instance the OpenMath one. As an illustration, compare the
above code for integrals with that of HTML 3 Math, MathML 1, and Content
MathML 2, and with the most recent OpenMath code for the integral of sin(x)

<OMOBJ>
  <OMA>
    <OMS cd="calculus1" name="int"/>
    <OMBIND>
      <OMS cd="fns1" name="lambda"/>
      <OMBVAR> <OMV name="x"/> </OMBVAR>
      <OMA>
        <OMS name="sin" cd="transc1"/>
        <OMV name="x"/>
      </OMA>
    </OMBIND>
  </OMA>
</OMOBJ>


Ian Hickson wrote:

> I would be very cautious about introducing an entirely new language to
> do  this (even if it is "just" an extension of HTML4). For something as
> big as  Mathematics, we want to simply re-use an existing language, not
> invent a  new one. Inventing a new language for encoding content with as
> wide a  problem-space as mathematics would require months, as well as
> the time of  domain experts, etc. This work has already been done, e.g.
> in ISO12083,  MathML, LaTeX, and other such languages.

Nobody wants to reinvent the wheel, but people reuse languages when these
*work*.

By reusing MathML one ends up with an ugly language that is not compatible
with the rest of the w3c technologies, is semantically incorrect (or at
least incomplete), and on which practically nobody wants to waste time.

You look like a fervent admirer of the reuse of MathML; however, some of
your proposals, such as a special parsing mode or a mixture of pure and
mixed content, were proposed by other people (e.g. Juan R.) and completely
rejected by the w3c MathML IG's own people even before a serious debate
began.

Moreover, by reusing MathML 2.0 we are reusing exactly the same errors
that the w3c MathML IG made in its last specification.

For example, why did the MathML IG itself not reuse (all of) MathML 1.0,
and instead propose a new major revision with important changes? Why did
the MathML IG decide to invent new tags such as <apply> instead of reusing
MathML 1.0 code such as <fn>, <reln>, and others? Why did the w3c itself
not reuse <min> and <max> for integrals? Why were <of>, <left>, <right>,
<root>, <over>, and several other tags not reused from the early w3c Math
draft of 1994? Why was the initial <EXPR> Math tag finally abandoned? Do
you know?

When mathematicians stated that TeX should be reused in XML (most
mathematicians are just users of TeX and know little of its internal
design, and even less of web and XML requirements), Neil Soiffer, one of
the w3c MathML authors, replied

<blockquote>
Which part of TeX?

TeX is not amenable to the growing number of XML tools such as CSS, XSLT,
DOM, parsers...
</blockquote>

Therefore our rejection of MathML is perfectly reasonable, because it "is
not amenable to the growing number of XML tools such as CSS, XSL-FO, DOM,
parsers..."

This debate should be about the Whyte Lynx proposal for mathematics in
HTML5 rather than about the reuse of MathML. However, it may be informative
to explain why MathML is not popular. I have offered many examples of
incorrect code, incorrect design, criticism, and so on. Now some additional
comments about MathML.

Reference
NSF / NSDL Workshop:
Scientific Markup Languages
Workshop Report
Hosted by the National Science Foundation
June 14-15, 2004

Report prepared by:

Laura M. Bartolo, Kent State University
Timothy W. Cole, University of Illinois at Urbana-Champaign
Sarah Giersch, Association of Research Libraries
Michael Wright, UCAR - DLESE Program Center

**** EXTRACT, comments of mine between [] *********************

Initial work on MathML predates even the formal release of XML as a W3C
Recommendation and draws on early experiences from SGML (e.g., the ISO
12083 Mathematics DTD fragment) and HTML (e.g., the abortive effort during
the development of HTML version 3 to augment HTML with a number of math
specific elements, attributes, and constructs). [...] There remain,
however, a number of substantive issues with regard to MathML.

As one of the very first domain-specific implementations of XML, there
were (necessarily) growing pains, and MathML is still seen as somewhat
experimental by many potential users in the math community. [...] MathML
is therefore recognized as inherently incomplete. The authors of MathML
have explicitly targeted it for the expression of mathematical content up
through the early undergraduate level (first-order calculus). Its utility
for research mathematics, even with its explicit built-in extension
mechanisms (e.g., as exploited in the EU funded OpenMath project), is
still uncertain. MathML is also intentionally bimodal, containing sets of
elements to describe separately the presentation of mathematics and the
semantics of mathematics. Generally, early implementers have focused on
one or the other but not both parts of the ML, resulting in asymmetrical
implementations that don't always interoperate as well as might be
desired. Adoption has been somewhat slow, _in part_ [emphasis mine]
because of the entrenchment of TeX within the research mathematics
community. Additionally, although mathematics is recognized as key to many
scientific disciplines, and there have been some attempts to incorporate
or accommodate MathML markup rules within other domain-specific markup
languages, there are examples of domain-specific markup languages (outside
of pure mathematics) that include their _own markup semantics_ [emphasis
mine] for basic mathematics needed within the domain of interest, rather
than borrowing from MathML as needed.

[...] The mathematics breakout discussion included a diversity of MathML
experts and current and would-be users and consumers of MathML. This
diversity of backgrounds and perspectives made for an energetic and
wide-ranging discussion.

In discussing the potential benefits that MathML might bring to bear on
educational services and models of learning, there were multiple points of
consensus as well as several open issues and uncertainties identified.
[...]

That said there remain several open issues as well regarding the potential
of MathML to help meet educational needs for a better way to express
mathematics in online documents and learning resources. The utility of
MathML to enhance searching and improve accessibility of online
mathematical content has not yet been proven. Searching of mathematically
laden content by the mathematics it contains is a complex issue. It's not
altogether clear whether the level of description implicit in content
(semantic) and/or presentational MathML is sufficient to support robust
searching on the mathematics contained in a resource. It's also not yet
certain that readers and other accessibility tools will be able to exploit
MathML effectively to make the mathematics embedded in a resource more
accessible, though that seems a safer bet. While MathML is being adopted
(at least experimentally [this was certainly the case at the Center for
CANONICAL |SCIENCE)]) behind the scenes -- e.g., as an exchange format for
interoperation between applications like Mathematica and Maple and in the
editorial workflow of scholarly journals [I studied in detail the case of
the giant Elsevier: they are using an in-house modification of MathML
instead of the w3c standard, because they also ran into problems and,
moreover, they complement it with their own mathematical CEP markup, e.g.
<ce:sub> and <ce:sup>, for simple formulae], it has not been
widely adopted by the authors of educational and scholarly mathematical
content. Research mathematicians continue to rely heavily on TeX, which
though exclusively presentation oriented (really a specialized language
for the typesetting of mathematics) is firmly entrenched. Educators
continue to rely on cruder technologies (e.g., embedding mathematics as
static images within HTML or presentation only markup within PDF
documents) or exploit proprietary solutions such as Mathematica workbooks.
There remains a bit of a "chicken and egg" problem in that authors are
hesitant to adopt a new technology until it has proven its value, and it
remains difficult to prove the value of MathML without a sufficient body
of MathML content.

Discussion of this issue led naturally into an extended discussion as to
how MathML is now or might in the future engage the mathematics community.
It is clear that MathML at this point in time is more appealing to
organizations or institutions than it is to individual practitioners. As a
non-proprietary, expressive, comparatively low-loss way to represent
mathematics, MathML has clear attractions for long-term archiving and
interchange of mathematics on a large scale. Hence its attractiveness to
publishers and middleware tool developers. Several participants in the
breakout session suggested that MathML may continue to develop as a
largely or even exclusively back-end technology, used behind the scenes as
a way to store and exchange mathematical content, but not necessarily as a
format with direct impact on the author's or the end-user consumer's
experience interacting with mathematical content [this clearly indicates
that WE should provide a cheap but powerful mathematical language for the
web, with end users and authors in mind]. That would still make MathML
useful, but the consensus was that MathML's greatest potential both
economically and in terms of new functionality will not be realized until
it is used more widely by content creators and ultimate consumers [the
problem is that the specification is weak and the available tools cannot
generate first-class MathML code]. This will require even more aggressive
development of necessary authoring and presentation tools (including
interactive presentation tools) and the inclusion of MathML within markup
schemes developed by other science and technology communities that require
the ability to express rich mathematics in documents and learning
resources. This, in the collective opinion of those participating in the
Mathematics breakout discussion, suggested avenues of common interests
with other markup language communities represented at the workshop and led
to the identification of several key issues of importance to the further
development and future evolution of MathML:

- the need for more ubiquitous, more transparent (to the user) support for
  MathML in the Web environment;
- the need for better support within XML and Web-based applications for
  "compound documents" (i.e., as defined by the W3C, documents that combine
  multiple formats, such as XHTML, SVG, SMIL and XForms);
- better assurance that MathML will be maintained as a standard going
  forward;
- more sophisticated tools, especially on the authoring side, that can
  facilitate inclusion/embedding of MathML within online resources (e.g.,
  within Web pages);
- continued development of better, more robust transformation tools (e.g.,
  between TeX and MathML); and
- viable business models to better support and encourage ongoing
  development of MathML.
***********************************************************

A simple and cheap structural markup based on ISO 12083 (which is an
international standard; MathML is not) that can be styled with CSS (or
XSL-FO) and that does not need special fonts, plugins, native support,
special tools, etcetera would be easily implemented and accepted by
authors.



Juan R.

Center for CANONICAL |SCIENCE)

Received on Friday, 9 June 2006 02:57:56 UTC