[whatwg] Mathematics on HTML5

Henri Sivonen <hsivonen at iki.fi> wrote:

> I am pretty convinced that the granularity of markup needed for math
> and the verbosity of XML necessarily lead to an XML syntax for math
> that is not suitable for direct human authoring.

I doubt that.

> However, I think it
> does *not* necessarily follow that an XML syntax for math is an
> inherently bad idea.

I did not say the contrary. What I said is that the specific MathML format
is not good enough, owing to political issues. Its weak design is even
acknowledged by the MathML authors themselves.

> Math even more than schemas or vector graphics needs to have an XML
> syntax, because math needs to integrate in prose on a more profound
> level than e.g. replaced elements would allow.

I do not agree; the current MathML is not really integrated with XHTML.

An HTML syntax is sufficient for the structural part. With standard HTML
<sub> and <sup> and a bit of help from standard CSS 2.1, one can render
complicated script structures in a standard HTML browser without needing
the MathML alphabet soup of <msub>, <mstyle>, <msup>, <msubsup>,
<mmultiscripts>, <none>, <mi>, <mo>, and <mn>, the extra MathML
attributes, the extra DOM, the extra styles, a different whitespace (WS)
parser, or a special namespace in your document. The same reasoning applies
to reusing the old but effective HTML <table> element instead of adding the
new, redundant <mtable>, <mtr>, and <mtd>.
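To make the size argument concrete for a formula such as E=mc2, here is a
minimal sketch that parses two fragments with Python's standard
xml.etree.ElementTree and counts elements and bytes. Both fragments are
hand-written for this comparison (an assumption, not the output of any
particular tool); real-world MathML often carries extra <mrow> wrappers and
attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written encodings of E=mc2, for illustration only (not tool output).
HTML_FRAGMENT = "<span><var>E</var>=<var>m</var><var>c</var><sup>2</sup></span>"
MATHML_FRAGMENT = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow>'
    "<mi>E</mi><mo>=</mo><mi>m</mi>"
    "<msup><mi>c</mi><mn>2</mn></msup>"
    "</mrow></math>"
)


def markup_stats(fragment: str) -> tuple[int, int]:
    """Return (element count, UTF-8 byte size) of a well-formed fragment."""
    root = ET.fromstring(fragment)
    element_count = sum(1 for _ in root.iter())  # iter() includes the root
    return element_count, len(fragment.encode("utf-8"))


if __name__ == "__main__":
    for name, fragment in (("HTML", HTML_FRAGMENT), ("MathML", MATHML_FRAGMENT)):
        elements, size = markup_stats(fragment)
        print(f"{name}: {elements} elements, {size} bytes")
```

With these particular fragments the MathML version has 8 elements against
5 and roughly twice the bytes; other formulas and other encoders will give
other ratios.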

The real debate is about the content part, which content MathML does not
solve.

>> 1) Insanely complicated and inefficient. In some cases, I have computed 15
>> times more bandwidth and server storage when using MathML than
>> alternatives.
>
> gzip

1) First, note that I was not talking about the verbosity of XML end tags
or the like, but about the inefficient markup model specific to MathML.
Have you tried encoding E=mc2 in full parallel MathML? And what about fine
parallel markup? Fine parallel markup is so complex that even the Math WG
itself provided an alternative encoding. They did not recommend gzip to
users ;-).

2) It makes no sense to offer people gzip archives of online documents to
download and read offline!

3) Even with compression, one must unzip files before accessing the data.
I cannot manipulate a 15 MB file on my computer when the *same*
information could be stored in 1-3 MB.

The W3C has made a big effort to provide us with lightweight, rational
alternatives to old, insane approaches. A typical example is using a
single external CSS document for all your HTML documents instead of
repeatedly encoding the font style in each paragraph of each document.
MathML simply breaks this trend, providing one of the most ultraverbose and
redundant encodings I have seen in my life.

Another example: I can write

<p>This is an <i>important</i> text</p>

in presentational HTML. The rationale is simple but effective: text is
roman by default, and when text is italic you mark it up with <i>. In a
MathML fashion, the code would be typed as

<mp><mr>This</mr> <mr>is</mr> <mr>an</mr> <mi>important</mi>
<mr>text</mr></mp>

That is, you redundantly tell the computer that each token is roman, with
"important" rendered as italic. The same information, but bloated.

However, since presentational markup is not desirable, the <i> above is
better encoded as <em> in HTML. <em> is already rendered in italics by
default, but I can change its rendering via i) CSS in the head of the
document, ii) an external CSS file used by several documents at once, or
iii) a special CSS rule added via the style attribute.

4) Even with gzip, other approaches are less expensive in both disk space
and bandwidth.
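One way to check the gzip question on your own material is a small
measurement harness like the sketch below. It builds a synthetic document
of 500 formulas of the form x_i = a^(i mod 10) in a hypothetical HTML
encoding and a hypothetical presentation MathML encoding (both are
assumptions written for this test), then reports raw and gzip-compressed
sizes using Python's standard gzip module.

```python
import gzip
from typing import Callable


def html_formula(i: int) -> str:
    # Hypothetical HTML encoding of x_i = a^(i mod 10)
    return f"<p><var>x</var><sub>{i}</sub>=<var>a</var><sup>{i % 10}</sup></p>"


def mathml_formula(i: int) -> str:
    # Hypothetical presentation MathML encoding of the same formula
    return (
        "<math><mrow>"
        f"<msub><mi>x</mi><mn>{i}</mn></msub><mo>=</mo>"
        f"<msup><mi>a</mi><mn>{i % 10}</mn></msup>"
        "</mrow></math>"
    )


def document_sizes(make_formula: Callable[[int], str], count: int = 500) -> tuple[int, int]:
    """Return (raw bytes, gzip bytes) for a synthetic document of `count` formulas."""
    raw = "\n".join(make_formula(i) for i in range(count)).encode("utf-8")
    compressed = gzip.compress(raw)
    if gzip.decompress(compressed) != raw:  # sanity check: gzip is lossless
        raise RuntimeError("gzip round trip failed")
    return len(raw), len(compressed)


if __name__ == "__main__":
    for name, maker in (("HTML", html_formula), ("MathML", mathml_formula)):
        raw_size, packed_size = document_sizes(maker)
        print(f"{name}: {raw_size} bytes raw, {packed_size} bytes gzipped")
```

gzip shrinks both documents considerably because tag names repeat; the raw
sizes still differ, and, as point 3 above notes, the data must be
decompressed before it can be processed.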

> How is MathML not compatible with the DOM?

Introducing a specific DOM model makes implementation in browsers nearly
impossible. MathML does not integrate with the rest of the browser
technologies, such as the DOM, CSS, and the whitespace (WS) model. All this
causes problems and headaches for browser developers, and it is the real
reason we do not see browsers with native MathML support.

There are a lot of technical details on the Opera browser developers' site
about why they rejected *native* MathML support.

XSL-FO developers also failed to unify MathML with XSL-FO.

Is this the right way? The manifesto for HTML 5 emphasized

<blockquote>
Web application technologies should be based on technologies authors are
familiar with, including HTML, CSS, DOM, and JavaScript.
</blockquote>

and

<blockquote>
Basic Web application features should be implementable using behaviors,
scripting, and style sheets in IE6 today so that authors have a clear
migration path. Any solution that cannot be used with the current
high-market-share user agent without the need for binary plug-ins is
highly unlikely to be successful.
</blockquote>

MathML does not fit this philosophy and should therefore be abandoned.

>> Well MathML is not really based in those. But we can render math using
>> just HTML, and CSS, and we can use JS and DOM in the same way we use in
>> HTML or CSS for text. Look XML-MAIDEN
>> [http://www.geocities.com/csssite/index.xml] for ideas, samples, etc.
>
> Interesting. However, the results have the look and feel of an
> afterthought math editor for a word processor rather than the look and
> feel of pdfLaTeX output.

The look and feel are better than with MathML. The markup is lightweight
and can be easily accessed and modified via standard DOM and CSS rules.
Moreover, rendering is incremental, and thanks to recent advances by George,
many browsers can display almost all the mathematics, whereas MathML
support is a kind of binary logic: either you can see the math (Firefox,
MSIE + plug-in) or you cannot.

Moreover, the MAIDEN markup can be transformed to TeX for printing via TeX
engines, while better CSS-based print engines are not ready yet.

Do not forget that those articles are generated with a couple of simple
CSS 2.1 rules *without* font metrics information (TeX cannot do that; it
relies on a specific collection of fonts that were not designed for the
web, and on a very complex formatting engine).

Fine-tuning on the web can be achieved by complementing the generic
XML-MAIDEN stylesheet with more rules for special cases, or with
fine-tuning CSS rules inserted directly in the document. It is not
difficult to achieve TeX output quality when using font metrics.

Improved rendering engines and more experience with CSS in this field
would provide better rendering quality. However, given the difficulty of
implementing MathML in browsers, it is unlikely that you will ever be able
to fine-tune formulae with it.

And, of course, it is close to impossible to provide a TeX engine for the
web (one of the reasons TeX has failed to conquer the web).

>> One of problems
>> with this approach is that once new STIX fonts available I can use them in
>> HTML, also in CSS rendering of math, but I cannot use them in firefox,
>> since MathML module would be rewritten, and the full engine recompiled,
>> obligating to users to download and install new versions of browser for
>> new fonts!!!
>
> The PUA mapping is indeed a problem. If you want to see a change here,
> I suggest creating an OpenType font that uses the Type 1
> outlines from the YandY version of Computer Modern and has proper
> Unicode mappings.

I prefer to follow the usual web design guidelines, with rendering engines
and technologies that are independent of the fonts installed on the client
side.

But this does not prevent the implementation of specific print rendering
engines that deal with a collection of predefined fonts in specific
domains. For example, a library could implement a printing engine
optimized with font metrics for the STIX fonts. CM, no thanks!

>> 4) The default printing of MathML is not good and people is returning to
>> TeX for that!
>
> In general, Knuth was over 20 years ahead of everyone else. CSS-based
> typesetters are still catching up with TeX on some things. (And the
> bar is pretty high.)

No. It is relatively trivial to provide TeX quality in different markups
when one knows font metrics.

In fact, several authors have achieved TeX quality with SVG, and even
with HTML approaches, when font metrics are known. However, nobody has
achieved TeX quality using just MathML. Am I the only one who notes the
paradox that neither SVG nor HTML was designed for mathematical rendering?

This low quality has forced people to translate MathML to TeX and then
print the formulae with a traditional TeX engine.

The really difficult problem is providing good typesetting quality without
relying on specific fonts; Knuth has not solved this yet ;-)

> But yes, if you want to print math, pdfLaTeX is the best thing
> around. Changing the syntax of MathML does not help in catching up.
> Improving the rendering engine does.

No, the problem with MathML lies in its syntax and content model. Both are
incompatible with browser rendering engines, and, as previously said,
publishers using XSL-FO for books and documents were also unable to
incorporate MathML into their print rendering engines.

HTML Math could be incorporated in a few days because it is an incremental
implementation.

>> 5) Accessibility is very deficient
>
> A different syntax won't help. Implementations of accessibility tools
> will.

False. Alternative syntaxes for mathematics that have already been proposed
are more accessible than MathML for several technical reasons. Moreover, it
has been shown that current implementations of MathML in browsers are not
detailed enough for accessibility tools to work. This is the reason that
MathPlayer audio rendering does not deal correctly with tabular data, which
produces some ambiguous readings. Ambiguous renderings are absent when one
uses the old GIF + ALT model, and they could be eliminated by providing a
new approach in the future HTML.

By encouraging the use of p-MathML in HTML 5, one is encouraging
inaccessible code.

>> Moreover, the situation is still poor than that! Many sites claiming
>> theoretical accessibilities (e.g. Distler blog) are serving (ds)^2 as
>> <mi>d</mi><msup><mi>s</mi><mn>2</mn></msup>, i.e. 2s ds!!!
>
> I'm pretty sure Distler doesn't claim his math to be accessible, and
> I'm pretty sure he is quite aware of the paradox that AsTeR does not
> support MathML even though its author was on the WG.
>
> http://golem.ph.utexas.edu/~distler/blog/archives/000199.html

Distler makes a lot of different claims. In the accessibility statement

[http://golem.ph.utexas.edu/~distler/blog/accessibility.html]

he says

"Equations are written in MathML 2.0."

He does not explain that the accessibility of his self-proclaimed
ultra-advanced blog is poorer than it would be had he used the old
HTML + GIF + ALT model (or PDF, or even LaTeX). For instance, Distler does
not explain to his readers that a GIF with an ALT attribute is more
accessible than the verbose MathML code he is generating and serving on
the Internet.

Even perfect p-MathML 2.0 code is not really accessible, but, worse still,
the MathML code Distler serves is structurally invalid and based on
tricks. For example, he uses <mrow> and tricky collections of <msubsup> to
simulate tensors. He does not use the invisible operators introduced in
MathML. He encodes prescripts as in TeX, via empty groups, instead of
using the <mprescripts/> tag. This is odd!
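The (ds)^2 case quoted earlier shows why structure matters for
accessibility: a speech or braille renderer can only read what the tree
says. The sketch below uses a tiny hypothetical linearizer (it supports
only math, mrow, mi, mn, mo and msup, and is not any real tool's
algorithm) that prints TeX-like text with braces marking the scope of each
superscript.

```python
import xml.etree.ElementTree as ET


def linearize(node: ET.Element) -> str:
    """Render a tiny subset of presentation MathML as TeX-like text.

    Hypothetical helper: supports only math, mrow, mi, mn, mo and msup.
    Braces make the scope of each superscript explicit.
    """
    tag = node.tag
    if tag in ("mi", "mn", "mo"):
        return (node.text or "").strip()
    parts = [linearize(child) for child in node]
    if tag == "msup":
        base, exponent = parts  # msup must have exactly two children
        return f"{{{base}}}^{{{exponent}}}"
    if tag in ("math", "mrow"):
        return "".join(parts)
    raise ValueError(f"unsupported element: {tag}")


# Encoding criticized above: only s carries the exponent.
FLAT = "<math><mi>d</mi><msup><mi>s</mi><mn>2</mn></msup></math>"
# Encoding whose structure matches (ds)^2.
GROUPED = (
    "<math><msup><mrow><mo>(</mo><mi>d</mi><mi>s</mi><mo>)</mo></mrow>"
    "<mn>2</mn></msup></math>"
)

if __name__ == "__main__":
    print(linearize(ET.fromstring(FLAT)))     # prints d{s}^{2}
    print(linearize(ET.fromstring(GROUPED)))  # prints {(ds)}^{2}
```

Both encodings may look alike on screen, but only the second tells a
non-visual renderer that the whole ds is squared.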

The question is why one would encourage the use of MathML in HTML 5 when
it does worse than the old approaches! We can offer accessible math in the
next HTML 5.

There is no such paradox! MathML does not work. The accessibility of
MathML is just a myth.

>> 6) There are problems with default rendering of entities
>
> XML entities on the Web are b0rked. Since MathML is not human-
> writable anyway, let's get rid of the entities.

Curiously, the problems with MathML entities are solved in other
approaches. I already solved that some time ago...

>> Accessible code render ugly in screen whereas
>> visually correct code being inaccessible.
>
> "Accessible" code is just theory, right?

No.

>> 7) The possibility for automated searches of math continues being largely
>> a myth.
>
> Many, many things related to searchability, internationalization and
> accessibility are myths in the realm of semantic markup.

Then this is an argument for not using MathML in HTML 5, period.

However, you are systematically failing to understand the main points
here. Take the case of searches.

If anyone encodes E=mc2 in HTML, I can search for it with Google and it
works very well. But I cannot search for the formula when it is encoded in
MathML (presentation, content, or parallel). There are "infinite" ways to
encode the same formula because the specification is weak and full of
technical holes. I ran an experiment with a simple \dot{q} in the most
popular MathML tools. Each MathML tool encoded the same TeX command in a
different way. Moreover, the code generated by two of the MathML tools
failed to render in Mathematica 5.2 when directly copied and pasted. I can
copy \dot{q} from any TeX document, paste it into any other TeX document,
and the TeX engine will work.

I can copy E=mc2 from any HTML source, paste it into an HTML editor, and
it will work.

With approaches inspired by XML-MAIDEN or ISO 12083, the encoding would be
far more uniform and, therefore, amenable to automated search.
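To illustrate the \dot{q} experiment, the sketch below takes three
hypothetical presentation MathML encodings of \dot{q} (written for
illustration; they are not the actual output of the tools tested) and
shows that neither an exact-markup search nor a comparison of their
character content treats them as the same formula. It uses only Python's
standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical encodings of \dot{q}, for illustration only.
ENCODINGS = [
    '<mover accent="true"><mi>q</mi><mo>&#x2D9;</mo></mover>',  # DOT ABOVE
    "<mover><mi>q</mi><mo>&#x307;</mo></mover>",  # COMBINING DOT ABOVE
    "<mover><mi>q</mi><mo>.</mo></mover>",  # FULL STOP used as the accent
]


def text_content(fragment: str) -> str:
    """Concatenate all character data, ignoring tags and attributes."""
    return "".join(ET.fromstring(fragment).itertext())


def naive_hits(query: str, documents: list[str]) -> int:
    """Count documents that contain the exact query markup."""
    return sum(query in doc for doc in documents)


if __name__ == "__main__":
    for fragment in ENCODINGS:
        print(ascii(text_content(fragment)))
    print(naive_hits(ENCODINGS[0], ENCODINGS))
```

Matching these three encodings would require an extra normalization step
in the search engine, for example mapping the different dot characters to
a single form.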

>> 8) Visual rendering is not incremental as in CSS. This can offer us
>> problems with large documents or even with server failures. I find just
>> curious the w3c emphasis on abandoning non incremental rendering of old
>> HTML presentational table layout models in favour of CSS layouts, whereas
>> forcing usage of a non incremental MathML presentational markup. Some
>> mathematical documents take order of 10 minutes before rendering in
>> Firefox.
>
> This is not a design problem with MathML. This is Mozilla bug #18333.

Is _that_ bug related to the 10 minutes, to the fact that MathML rendering
is not incremental compared with a CSS solution, or what?

Moreover, you may know that the poor MathML support in Firefox (and other
Gecko-based browsers) is not due to low-quality programmers but to the
explicit incompatibility of MathML with the previous HTML and CSS layout
models. I can promise you, however, that the same mathematics could be
implemented with an alternative approach in current browsers in less than
a couple of days.

Again, MathML does not fit the position paper I cited previously.

>> 10) Advantages of being using a "standard" vanish when one observes the
>> infinite malleability of mathml code. For example people is simulating
>> tensors with nested msup, msub, msubsup, and tricky mrows, instead using
>> <mmultiscripts> and <none/>. Then hypothetical standardization advantages
>> are lost.
>
> Yeah, MathML is presentational in practice.

No, because both the visual rendering *and* the structure of those
formulae are incorrect. (Presentational MathML was also designed for
structuring mathematics, as you know.)

The problem lies in the MathML design again. For instance, ISO 12083,
using fewer tags, was able to encode more script structures than MathML
2.0 can.

The interesting thing is that you would need to add only one or two new
tags to HTML for full rendering of tensors. Since everyone would use the
same tags, the encodings would be standardized. Moreover, you could use
<sup> for both text and math.

>> 12) The use of presentational markup is contrary to common sense.
>
> Is LaTeX contrary to common sense? Which looks better a LaTeX
> printout or a Mathematica printout?

You are missing the point. My full message was

>> 12) The use of presentational markup is contrary to common sense. I
>> write <H1> in HTML and next I said one -and only once- in a CSS how the
>> heading may be rendered in my doc. That CSS, when stored externally, can
>> be called by billions of others HTML docs. In MathML you are forced to
>> repeat presentation in each formula in each document, to use mstyle...
>>
>> The use of a presentational language for mathematics remember me the old
>> days of the <font>, <b>, <i>, <center> tags. Little impact of MathML in
>> the web remind me the failure of XSL-FO to conquer the web. Instead
>> specific presentation MathML markup complemented with lot of <mstyle>
>> tags I would prefer semantic or structural markup.

Precisely, LaTeX replaced TeX presentational markup with default
stylesheets and macros, with an emphasis on content. You apparently missed
the point that HTML was semantic markup, was later turned into
presentational markup by big vendors (the nightmare of <font>), and has
recently been turned back into semantic markup, with presentation best
done by CSS and the elimination of <font> and its family. A similar
approach was taken in ISO 12083, with structural markup in SGML and
styling via DSSSL. MathML, however, adds presentational tags and styling
markup directly *to* the document, which is contrary to common sense.

Likewise, the future LaTeX 3 is being improved mainly on the stylistic
side, with an emphasis on adopting SGML and HTML models. For example, the
LaTeX 3 interface will support DSSSL specifications and style-sheet
concepts such as those used with HTML and XML.

I see no reason to repeat the errors of the past here. MathML is not the
way.

>> A) Eliminate next text from specification
>>
>> "Authors are encouraged to use MathML for marking up mathematics"
>>
>> because authors would use more concise powerful and solid markup for
>> mathematics.
>
> -1 at least until an alternative is implemented and deployed in UAs.

MathML is not an alternative at all, as 10 years have shown. Any other
alternative approach could be implemented in browsers in a few days,
because one can reuse the working HTML, CSS, and DOM.

Moreover, the advantages of the kind of approach I am suggesting apply
beyond mathematics. For example, better support for the standard CSS
inline-block value is crucial for mathematical rendering, but better
support for it also benefits the rendering of other kinds of documents.
Better support for <mroot> in MathML only benefits mathematical documents
prepared with p-MathML.

>> C) A more complete approach is providing a set of structural and/or
>> semantic tags for usage with HTML5.
>
> Scope creep.

?????

>> One needs little tags, because <sub>, <sup>, <var> and <table> can be
>> reused.
>
> I don't believe that considering that vast feature set LaTeX needs to
> provide.

Unfortunately, the LaTeX design is erratic, forcing the introduction of
billions of redundant commands, each with a different syntax, content
model, etc.

A typical example is the LaTeX redundancy between \frac and \over for
fractions. None of that is needed in XML-MAIDEN or in ISO 12083, the
international standard for mathematics in SGML. It is clear that a
combination of tags plus powerful CSS rules plus Unicode is all one needs.

For instance, the amstex package provides a special command with two
parameters for placing indices on certain roots. This non-modular approach
is unnecessary in SGML/XML/HTML. A stylesheet (CSS, DSSSL, XSL-FO)
generates the default rendering, but you can fine-tune the position of the
index via standard CSS rules applied to a special element (you can modify
colors, stretching, kerning, heights, baselines, margins, paddings,
positions, and many more things that LaTeX cannot, via CSS). In amstex you
need a special command to vertically align the index in the root; in CSS
you could use the same standard vertical-align property you use for
rendering any other text.

But again, apples and oranges. I am explaining that one can provide better
support for online mathematics than presentation MathML does just by
adding a few tags to the future HTML. Your appeal to LaTeX seems a bit off
topic, and in any case it is not a relevant reason for rejecting my
proposal to avoid MathML as mathematical markup.

> --
> Henri Sivonen
> hsivonen at iki.fi
> http://hsivonen.iki.fi/
>


Juan R.

Center for CANONICAL |SCIENCE)

Received on Saturday, 27 May 2006 10:10:42 UTC