
Re: Exploring new vocabularies for HTML

From: <juan@canonicalscience.com>
Date: Tue, 8 Apr 2008 06:08:36 -0700 (PDT)
Message-ID: <38674.>
To: <public-html@w3.org>, <www-math@w3.org>

David Carlisle <davidc@nag.co.uk> wrote:

> Given the existing implementation and experience in this area surely
> MathML should not simply be "one of the options" it should be the
> main option. For HTML5 to invent some new math markup unsupported by
> any existing mathematical software would be a complete disaster for
> the cause of putting scientific documents on the web.

This seems to me an overdramatic statement, one that would stop any
possible improvement to the web that could arise from the research being
done in this area.

Let us analyze a case extracted from the real world.

The original canonicalscience.com site was built on XHTML + MathML.

As Neil said ("Given the difficulties with putting out XHTML pages
today"), this was a source of problems. In what follows I will summarize
only the problems associated with the MathML part of the whole XML
equation, and explain why using a different markup has been a good option.

Only presentation MathML was explored due to the unpopularity of
Content MathML.

###  VERBOSITY  ###

Verbosity was always an issue, especially when the typical MathML
examples (spec, Wikipedia, Wolfram) were replaced by the math that
research scientists actually write.

Prototype software gives a 12x verbosity for MathML with large math
expressions, and about 4x for small expressions like

<math xmlns='http://www.w3.org/1998/Math/MathML' display="block">
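One way to make such ratios reproducible is simply to count characters. The Python sketch below does that for a single hypothetical expression, (x^2+1)/2. It assumes LaTeX as the compact reference notation, purely as a stand-in, since the 'microformat' syntax is not shown in this message; the 12x and 4x figures above come from the prototype software, not from this script.

```python
# Hypothetical example: the same expression, (x^2 + 1)/2, in two notations.
MATHML = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">'
    '<mfrac>'
    '<mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></mrow>'
    '<mn>2</mn>'
    '</mfrac>'
    '</math>'
)

# LaTeX is only an assumed stand-in for a compact, human-typed notation.
LATEX = r"\frac{x^2+1}{2}"


def verbosity_ratio(markup: str, compact: str) -> float:
    """Characters of markup needed per character of compact source."""
    return len(markup) / len(compact)


if __name__ == "__main__":
    print(f"MathML: {len(MATHML)} chars, LaTeX: {len(LATEX)} chars")
    print(f"verbosity ratio: {verbosity_ratio(MATHML, LATEX):.1f}x")
```

The ratio depends heavily on the expression and on how much whitespace or redundant markup a generator emits, so a single number like this only illustrates the kind of comparison being made.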

I can see several authors here who complain about MathML


I may wait around a minute when opening academic papers in Firefox 2
on my GNU/Linux machine (1000 MHz, 0.5 GB).

Medium-size articles like this one


Print previews and other tasks are also slow.

Try opening four MathML articles at once in Firefox 2.

When opening a similar article in 'microformat' over HTML,
I can see the text before the math is completely rendered.

Readers can start reading the beginning of the article while the
bottom part is still being rendered. Time is an issue for some people.

###  EDITING  ###

Verbosity rules out manual encoding. This created another problem,
given the lack of adequate editors and tools. Only a few expensive
applications generated acceptable code and, interestingly, they were
not oriented to office/publishing environments.

I completely agree with Ian when he said

 Anything we can do to make the language more maintainable will go a
 long way towards arguing for MathML over the alternatives

and with James when he said

 The supposed benefit is not to MathML editors but to authors using
 text editors. I have tried writing MathML-in-XHTML using only a text
 editor and the experience was painful to say the least. I found that
 the verbosity made it difficult to enter and then difficult to fix
 when I had made a mistake. The sensible solution might have been to
 use something like itex2MML to keep the source equations in
 human-readable form but that would have involved keeping two separate
 representations of the document, with all the associated problems
 that that causes.

It is also worth noting that most tools generating presentation
MathML from LaTeX or LaTeX-like code were not working correctly.

Distler's blog has now been cited as an example of a site using
MathML, and someone has brought Distler's views on the HTML5 proposal to
this list. Well, then, one should remind the HTML5 people of some of the
problems with the MathML/IteX approach.

The MathML code served by several pages of Distler's blog was
analyzed on this list; the discussion may be available in the archive.

From memory:

i) Ultraverbose output, e.g. unneeded <mrow> around single <mi> and
<mn> elements. The Mozilla MathML site recommends avoiding extra <mrow>.


For example, code like the following was not unusual on the blog.


ii) Use of visual tricks /à la/ TeX, that is, exactly the kind of tricks
that MathML was supposed to avoid.

I remember the use of <msup><mrow/> to simulate prescripts, and the use
of collections of <msup>, <msub>, and <msubsup> to simulate tensors
instead of the dedicated MathML element, <mmultiscripts>.

iii) Problems with numbers. I think this was corrected in a later
version of the software.

iv) Completely broken code. I remember the case of line elements ds^2.
*Visually* they looked fine, but aurally they did not, because the
structural code was the invalid


instead of the correct


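The prescript trick described above is mechanical enough to detect automatically. The Python sketch below is a hypothetical checker (not part of itex2MML or any other existing tool) built on the standard library's xml.etree.ElementTree. It flags <msup>, <msub>, and <msubsup> elements whose base is an empty <mrow/>; such cases are candidates for rewriting with <mmultiscripts> and <mprescripts/>. The carbon-14 input is an invented example, not code taken from the blog.

```python
import xml.etree.ElementTree as ET

# Script elements that can be misused to fake a prescript.
SCRIPT_ELEMENTS = {"msup", "msub", "msubsup"}


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def is_empty_mrow(elem: ET.Element) -> bool:
    """True for an <mrow/> with no children and no text."""
    return (
        local_name(elem.tag) == "mrow"
        and len(elem) == 0
        and not (elem.text or "").strip()
    )


def find_fake_prescripts(root: ET.Element) -> list:
    """Return script elements whose base (first child) is an empty <mrow/>.

    Such an element attaches its script to nothing, so the script is drawn
    just before the following symbol: a prescript faked by layout.
    """
    return [
        elem
        for elem in root.iter()
        if local_name(elem.tag) in SCRIPT_ELEMENTS
        and len(elem) > 0
        and is_empty_mrow(elem[0])
    ]


if __name__ == "__main__":
    # Hypothetical example: carbon-14 written with a faked prescript.
    faked = ET.fromstring(
        '<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow>'
        '<msup><mrow/><mn>14</mn></msup><mi>C</mi>'
        '</mrow></math>'
    )
    # The same isotope written with <mmultiscripts> and <mprescripts/>.
    proper = ET.fromstring(
        '<math xmlns="http://www.w3.org/1998/Math/MathML">'
        '<mmultiscripts><mi>C</mi><mprescripts/><none/><mn>14</mn>'
        '</mmultiscripts></math>'
    )
    print(len(find_fake_prescripts(faked)), len(find_fake_prescripts(proper)))
    # prints: 1 0
```

The check only covers prescripts; tensor indices faked with sequences of <msup>/<msub> are harder to recognize reliably, because the same markup can also be legitimate.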
While preparing this message I took a look at this year's articles on
Distler's blog to check the status of the MathML/IteX technology.
Point i) seems to have been partially fixed, since I did not see
unneeded <mrow> in several simple fractions I checked. Still, in a
recent article


you can find the following code

<mfrac xmlns="http://www.w3.org/1998/Math/MathML">

containing one unneeded <mrow> in the denominator.
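Redundant <mrow> wrappers like the ones described above can also be removed after the fact. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes, in line with the Mozilla advice mentioned earlier, that an <mrow> with exactly one child, no attributes, and no stray text adds nothing to the structure, so it replaces such an element with its child. The function names and the sample fraction are hypothetical; this is not a feature of itex2MML or any other existing converter.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def is_redundant_mrow(elem: ET.Element) -> bool:
    """An <mrow> with exactly one child, no attributes and no stray text."""
    return (
        local_name(elem.tag) == "mrow"
        and len(elem) == 1
        and not elem.attrib
        and not (elem.text or "").strip()
        and not (elem[0].tail or "").strip()
    )


def unwrap_redundant_mrows(parent: ET.Element) -> int:
    """Replace single-child <mrow> wrappers below `parent` by their child.

    Works bottom-up, so nested wrappers such as
    <mrow><mrow><mi>x</mi></mrow></mrow> collapse completely.
    Returns the number of <mrow> elements removed.
    """
    removed = 0
    for index, child in enumerate(list(parent)):
        removed += unwrap_redundant_mrows(child)  # clean the subtree first
        if is_redundant_mrow(child):
            inner = child[0]
            inner.tail = child.tail  # keep whatever followed the wrapper
            parent[index] = inner    # one element replaced by one: indices stay valid
            removed += 1
    return removed


if __name__ == "__main__":
    ns = "http://www.w3.org/1998/Math/MathML"
    ET.register_namespace("", ns)
    # Hypothetical input: the numerator wrapper is redundant, the denominator's is not.
    root = ET.fromstring(
        f'<math xmlns="{ns}"><mfrac>'
        '<mrow><mi>a</mi></mrow>'
        '<mrow><mi>b</mi><mo>+</mo><mn>1</mn></mrow>'
        '</mfrac></math>'
    )
    print(unwrap_redundant_mrows(root), "mrow removed")
    print(ET.tostring(root, encoding="unicode"))
```

Keeping <mrow> elements that carry attributes is a deliberate choice: an id, class, or similar attribute may be referenced elsewhere, so such a wrapper is not necessarily redundant.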

Editing in 'microformat' HTML is free from those structural problems
and the code generated does not contain unneeded elements.

I can type mathematical equations in blogs, emails, and forums when
JavaScript is enabled.


Then there is the lack of browser support. Yes, native support for
presentation MathML has improved recently, and some useful plugins are
available to IE users, but problems remain.

For instance, users accessing the Internet from internet cafés and
libraries (including several university and CSIC research centers) do
not have the plugin installed, cannot install it themselves because it
is not their own computer, and their requests to install it are ignored
by the corresponding department.

With the 'microformat' HTML approach they walk into the library, open
the default IE on the computer, visit the sites, and see the math.


Most of the web is written in HTML. I have not compiled formal
statistics on academic sites and blogs, but 99% of the sites I visit
do not use XML.

This popularity of HTML is a barrier for the MathML approach.

The 'microformat' is HTML compatible and will be HTML5 compatible.


Neil did the following search:

 If I do a search on +mfrac +mi +mo +mml:semantics
 [note the mml: namespace prefix, which I didn't include in my
 previous searches]

 Google says that there are "about 7,440" hits.

I repeated the search


and Google returns about 7530 hits.

A look at the source code of the first two hits, physmathcentral and
biomedcentral, shows that both contain *escaped* code. As Ian noticed,
the engine is returning pages where <mfrac>, <mi>, etc. appear as
escaped code.

Now I will search a site containing real MathML pages.

For instance, I search for text: +tidy site:http://hermes.aei.mpg.de/lrr/


I get one hit:


This is an XHTML page containing presentation MathML, and it contains
the word "tidy" at the start of section 1.1.

Now I search for math: +mfrac site:http://hermes.aei.mpg.de/lrr/


And I get zero hits for pages containing fractions: the engine cannot
find the MathML fractions, although the site contains many XHTML+MathML
pages with about a thousand <mfrac> elements.
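These results have a simple mechanical explanation: a search engine indexes the text a page displays, not its element names. The Python sketch below models that with the standard library's html.parser, collecting only the character data of two hypothetical fragments. In the escaped fragment "<mfrac>" is part of the visible text, whereas in real MathML markup it is only a tag, so the text reduces to the symbols. Real engines use far more elaborate extraction; this is a deliberately simplified model, not how any particular engine works.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect character data, roughly what a text indexer would see."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &lt; etc. into text
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def visible_text(fragment: str) -> str:
    """Return the concatenated text content of an HTML fragment."""
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()
    return "".join(collector.chunks)


# Hypothetical fragments: markup shown as an example vs. markup that is rendered.
ESCAPED = "<pre>&lt;mfrac&gt;&lt;mi&gt;a&lt;/mi&gt;&lt;mi&gt;b&lt;/mi&gt;&lt;/mfrac&gt;</pre>"
REAL = ('<math xmlns="http://www.w3.org/1998/Math/MathML">'
        '<mfrac><mi>a</mi><mi>b</mi></mfrac></math>')

if __name__ == "__main__":
    for label, fragment in (("escaped", ESCAPED), ("real", REAL)):
        text = visible_text(fragment)
        print(f"{label:8} text={text!r} contains 'mfrac': {'mfrac' in text}")
```

Under this model, escaped pages like the physmathcentral and biomedcentral hits match '+mfrac', while pages with real MathML markup, like those on hermes.aei.mpg.de, do not.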

Now I repeat the search for the 'microformat' case over HTML.

I search for fractions: +{NU site:http://www.canonicalscience.org


and I get three results: pages containing fractions in their content.

The same three hits are returned for the same search string on other engines







Thus the non-MathML approach is also working here.


Yes, the 'microformat' is not supported by common algebra software
and the like. It is still in beta, but the final version will convert
to standard languages.

I do not see a problem here.

###  RENDERING  ###

The microformat may be converted to different rendering formats as

In the future one could write XML pages with 'microformat' math,
convert it to p-MathML on the fly, and have it rendered natively by
Firefox and Opera clients.


This is probably a point where the MathML approach was superior. There
is a large body of research and experience behind MathML, including
several accessibility-specific projects, which a novel approach still
in beta stage lacks.

Still, I do not see any special advantage over the microformat HTML
approach. In its final form it seems that accessibility will at least so

###  CONCLUSION  ###

In conclusion, I am forced to think that recent alternative models
are not a "complete disaster" when compared with MathML.

Juan R. González-Álvarez

Received on Wednesday, 9 April 2008 06:22:52 UTC
