Re: Exploring new vocabularies for HTML

David Carlisle <davidc@nag.co.uk> wrote:

> Given the existing implementation and experience in this area surely
> MathML should not simply be "one of the options" it should be the
> main option. For HTML5 to invent some new math markup unsupported by
> any existing mathematical software would be a complete disaster for
> the cause of putting scientific documents on the web.

This seems to me an overdramatic statement, one that would rule out
any possible improvement to the web that might arise from the research
being done around HTML5.

Let us analyze a case extracted from the real world.

The original canonicalscience.com site was designed on XHTML + MathML.

As Neil said {QUOTE Given the difficulties with putting out XHTML
pages today} this was a source of problems. In what follows I will
summarize only the problems associated with the MathML part of the
whole XML equation, and why using a different markup has been a good
option.

Only presentation MathML was explored due to the unpopularity of
Content MathML.


###  VERBOSITY  ###

Verbosity was always an issue, especially when the typical examples of
MathML (spec, Wikipedia, Wolfram) were replaced by the kind of math
that research scientists typically write.

With prototype software, large math expressions come out about 12x
more verbose in MathML, and small expressions about 4x, for example

<math xmlns='http://www.w3.org/1998/Math/MathML' display="block">
  <mrow>
    <mo>d</mo>
    <mi>S</mi>
  </mrow>
  <mo>&#8805;</mo>
  <mfrac>
    <mrow>
      <mo>&#948;</mo>
      <mi>Q</mi>
    </mrow>
    <mi>T</mi>
  </mfrac>
</math>

I can see several authors here who complain about MathML
verbosity.
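
A rough way to estimate such a ratio is sketched below in Python. It is
only an illustration: the compact baseline (a LaTeX-like string) and
the helper name verbosity_ratio are assumptions of this sketch, since
the message does not say which compact notation the figures were
measured against. The result depends heavily on the baseline and on
whether whitespace and the namespace declaration are counted.

```python
# Presentation MathML for dS >= deltaQ / T, copied from the example above.
MATHML = """<math xmlns='http://www.w3.org/1998/Math/MathML' display="block">
  <mrow>
    <mo>d</mo>
    <mi>S</mi>
  </mrow>
  <mo>&#8805;</mo>
  <mfrac>
    <mrow>
      <mo>&#948;</mo>
      <mi>Q</mi>
    </mrow>
    <mi>T</mi>
  </mfrac>
</math>"""

# ASSUMPTION: a LaTeX-like string stands in for the compact notation.
COMPACT = r"dS \geq \frac{\delta Q}{T}"


def verbosity_ratio(verbose: str, compact: str) -> float:
    """Ratio of non-whitespace characters in two encodings of one formula."""
    def size(text: str) -> int:
        # Drop all whitespace so that indentation does not inflate the count.
        return len("".join(text.split()))
    return size(verbose) / size(compact)


if __name__ == "__main__":
    print(f"MathML is {verbosity_ratio(MATHML, COMPACT):.1f}x the compact form")
```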


###  INCREMENTAL RENDERING  ###

I may wait around a minute when opening academic works in Firefox 2
on my machine (GNU/Linux, 1000 MHz, 0.5 GB).

This happens with medium-size articles like this one

http://hermes.aei.mpg.de/lrr/2001/1/article.xhtml

Print previews and other tasks are also slow.

Try to open four MathML articles at once on Firefox 2.

When opening a similar article in 'microformat' over HTML,
I can see the text before the math is completely rendered.

Readers can start reading the beginning of the article while the
bottom part is still being rendered. Time is an issue for some people.


###  EDITING  ###

Verbosity rules out manual encoding. This created another problem,
given the lack of adequate editors and tools. Only a few expensive
applications were generating acceptable code. Interestingly, they were
not oriented to Office/Publishers environments.

I completely agree with Ian, who said

{BLOCKQUOTE
 Anything we can do to make the language more maintainable will go a
 long way towards arguing for MathML over the alternatives
}

and with Jammes, who said

{BLOCKQUOTE
 The supposed benefit is not to MathML editors but to authors using
 text editors. I have tried writing MathML-in-XHTML using only a text
 editor and the experience was painful to say the least. I found that
 the verbosity made it difficult to enter and then difficult to fix
 when I had made a mistake. The sensible solution might have been to
 use something like itex2MML to keep the source equations in
 human-readable form but that would have involved keeping two seperate
 representations of the document, with all the associated problems
 that that causes.
}

It is also worth noting that most of the tools generating presentation
MathML from LaTeX or LaTeX-like code were not working correctly.

Distler's blog has now been cited as an example of a site using
MathML, and someone introduced Distler's views on the HTML5 proposal
here. Well then, one should remind the HTML5 people of some of the
problems with the MathML/IteX approach.

The MathML code served from several pages of Distler's blog was
analyzed on this list. The analysis may be available in the archive.

From memory:

i)
Ultraverbose output. E.g. unneeded <mrow> around single <mi> and <mn>
elements. The Mozilla MathML site recommends avoiding extra <mrow>

http://www.mozilla.org/projects/mathml/authoring.html

E.g. code like the following was not unusual on the blog

<mfrac>
  <mrow>
    <mi>a</mi>
  </mrow>
  <mrow>
    <mn>2</mn>
  </mrow>
</mfrac>

ii)
Use of visual tricks /a la/ TeX. That is, just the kind of tricks that
MathML was supposed to avoid.

I remember the use of <msup><mrow/> to simulate prescripts and the use
of collections of <msup>, <msub>, and <msubsup> to simulate tensors
instead of using the specific MathML elements (such as
<mmultiscripts>).

iii)
Problems with numbers. I think this was corrected in a later version
of the software.

iv)
Completely broken code. I remember the case of line elements ds^2.
*Visually* they looked fine, but aurally they did not, because the
markup had the wrong structure

<mo>d</mo><msup><mi>s</mi><mn>2</mn></msup>

instead of the correct

<msup><mrow><mo>d</mo><mi>s</mi></mrow><mn>2</mn></msup>

While preparing this message I took a look at this year's articles on
Distler's blog to check the status of the MathML/IteX technology. It
seems that point i) has been partially fixed, since I did not see
unneeded <mrow> in the several simple fractions I checked.
Still, in the recent article

http://golem.ph.utexas.edu/~distler/blog/archives/001560.html#more

you can find the following code

<mfrac xmlns="http://www.w3.org/1998/Math/MathML">
  <mover>
    <mi>H</mi>
    <mo>&#729;</mo>
  </mover>
  <mrow>
    <msup>
      <mi>H</mi>
      <mn>2</mn>
    </msup>
  </mrow>
</mfrac>

containing one unneeded <mrow> in the denominator.
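
Wrappers of this kind can be removed mechanically. The following is a
minimal sketch in Python using the standard xml.etree.ElementTree
module; the function name strip_redundant_mrow is hypothetical, and
this is not the tooling used on the blog or on this site. It only
touches an <mrow> that has no attributes and exactly one child, so an
empty <mrow/> or an <mrow> grouping several elements is left alone.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Return the tag name without a '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def strip_redundant_mrow(elem: ET.Element) -> ET.Element:
    """Replace, in place, each attribute-less <mrow> that wraps exactly
    one child element with that child."""
    for index, child in enumerate(list(elem)):
        strip_redundant_mrow(child)  # simplify deeper levels first
        if local_name(child.tag) == "mrow" and len(child) == 1 and not child.attrib:
            only = child[0]
            only.tail = child.tail  # preserve the surrounding whitespace
            elem[index] = only      # one-for-one swap keeps the arity of <mfrac>
    return elem


if __name__ == "__main__":
    source = "<mfrac><mrow><mi>a</mi></mrow><mrow><mn>2</mn></mrow></mfrac>"
    cleaned = strip_redundant_mrow(ET.fromstring(source))
    print(ET.tostring(cleaned, encoding="unicode"))
    # prints <mfrac><mi>a</mi><mn>2</mn></mfrac>
```

Because each <mrow> is swapped for its single child one-for-one, the
number of arguments of the parent element (two for <mfrac>) does not
change.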

Editing in 'microformat' HTML is free from those structural problems
and the code generated does not contain unneeded elements.

I can type mathematical equations on blogs, emails, and forums, as
long as JavaScript is enabled.


###  BROWSERS SUPPORT  ###

Or rather the lack of browser support. Yes, native support for
presentation MathML has improved in recent times and some useful
plugins are available to IE users, but problems remain.

For instance, users accessing the internet from Internet cafés and
libraries (including several University or CSIC research Centers)
do not have the plugin installed, cannot install it themselves because
it is not their home computer, and find that requests to install it
are ignored by the corresponding dept.

With the 'microformat' HTML approach they walk into the library, open
the default IE on the computer, visit the sites, and see the math.


###  HTML POPULARITY  ###

Most of the web is written in HTML. I have not compiled formal
statistics about academic sites and blogs, but 99% of the sites I
visit do not use XML.

This popularity of HTML is a barrier for the MathML approach.

The 'microformat' is HTML compatible and will be HTML5 compatible.


###  SEARCH ENGINES  ###

Neil did the following search

{BLOCKQUOTE
 If I do a search on +mfrac +mi +mo +mml:semantics
 [note the mml: namespace prefix, which I didn't include in my
 previous searches]

 Google says that there are "about 7,440" hits.
}

I have repeated the search

http://www.google.es/search?q=%2Bmfrac+%2Bmi+%2Bmo+%2Bmml%3Asemantics&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:official&client=firefox-a

and Google returns about 7530 hits.

Take a look at the source code of the first two hits, physmathcentral
and biomedcentral, and you can see that both contain *escaped* code.
As Ian noticed, the engine is returning pages in which <mfrac>, <mi>,
etc. appear as escaped code.
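
The difference between escaped code and real markup can be illustrated
with a short Python sketch using the standard html and html.parser
modules. It assumes that a search engine indexes only the visible text
of a page, which is a simplification and not a documented fact about
any particular engine; the class and function names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect only the character data of a page, ignoring all tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes &lt; and &gt;
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def visible_text(page: str) -> str:
    """Return the text a reader would see, with the tags stripped."""
    parser = VisibleText()
    parser.feed(page)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


# Real markup: the element names are structure, not text.
REAL = "<p>Half: <math><mfrac><mi>a</mi><mn>2</mn></mfrac></math></p>"
# Escaped code: the same element names shown to the reader as text.
ESCAPED = "<pre>" + escape("<mfrac><mi>a</mi><mn>2</mn></mfrac>") + "</pre>"

if __name__ == "__main__":
    print("mfrac" in visible_text(REAL))     # prints False
    print("mfrac" in visible_text(ESCAPED))  # prints True
```

Under that assumption, a query for mfrac can only match pages that
display MathML source as text, not pages that actually use MathML.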

Now I will search on a site containing real MathML pages.

First I search for text, for instance  +tidy site:http://hermes.aei.mpg.de/lrr/

http://www.google.es/search?hl=es&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&q=%2Btidy+site%3Ahttp%3A%2F%2Fhermes.aei.mpg.de%2Flrr%2F&btnG=Buscar&meta=

I get one hit:

http://hermes.aei.mpg.de/lrr/2001/1/article.xhtml

This is an XHTML page containing presentation MathML, and it contains
the word "tidy" at the start of section 1.1.

Now I search for math:  +mfrac site:http://hermes.aei.mpg.de/lrr/

http://www.google.es/search?hl=es&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&q=%2Bmfrac+site%3Ahttp%3A%2F%2Fhermes.aei.mpg.de%2Flrr%2F&btnG=Buscar&meta=

And I get zero hits for pages containing fractions. The engine cannot
find the MathML fractions. The site contains many xhtml+mathml pages,

http://hermes.aei.mpg.de/lrr/

with about a thousand <mfrac> elements.

Now I repeat the search for the 'microformat' case over HTML.

I search for fractions:  +{NU site:http://www.canonicalscience.org

http://www.google.es/search?hl=es&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&hs=xjG&q=%2B%7BNU+site%3Ahttp%3A%2F%2Fwww.canonicalscience.org&btnG=Buscar&meta=

and I get three results, all pages containing fractions in their
content.

The same three hits come back for the same search string on other
engines

Yahoo:

http://search.yahoo.com/search;_ylt=A0geu7UUWvtH_cIAdlRXNyoA?p=%2B%7BNU+site%3Ahttp%3A%2F%2Fwww.canonicalscience.org&y=Search&fr=moz2&ei=UTF-8

Answers:

http://www.answers.com/main/ntquery?s=%2B%7BNU%20site%3Ahttp%3A%2F%2Fwww.canonicalscience.org&ff=1

Altavista:

http://www.altavista.com/web/results?itag=ody&q=%2B%7BNU+site%3Ahttp%3A%2F%2Fwww.canonicalscience.org&kgs=1&kls=0

Thus the non-MathML approach is also working here.


###  COMPUTATIONAL SOFTWARE  ###

Yes, the 'microformat' is not supported by common algebra software
and the like. It is still in beta, but the final version will convert
to standard languages.

I do not see a problem here.


###  RENDERING  ###

The microformat may be converted to different rendering formats such
as p-MathML, SVG, XSL-FO, XML-MAIDEN, etc.

In the future one could write XML pages with 'microformat' math and
convert it to p-MathML on the fly, to be rendered natively by Firefox
and Opera clients.


###  ACCESSIBILITY  ###

This is a point where the MathML approach is probably superior. There
is a large body of research and experience behind MathML, with several
accessibility-specific projects, which a novel approach in beta stage
lacks.

Still, I do not see any inherent advantage over the microformat HTML
approach. In its final form it seems that its accessibility will be at
least as good.


###  CONCLUSION  ###

As a final conclusion, I am forced to think that recent alternative
models are not a "complete disaster" when compared with MathML.


Juan R. González-Álvarez

Center for CANONICAL |SCIENCE)

Received on Wednesday, 9 April 2008 06:22:52 UTC