Profiling and certificates for MathML. Avoiding imitators

Profiling]

Currently, the available MathML tools generate a broad spectrum of
heterogeneous outputs. You can find tools ranging from the strictly
'presentational' end of the spectrum (often wrong from a structural or
accessibility point of view) to high-quality parallel markup.

I hope that the proposal for a profile attribute that can be attached to
an island of MathML code is finally included in MathML 3. This profiling
would allow, for instance, the automatic embedding of document fragments
from different authors into a single composed document, since profiling
helps to identify the characteristic weaknesses in the output of each
tool.

This is not theoretical discourse; let me focus on a recent posting [1] on
Distler's blog: the self-proclaimed most technologically advanced blog on
the planet.

Distler appears to have finally corrected the problem with ds^2 in the
IteX tool, but it continues generating redundant <mrow> elements in
<mfrac>, <msup>, and other constructs (a practice discouraged for
Gecko-based browsers [2]). Invisible times continues to be represented by
juxtaposition. However, when one finds
<mrow><mn>4</mn><mi>&#960;</mi><mi>i</mi></mrow> one can add the two
missing <mo>&invisibleTimes;</mo>. One can also suspect that the _i_ is
not a generic variable but the imaginary unit, and correct that as well.
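
A minimal sketch of that repair, assuming input without a namespace prefix
and assuming that any two adjacent <mi>/<mn> tokens inside an <mrow> are a
product. The function name add_invisible_times and the OPERANDS set are
hypothetical, not taken from any existing tool:

```python
import xml.etree.ElementTree as ET

INVISIBLE_TIMES = "\u2062"  # the character behind &invisibleTimes;
OPERANDS = {"mi", "mn"}     # assumption: only bare identifiers and numbers

def add_invisible_times(mrow):
    """Return a copy of mrow with <mo>U+2062</mo> between adjacent operands."""
    fixed = ET.Element(mrow.tag, mrow.attrib)
    previous = None
    for child in mrow:
        if (previous is not None
                and previous.tag in OPERANDS
                and child.tag in OPERANDS):
            mo = ET.SubElement(fixed, "mo")
            mo.text = INVISIBLE_TIMES
        fixed.append(child)
        previous = child
    return fixed

source = "<mrow><mn>4</mn><mi>&#960;</mi><mi>i</mi></mrow>"
fixed = add_invisible_times(ET.fromstring(source))
print([child.tag for child in fixed])  # ['mn', 'mo', 'mi', 'mo', 'mi']
```

The same rule would wrongly turn <mi>d</mi><mi>t</mi> into a product, so
the output of such a pass still needs human review.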

Tensors are incorrectly encoded using tricks instead of the correct tensor
structure of MathML. However, it is not difficult to notice that tensors
are often encoded following the template

[<msup> | <msub>] [<mi>] [<mrow> [ two or more <mi>]

A converter can generate a warning alerting the operator, at least when
that template is matched.
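
Such a check could be sketched as follows. This is only an illustration
under stated assumptions: the names looks_like_tensor and tensor_warnings
are hypothetical, the input carries no namespace prefix, and the template
is read as "an <msup> or <msub> whose base is an <mi> and whose script is
an <mrow> of two or more <mi>":

```python
import xml.etree.ElementTree as ET

def looks_like_tensor(element):
    """True if element matches the msup/msub + mi + mrow-of-mi template."""
    if element.tag not in ("msup", "msub") or len(element) != 2:
        return False
    base, script = element
    if base.tag != "mi" or script.tag != "mrow":
        return False
    return len(script) >= 2 and all(child.tag == "mi" for child in script)

def tensor_warnings(root):
    """Collect one warning message per element matching the template."""
    return [
        f"possible tensor encoded as <{el.tag}> with base '{el[0].text}'"
        for el in root.iter()
        if looks_like_tensor(el)
    ]

sample = ET.fromstring(
    "<math>"
    "<msup><mi>T</mi><mrow><mi>a</mi><mi>b</mi></mrow></msup>"
    "<msup><mi>x</mi><mn>2</mn></msup>"
    "</math>"
)
for warning in tensor_warnings(sample):
    print(warning)
```

Only the first <msup> is reported; the plain power x^2 does not match the
template.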

Other stuff is more difficult. You can find dt encoded in two different
ways (which also render very differently!) in the same formula [1]: as
<mrow><mi>d</mi><mi>t</mi></mrow> and as <mi>dt</mi>. I think this can
only be corrected by subsequent human checking.

A full conversion cannot be automated, but profiling helps with
semi-automation. At least when the converter/analyzer detects an <mtext>
from certain ASCIIMath sources, the probability that <mtext> was used
there as a trick for forcing roman rendering of a variable is very large,
especially when <mtext> is applied to two tokens, and this lets us take
adequate action to correct the source before including it in our
database.
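
As an illustration only, the <mtext> heuristic might look like the sketch
below. The threshold (one to three letters, no spaces) and the name
suspicious_mtext are my assumptions for this example; a real analyzer
would first check which tool the island was profiled as coming from:

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a run of one to three letters inside <mtext> is more likely
# a variable forced into roman type than genuine text.
SHORT_WORD = re.compile(r"[A-Za-z]{1,3}")

def suspicious_mtext(root):
    """Return the content of every <mtext> that looks like a variable."""
    return [
        el.text
        for el in root.iter("mtext")
        if el.text and SHORT_WORD.fullmatch(el.text.strip())
    ]

sample = ET.fromstring(
    "<math><mtext>dt</mtext><mo>=</mo><mtext>if and only if</mtext></math>"
)
print(suspicious_mtext(sample))  # ['dt']
```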


Certificates]

However, profiling cannot work for invalid MathML or pseudo-MathML. There
we need some kind of certification procedure that can guarantee to authors
that the tools they are using generate real MathML.

Recently, I noticed that someone had sent me incorrect MathML.
Investigating a bit, I found that the WhatWG community has simply ignored
any advice from the MathML community and plans to embed MathML (and SVG)
into HTML5 [3]. OK, anyone is free to make mistakes if they wish, isn't
that so?

I have already said everything I had to say about that in the past, both
here and on the Mozilla lists. What I find very disturbing now is that
stuff that is not MathML is being called MathML.

Now, there is a first step towards embedding MathML into HTML 4 from Sam
Ruby [3, 4]. It looks like the suggestion I made here [5, 6] but no, wait,
it is not the same!

What I said then was a reply to the unsupported claim by the WhatWG and
Mozilla people that users were asking for MathML support in HTML instead
of XML. I suggested that anyone having problems publishing MathML in XHTML
could publish in HTML 4 using script-DOM techniques. It was a trick for
assistance, not the rule for publishing maths in the near future.

However, the suggestion was to render MathML (true MathML) in HTML 4, 5,
or 6 if there is ever one.

A priori, the new project [3, 4] is for adding something like
_reformatted_ MathML. For instance, XML empty-element syntax is not
supported and <none/> has to be written as <none></none>. Several tools
generate stuff such as <mrow/>, <mtext/>... This matters, since none
(repeat, none) of the hundred or so current MathML tools generates full
tags instead of the standard XML empty ones.

This may be read as "none of a hundred MathML tools is compatible with
what we (i.e. they) call MathML".
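
To make the required reformatting concrete, here is a small sketch of the
rewriting an author would have to apply to every island before pasting it.
It assumes input without a namespace prefix; the sample formula is my own,
and the short_empty_elements flag is the one offered by Python's standard
ElementTree serializer:

```python
import xml.etree.ElementTree as ET

# Standard XML output from a tool, using the empty-element form <none/>.
standard = "<mmultiscripts><mi>R</mi><none/><mi>a</mi></mmultiscripts>"

# Re-serialize with explicit start and end tags for empty elements.
expanded = ET.tostring(
    ET.fromstring(standard),
    encoding="unicode",
    short_empty_elements=False,
)
print(expanded)
# <mmultiscripts><mi>R</mi><none></none><mi>a</mi></mmultiscripts>
```

Both strings are the same XML document; only the second form is accepted
by the approach in [3, 4].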

There is also no support for the MathML entities that several tools (and
legacy MathML documents in scientific databases and journals) are using.

It is interesting to notice that those people continue promoting modern
tools [3] that are more limited than previously available ones. For
example, the mature techniques discussed in [5, 6] let you publish true
MathML (just copy and paste without changing empty tags) in HTML for both
Gecko and MSIE-Mathplayer.

The modern tool [3] only lets you publish previously _reformatted_ MathML
"At least for Gecko-based browsers". Cool!

I decided to follow the MathML example [7] illustrated in [4] and I found
this source

<body>
  <math xmlns="http://www.w3.org/1998/Math/MathML"/>
    <mrow>
      <mi>x</mi>
      <mo>=</mo>
      <mfrac>
      ...
  </math>
<body>


Knowing (from experience) what kind of people are involved in producing
MathML code and tools, and having some experience with all kinds of scary
stuff they generate (including using HTML <strong> for vectors inside
MathML islands, thinking that MathML is a DOM, and so on), I seriously
recommend the implementation of a formal certification for tools using the
logo, the name MathML, or even the MathML namespace.

In the above case, the source is not even well-formed XML and could easily
be rejected by the W3C validation service, but I still think that
enforcing certification _is_ an issue!
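
The well-formedness problem can be reproduced with any XML parser. The
sketch below uses a reduced fragment of my own that keeps only the
self-closed <math .../> start tag followed by a stray </math>; it is not
the full source of [7], and is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Reduced fragment: <math .../> is already closed, so </math> is unmatched.
broken = (
    '<body><math xmlns="http://www.w3.org/1998/Math/MathML"/>'
    "<mrow><mi>x</mi></mrow></math></body>"
)
# Same fragment with a normal start tag for <math>.
repaired = (
    '<body><math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mrow><mi>x</mi></mrow></math></body>"
)
print(is_well_formed(broken), is_well_formed(repaired))  # False True
```

Well-formedness is only the first gate; a certification procedure would
also have to validate against the MathML DTD or Schema.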

This way, 'enthusiastic' 'novel' approaches such as that of MathML in
HTML 4 [3, 4] would receive a nice "this is not MathML, period" [*] and
their authors would be asked to carefully avoid any use of the MathML name
that could confuse users.

We have plenty of experience with HTML and the creepy code that is being
spread over the web. The situation with MathML arising from initiatives
such as WhatWG HTML5 could be a true nightmare due to the intrinsic
difficulty, variability, and markup/text ratio of mathematical markup. The
MathML group at the W3C should take responsible action to keep the web
working well.

If the W3C takes no control over what is being called MathML and sold as
such on the web, and if people from personal projects, from organizations
such as Mozilla, or from ‘standard’ bodies such as WhatWG continue calling
MathML stuff that is not MathML (as defined by the W3C spec and
DTD/Schema), then I am sorry to say that we would seriously consider the
possibility of a generic "MathML input is not supported" alert on our 2007
website, including forums [**].

If no formal reply is received to this suggestion, it will be interpreted
as a refusal to discuss this possibility.

References and notes]

[1]  http://golem.ph.utexas.edu/~distler/blog/archives/001030.html#more

[2]  http://www.mozilla.org/projects/mathml/authoring.html

[3]  http://golem.ph.utexas.edu/~distler/blog/archives/001065.html

[4]
http://www.intertwingly.net/blog/2006/12/05/HOWTO-Embed-MathML-and-SVG-into-HTML4

[5] http://lists.w3.org/Archives/Public/www-math/2006Oct/0102.html

[6]
http://canonicalscience.blogspot.com/2006/10/mathml-in-html-4-and-5-and.html

[7] http://intertwingly.net/stories/2006/12/05/mathml.html4


[*] Paraphrasing Roger B. Sidje here :]

[**] And I suspect that several organizations and others would follow our
way, since I know few people interested in solving problems artificially
created by others.

Received on Wednesday, 20 December 2006 12:04:15 UTC