Re: Latex to Mathml translators

Brian Osserman <osserman@math.mit.edu> writes in reply to
Robert Miner <RobertM@dessci.com>:

> >Many projects involving MathML involve converting legacy data into XML
> >format, and one of the most important legacy formats in this context
> >is TeX. 
> 
>    Does the phrase 'legacy data' here imply that you expect that MathML will
> eventually replace tex as a primary data format? If so, how do you envision
> this happening, given MathML's unsuitability for direct authoring?

In my observation almost all current math article authoring is done
with TeX, and the largest portion of that is, in fact, LaTeX.  To see
this, download some article sources from ArXiv
(http://www.arxiv.org/).  (Somebody at ArXiv must, in fact, have
actual statistics, but I don't.)

Robert Miner replies:

> As to direct authoring, I presume you mean writing code with a text
> editor.  While most folks working with MathML have been primarily
> interested in graphical authoring, it is a simple matter to define a
> terse language and compile it into MathML.  After all, that is the
> model of TeX itself, compiling various macro languages into DVI.
> It's merely a shame that TeX syntax is not normally regular enough to
> be particularly well suited to going to XML + MathML, as witnessed by
> the weakness of current TeX -> XML + MathML converters.

Yes, although an author who develops a consistent, well-structured
way of using LaTeX can expect fairly good results.  The idea of
characterizing definitively what is a well-structured kind of LaTeX in
a rigorous way is, I think, a blind alley.

> But it really is a triviality to come up with a language as terse as
> TeX that maps directly and unambiguously to some XML + MathML doc
> type.  For example, just changing <foo>...</foo> to \foo{...} and
> adding some default tokenization rules (that can be easily overridden)
> makes authoring MathML comparable to authoring TeX.
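
To make the quoted idea concrete, here is a minimal sketch in Python
of the \foo{...} to <foo>...</foo> rewriting.  It is only an
illustration, not code from any existing converter: the function name
terse_to_xml is hypothetical, the "default tokenization rules"
mentioned above are not implemented, and attributes and literal
braces are not handled.

```python
import re
from xml.sax.saxutils import escape

# A command opener: backslash, an element name, then an opening brace.
_OPEN = re.compile(r"\\([A-Za-z][A-Za-z0-9]*)\{")


def terse_to_xml(src: str) -> str:
    """Rewrite \\name{...} groups as <name>...</name> (hypothetical helper)."""
    out = []      # output fragments, joined at the end
    stack = []    # names of the elements that are still open
    i = 0
    while i < len(src):
        m = _OPEN.match(src, i)
        if m:
            # \name{ opens an element called "name"
            out.append(f"<{m.group(1)}>")
            stack.append(m.group(1))
            i = m.end()
        elif src[i] == "}":
            # a closing brace closes the most recently opened element
            if not stack:
                raise ValueError(f"unmatched '}}' at offset {i}")
            out.append(f"</{stack.pop()}>")
            i += 1
        else:
            # ordinary text: escape &, < and > so the result stays XML
            out.append(escape(src[i]))
            i += 1
    if stack:
        raise ValueError(f"unclosed \\{stack[-1]}{{")
    return "".join(out)


if __name__ == "__main__":
    print(terse_to_xml(r"\mfrac{\mi{x}\mn{2}}"))
```

Because every \name{ pushes onto a stack and every } pops from it, the
output is well-formed whenever the braces balance; unbalanced input is
rejected rather than guessed at.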

Someone used to LaTeX would want more than just a language for math
"islands" in a larger document.  For a smooth interface with the
existing practice of authors, one wants something reasonably like
LaTeX with math that is reasonably like math in LaTeX.

For new documents this is what the GELLMU project,
http://www.albany.edu/~hammond/gellmu/, is about.

It seeks to provide an XML document type that is as close to LaTeX as
reasonably possible, given the goal to have, in addition to standard
LaTeX translation, a well-defined bullet-proof translation to (1)
ordinary HTML using stripped TeX-like notation for math such as one
might see in email and (2) XHTML + content MathML for perusal in new
browsers.

As explained in my 2001 TUGboat article, which appeared early this
year, an author may use LaTeX-like notation, including newcommand
definitions, to generate articles in the XML document type.

At this point everything is in place except (2).  Toward (2) I have a
variant of the HTML 4.01 translator (written in perl for SGMLS.pm)
that writes XHTML 1.1 with UTF-8 encoding.

I estimate that it will take me between 20 and 40 weeks, in blocks of
5 or more weeks of undivided attention, to do (2).  Along the way it
may be necessary to provide a few more attributes for math elements in
order to control non-default cases in translation to MathML.  Some
such attributes are already in place.

Unfortunately, I do not presently foresee another window of time
before May 20, 2004.

Meanwhile if someone else is interested in undertaking a translation
from the XML form of the GELLMU article document type to XHTML + MathML,
I'll try to assemble an up-to-date tarball.

Writing a translation to XHTML + presentation MathML would be easier
than writing one to content MathML.  If browser fodder is all one
wants, that would be the way to go.

                                    -- Bill

Received on Wednesday, 17 September 2003 07:53:31 UTC