Re: Content Markup or Presentation Markup for Audio Rendering of MathML

Helder Ferreira wrote:
> Greetings everyone!
>  
> My aim is to create a parser that generates a text version of formulas 
> (MathML) in technical documents, to feed a TTS engine.

I presume the word "converter" would be more appropriate, since "parsing" an 
XML document is already taken care of by XML parsers (maybe these should be 
called tokenizers, but they are not called so).

> What would you use for audio rendering of mathml?
> Content Markup or Presentation Markup?

I am an advocate of content markup (of OpenMath [1] actually), so excuse 
the bias, but I believe content markup is clearly better.

When rendering something like a binomial coefficient, you certainly 
want to say something containing the word "binomial" (in the appropriate 
language). In MathML presentation, however (there was a thread on this 
topic not so long ago on this same mailing list, I believe), you need to 
use the mfrac element with the fraction bar set to zero thickness, and 
nothing in that markup says "binomial"!
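
To make the contrast concrete, here is a minimal Python sketch of how a 
converter might read a content encoding aloud. It assumes the OpenMath XML 
encoding and the binomial symbol of the combinat1 content dictionary; the 
PHRASES table, the wording of the phrases, and the speak/speak_xml names 
are invented for illustration and are not taken from any existing tool.

```python
import xml.etree.ElementTree as ET

OM = "{http://www.openmath.org/OpenMath}"

# Hypothetical phrase table: (content dictionary, symbol) -> spoken template.
PHRASES = {
    ("combinat1", "binomial"): "binomial coefficient {0} over {1}",
    ("arith1", "power"): "{0} to the power {1}",
}


def speak(node):
    """Return a plain-text reading of an OpenMath element."""
    tag = node.tag.replace(OM, "")
    if tag == "OMOBJ":
        return speak(node[0])
    if tag == "OMV":
        return node.get("name")
    if tag == "OMI":
        return (node.text or "").strip()
    if tag == "OMA":
        # First child is the applied symbol, the rest are its arguments.
        head, *args = list(node)
        spoken = [speak(arg) for arg in args]
        template = PHRASES.get((head.get("cd"), head.get("name")))
        if template is None:
            # Unknown symbol: fall back to reading its name and arguments.
            return f"{head.get('name')} of " + ", ".join(spoken)
        return template.format(*spoken)
    raise ValueError(f"unsupported element: {tag}")


def speak_xml(source):
    """Parse an OpenMath XML string and return its reading."""
    return speak(ET.fromstring(source))


BINOMIAL = """
<OMOBJ xmlns="http://www.openmath.org/OpenMath">
  <OMA>
    <OMS cd="combinat1" name="binomial"/>
    <OMV name="n"/>
    <OMV name="k"/>
  </OMA>
</OMOBJ>
"""

print(speak_xml(BINOMIAL))
```

The point of the sketch is that the word "binomial" comes straight from 
the symbol name, so no guessing about the layout is needed.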

Of course, for this special case, working within a known field with a 
restricted set of symbols and notations, you could put together something 
that recognizes such a presentation as a binomial coefficient, deciding 
through some heuristics that it is not a 2-D vector.
(I think MathType or MathPlayer can actually do exactly this, and they 
acknowledge that it is a heuristic which is very hard to extend.)
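
A heuristic of this sort could look roughly like the following Python 
sketch. It only assumes that the binomial is written as an mrow holding 
"(", an mfrac whose linethickness attribute is zero, and ")"; the function 
name and the list of accepted zero values are my own assumptions, and I do 
not know how MathType or MathPlayer actually do it.

```python
import xml.etree.ElementTree as ET

M = "{http://www.w3.org/1998/Math/MathML}"

# Assumed spellings of "no fraction bar"; real documents may use others.
ZERO_THICKNESS = {"0", "0px", "0pt", "0em"}


def is_fence(node, char):
    """True if node is an <mo> containing exactly the given character."""
    return node.tag == M + "mo" and (node.text or "").strip() == char


def looks_like_binomial(mrow):
    """Guess whether an mrow is '(' + bar-less mfrac + ')'."""
    children = list(mrow)
    if mrow.tag != M + "mrow" or len(children) != 3:
        return False
    left, middle, right = children
    bar_hidden = (
        middle.tag == M + "mfrac"
        and len(middle) == 2
        and middle.get("linethickness", "").strip() in ZERO_THICKNESS
    )
    return is_fence(left, "(") and bar_hidden and is_fence(right, ")")


BINOMIAL = ET.fromstring(
    '<mrow xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mo>(</mo><mfrac linethickness="0"><mi>n</mi><mi>k</mi></mfrac><mo>)</mo>'
    "</mrow>"
)
FRACTION = ET.fromstring(
    '<mrow xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mo>(</mo><mfrac><mi>n</mi><mi>k</mi></mfrac><mo>)</mo>"
    "</mrow>"
)

print(looks_like_binomial(BINOMIAL), looks_like_binomial(FRACTION))
```

Note how fragile this is: a two-row vector written with an mtable, or a 
binomial written with an mfenced element, already needs another rule.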

> Seems to me that Content Markup has much more to offer for audio 
> rendering, however, most of the math applications or conversion tools 
> only create documents with presentation markup.
> Most people use simple editor's that create presentation markup (usually 
> in XHTML+MATHML format to publish in web).
>  
> Is Content Markup only to be used as a standard markup language between 
> applications? Or can it be used to publish technical documents too, with 
> the possibility to be usefull to render it in a TTS engine?

Oh sure, oh sure.
Have you had a look at OMDoc? [2]
Have you had a look at our ActiveMath learning environment? [3]
We do just that: serving documents with OpenMath-encoded formulae.
(OpenMath is somewhat equivalent to MathML content; it is better to my 
taste, but MathML content is comparable in expressivity if one accepts 
using all sorts of csymbol elements.)

We currently serve HTML and PDF (using LaTeX).
And the semantically encoded formulae are put to good use, for example to 
provide copy-and-paste of formula sub-terms into a 
computer algebra system (see our home page for articles providing more 
on that) [3].

If your "rendering" (oh, I think you said "parser") is actually working, 
I think we would be interested in embedding it within ActiveMath. The 
challenge is then very, very interesting (there is a fair amount of research 
in there): most authors, seeing nothing other than graphical 
output, have difficulty accepting that we require the formulae to be 
encoded semantically. The "voice" output target would then be much, much 
more than a justification!
(They would actually have little, if anything, to rewrite in their 
content for it to be presented orally!)

Finally... how to encode content markup... well, on this, there is not 
much, indeed.
I currently know of:
-> QMath (http://www.matracas.org/)
-> OQMath, an extension thereof, (http://www.activemath.org/~paul/OQMath/)
-> ... hand-crafted, context specific, converters (we have a few, 
they're really not showable!)
-> Jome (http://jome.sourceforge.net/)
-> computer algebra systems
  -> GAP, Yacas (generating OpenMath)
  -> Maple, Mathematica (generating, actually incompatible, content MathML)
  -> these systems along with converters like the RIACA phrasebooks [4]


Hope that helps.

Paul


[1] OpenMath:   http://www.openmath.org/
[2] OMDoc:      http://www.mathweb.org/omdoc/
[3] ActiveMath: http://www.activemath.org/
[4] RIACA:      http://www.riaca.win.tue.nl/products/index.html

Received on Wednesday, 23 April 2003 16:01:52 UTC