- From: Paul Libbrecht <paul@activemath.org>
- Date: Wed, 23 Apr 2003 22:01:41 +0200
- To: Helder Ferreira <hfilipe@fe.up.pt>
- CC: www-math@w3.org, dfreitas@fe.up.pt
Helder Ferreira wrote:

> Greetings everyone!
>
> My aim is to create a parser that generates a text version of formulas
> (MathML) in technical documents, to feed a TTS engine.

I presume the word "converter" would be more appropriate, since "parsing" an XML document is already taken care of by XML parsers... (maybe these should be called tokenizers, but they are not).

> What would you use for audio rendering of mathml?
> Content Markup or Presentation Markup?

I am an advocate of content markup (of OpenMath [1], actually), so excuse the bias, but I believe content markup is clearly better.

When rendering something like a binomial coefficient, you certainly want to say something containing the word "binomial" (in the appropriate language). In MathML presentation, however, you need to use the mfrac element with the fraction bar set to zero width! (There was a thread on this topic not so long ago on this same mailing list, I believe.)

Of course, for this special case, working within a known field with a restricted set of symbols and notations, you could put together something that recognizes such a presentation as a binomial coefficient, deciding through some heuristics that it is not a 2-D vector. (I think MathType or MathPlayer can actually do exactly this, and they acknowledge that it is a heuristic that is very hard to extend.)

> Seems to me that Content Markup has much more to offer for audio
> rendering, however, most of the math applications or conversion tools
> only create documents with presentation markup.
> Most people use simple editors that create presentation markup (usually
> in XHTML+MATHML format to publish in web).
>
> Is Content Markup only to be used as a standard markup language between
> applications? Or can it be used to publish technical documents too, with
> the possibility to be useful to render it in a TTS engine?

Oh sure, oh sure. Have you had a look at OMDoc? [2] Have you had a look at our ActiveMath learning environment? We do just that: serving documents with OpenMath-encoded formulae. (OpenMath is roughly equivalent to MathML content; it is better to my taste, but MathML content is comparable in expressivity if one is willing to use all sorts of csymbol elements.)

We currently serve to HTML and PDF (using LaTeX). And the semantically encoded formulae are put to good use, for example to provide copy-and-paste of formula sub-terms into a computer algebra system (see our home page for articles providing more on that) [3].

If your "rendering" (oh, I think you said "parser") is actually working, I think we would be interested in embedding it within ActiveMath.

The challenge is then very, very interesting (there is a fair amount of research in there): most authors, seeing nothing other than a graphical output, have difficulty accepting that we require the formulae to be encoded semantically. The "voice" output target would then be much, much more than a justification! (They would actually have little, if anything, to rewrite in their content for it to be presented orally!)

Finally... how to encode content markup... well, on this there is not much, indeed. I currently know of:

-> QMath (http://www.matracas.org/)
-> OQMath, an extension thereof (http://www.activemath.org/~paul/OQMath/)
-> ... hand-crafted, context-specific converters (we have a few, they're really not showable!)
-> Jome (http://jome.sourceforge.net/)
-> computer algebra systems
   -> GAP, Yacas (generating OpenMath)
   -> Maple, Mathematica (generating, actually incompatible, content MathML)
   -> these systems along with converters like the RIACA phrasebooks [4]

Hope that helps.

Paul

[1] OpenMath: http://www.openmath.org/
[2] OMDoc: http://www.mathweb.org/omdoc/
[3] ActiveMath: http://www.activemath.org/
[4] RIACA: http://www.riaca.win.tue.nl/products/index.html
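
As an illustration of the binomial coefficient point made in the message above, here is a minimal Python sketch contrasting the two approaches. Everything in it is an assumption made for the example rather than anything from ActiveMath, MathType or MathPlayer: the sample markup (written without namespaces for brevity), the phrase table PHRASES, and the helper names speak_content and guess_binomial are all hypothetical. The content version only looks up the symbol name; the presentation version has to guess from layout (parentheses around an mfrac whose linethickness is "0").

```python
import xml.etree.ElementTree as ET

# Content markup: the symbol says what it is (csymbol pointing at OpenMath).
CONTENT = """<apply>
  <csymbol definitionURL="http://www.openmath.org/cd/combinat1#binomial">binomial</csymbol>
  <ci>n</ci>
  <ci>k</ci>
</apply>"""

# Presentation markup: only layout is recorded (fraction with no bar, in parentheses).
PRESENTATION = """<mrow>
  <mo>(</mo>
  <mfrac linethickness="0"><mi>n</mi><mi>k</mi></mfrac>
  <mo>)</mo>
</mrow>"""

# Hypothetical phrase table for a tiny subset of operators (English only).
PHRASES = {
    "binomial": "the binomial coefficient {0} over {1}",
    "plus": "{0} plus {1}",
    "divide": "{0} divided by {1}",
}


def speak_content(node):
    """Turn a small content-markup tree into a spoken phrase."""
    if node.tag in ("ci", "cn"):
        return node.text.strip()
    if node.tag == "apply":
        head, *args = list(node)
        # A csymbol carries its name as text; built-in operators are empty elements.
        name = head.text.strip() if head.tag == "csymbol" else head.tag
        return PHRASES[name].format(*(speak_content(a) for a in args))
    raise ValueError("unsupported element: " + node.tag)


def guess_binomial(mrow):
    """Heuristic: '(' + mfrac with zero-thickness bar + ')' is read as a binomial.

    Returns None when the pattern does not match. The same layout could also
    be a 2-D vector, which is exactly why this is only a guess.
    """
    children = list(mrow)
    if (
        len(children) == 3
        and children[0].tag == "mo" and (children[0].text or "").strip() == "("
        and children[2].tag == "mo" and (children[2].text or "").strip() == ")"
        and children[1].tag == "mfrac"
        and children[1].get("linethickness") == "0"
    ):
        top, bottom = list(children[1])
        return PHRASES["binomial"].format(top.text.strip(), bottom.text.strip())
    return None


if __name__ == "__main__":
    print(speak_content(ET.fromstring(CONTENT)))
    print(guess_binomial(ET.fromstring(PRESENTATION)))
```

Extending the content version to a new symbol means adding one entry to the phrase table, whereas each new notation in the presentation version needs its own layout pattern and its own disambiguation rule.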
Received on Wednesday, 23 April 2003 16:01:52 UTC