Re: Speech Synthesis and Recognition of Mathematical and Scientific Content

From: Neil Soiffer <NeilS@dessci.com>
Date: Mon, 16 Apr 2012 10:07:57 -0700
Message-ID: <CAESRWkAhe9uzwaKA78NkZFR2PQDiMdmFC-at1yR4QYO5WiyZxQ@mail.gmail.com>
To: Adam Sobieski <adamsobieski@hotmail.com>
Cc: www-math@w3.org
One could use annotation-xml to embed SSML or other "rich" speech formats,
but why?  As you point out, there already exist projects to convert the
MathML to speech directly.  MathPlayer [1] (which my company distributes
for free) has done that for years and works with IE.  It can be used with a
large variety of assistive technology software [2].  It is by far the most
widely used math accessibility tool out there.  The latest version
(MathPlayer 3, public release 1) [3] allows for many options to customize
the speech to the needs of the user and/or subject matter.  It allows for
various styles of speech.  You could even write your own rules/speech if
you don't like what MathPlayer does, although that is not easy
(modifying/customizing existing rules is not hard, though).  Using
annotation-xml hard-codes the speech and forces a "one size fits all"
approach, which seems the wrong way to go.
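
To make the contrast concrete, here is a minimal sketch of rule-based
speech generated directly from presentation MathML.  It is not how
MathPlayer works internally; the rule table, the function names and the
"terse"/"verbose" style names are all hypothetical, and only a handful of
elements are handled.  The point is only that the same markup can yield
different speech depending on a user-selected style, which a fixed
annotation cannot do.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

# Hypothetical, user-editable rule table: operator text -> spoken words.
OPERATOR_WORDS = {"+": "plus", "-": "minus", "=": "equals"}


def local_name(element):
    """Return the element name without the MathML namespace prefix."""
    return element.tag.replace(MATHML_NS, "")


def speak(element, style="verbose"):
    """Turn a small subset of presentation MathML into a speech string."""
    name = local_name(element)
    children = [speak(child, style) for child in element]

    if name in ("mi", "mn"):
        return (element.text or "").strip()
    if name == "mo":
        text = (element.text or "").strip()
        return OPERATOR_WORDS.get(text, text)
    if name == "mfrac" and len(children) == 2:
        if style == "terse":
            return f"{children[0]} over {children[1]}"
        return f"the fraction {children[0]} over {children[1]} end fraction"
    if name == "msup" and len(children) == 2:
        if children[1] == "2":
            return f"{children[0]} squared"
        return f"{children[0]} to the power {children[1]}"
    # math, mrow and anything unrecognized: speak the children in order.
    return " ".join(part for part in children if part)


SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mfrac><mi>a</mi><mn>2</mn></mfrac><mo>+</mo>"
    "<msup><mi>x</mi><mn>2</mn></msup></math>"
)

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(speak(root, "terse"))
    print(speak(root, "verbose"))
```

Changing the style, or editing the rule table, changes the speech without
touching the MathML itself.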

There are problems with the way speech engines speak math.  You can hear
examples at [4].

By exploring some of the references, you should be able to get a better
appreciation of what has already been done in this area.

Neil Soiffer
Senior Scientist
Design Science, Inc.
www.dessci.com
~ Makers of MathType, MathFlow, MathPlayer, MathDaisy, Equation Editor ~



[1]  http://www.dessci.com/en/products/mathplayer
[2]  http://www.dessci.com/en/solutions/access/atsupport.htm
[3]  http://news.dessci.com/2011/02/epub-3-first-public-draft-brings-enhanced-math-support-via-mathml.html
[4]  http://www.gh-mathspeak.com/tts.php


On Mon, Apr 16, 2012 at 7:15 AM, Adam Sobieski <adamsobieski@hotmail.com> wrote:

> Math Working Group,
>
> Greetings.  In the new Speech API Community Group, I indicated some
> synthesis and recognition topics pertaining to mathematical and scientific
> notation (
> http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0004.html):
>
> EPUB3-style (
> http://idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml-ssml-attrib) SSML
> attributes:
>
> <math ssml:ph="..."> ... </math>
>
> SSML in <annotation-xml>:
>
> <math>
> <semantics>
> ...
> <annotation-xml encoding="application/ssml+xml"> ... </annotation-xml>
> </semantics>
> </math>
>
> Some other related topics include referencing audio in
> <annotation>, interoperability with media fragment URI:
>
> <math>
> <semantics>
> ...
> <annotation encoding="audio/..." src="..." />
> </semantics>
> </math>
>
> and speech synthesis interoperability with SMIL-based scenarios.
>
> An interesting speech synthesis feature is the automatic synthesis
> of mathematical and scientific content.  The MathAudio project (
> http://lpf-esi.fe.up.pt/~audiomath/index_en.html)
> illustrates processing the MathML presentation layer into Portuguese (
> http://lpf-esi.fe.up.pt/~hfilipe/projecto/mathml.html) (
> http://lpf-esi.fe.up.pt/~audiomath/links_en.html).
>
> Semantic content can additionally be of use as input for such processing
> and related topics include somehow extending or annotating content
> dictionaries with linguistic data or extending or annotating linguistic
> data formats with content dictionary data for extensibility in that regard.
>
> I also indicated the possibility of extending or more fully
> utilizing speech recognition grammar techniques (SRGS/SISR) for recognition
> output scenarios including XML, hypertext, and/or MathML.
>
> I wanted to apprise the Math Working Group about those new developments
> and to welcome discussion and any comments and suggestions about the
> synthesis of and recognition of speech containing mathematical and
> scientific formulas.
>
>
>
> Kind regards,
>
> Adam Sobieski
>
>
Received on Monday, 16 April 2012 17:08:30 GMT
