RE: Speech Synthesis and Recognition of Mathematical and Scientific Content from Adam Sobieski on 2012-04-17 (www-math@w3.org from April 2012)

From: Adam Sobieski <adamsobieski@hotmail.com>
Date: Tue, 17 Apr 2012 23:30:59 +0000
To: Neil Soiffer <neils@dessci.com>
CC: <www-math@w3.org>
Message-ID: <SNT138-W69A82CA1DCFFD0C4B3B42C53F0@phx.gbl>
Neil Soiffer,

Some ideas for avoiding a "one size fits all" approach:

With regard to <annotation>, <annotation-xml> and SSML, ideas include content type parameters:

<annotation-xml encoding="application/ssml+xml;param=value"> ... </annotation-xml>

XML attributes:

<annotation-xml encoding="application/ssml+xml" ext:param="value"> ... </annotation-xml>

and/or extending SSML to include paraphrases, where a <speak> element could contain several paraphrases, each carrying parameters useful for heuristically selecting among them.

In addition to somehow extending or annotating content dictionaries with linguistic data, or extending or annotating linguistic data formats with content dictionary data, linguistic data resources can be indicated on MathML elements. Resembling

URI = cdbase + '/' + cd-name + '#' + symbol-name

another set of attributes can be defined that compose into a URI referencing a linguistic data resource for use in speech recognition and synthesis.

Those are some ideas to enhance configurability.

Kind regards,
Adam

Date: Mon, 16 Apr 2012 10:07:57 -0700
Subject: Re: Speech Synthesis and Recognition of Mathematical and Scientific Content
From: NeilS@dessci.com
To: adamsobieski@hotmail.com
CC: www-math@w3.org

One could use annotation-xml to embed SSML or other "rich" speech formats, but why? As you point out, there already exist projects to convert MathML to speech directly. MathPlayer [1] (which my company distributes for free) has done that for years and works with IE. It can be used with a large variety of assistive technology software [2]. It is by far the most widely used math accessibility tool out there. The latest version (MathPlayer 3, public release 1) [3] allows for many options to customize the speech to the needs of the user and/or the subject matter. It allows for various styles of speech. You could even write your own rules/speech if you don't like what MathPlayer does, although that is not easy (modifying/customizing existing rules is not hard, though). Using annotation-xml hard-codes the speech and forces a "one size fits all" approach; it seems the wrong way to go.


There are problems with the way speech engines speak math. You can hear examples at [4].

By exploring some of the references, you should be able to get a better appreciation of what has already been done in this area.


Neil Soiffer
Senior Scientist
Design Science, Inc.
www.dessci.com

~ Makers of MathType, MathFlow, MathPlayer, MathDaisy, Equation Editor ~


[1] http://www.dessci.com/en/products/mathplayer
[2] http://www.dessci.com/en/solutions/access/atsupport.htm
[3] http://news.dessci.com/2011/02/epub-3-first-public-draft-brings-enhanced-math-support-via-mathml.html
[4] http://www.gh-mathspeak.com/tts.php


On Mon, Apr 16, 2012 at 7:15 AM, Adam Sobieski <adamsobieski@hotmail.com> wrote:

Math Working Group,

Greetings. In the new Speech API Community Group, I indicated some synthesis and recognition topics pertaining to mathematical and scientific notation (http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0004.html):

EPUB3-style (http://idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml-ssml-attrib) SSML attributes:

<math ssml:ph="..."> ... </math>

SSML in <annotation-xml>:

<math>
<semantics>
...
<annotation-xml encoding="application/ssml+xml"> ... </annotation-xml>
</semantics>
</math>

Some other related topics include referencing audio in <annotation>, with interoperability with Media Fragments URIs:

<math>
<semantics>
...
<annotation encoding="audio/..." src="..." />
</semantics>
</math>

and speech synthesis interoperability with SMIL-based scenarios.

An interesting speech synthesis feature is the automatic synthesis of mathematical and scientific content. The MathAudio project (http://lpf-esi.fe.up.pt/~audiomath/index_en.html) illustrates converting MathML presentation markup into spoken Portuguese (http://lpf-esi.fe.up.pt/~hfilipe/projecto/mathml.html) (http://lpf-esi.fe.up.pt/~audiomath/links_en.html).
Semantic content could also serve as input to such processing. Related topics include somehow extending or annotating content dictionaries with linguistic data, or extending or annotating linguistic data formats with content dictionary data, for extensibility in that regard.
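To make the contrast between hand-authored SSML in <annotation-xml> and automatic synthesis concrete, here is a minimal Python sketch of how a consuming application might look for an SSML annotation and otherwise fall back to generating speech from the MathML itself. It is an illustration under assumptions, not part of any specification or product: the sample expression, the SSML wording, the function names and the "prefer authored speech, else synthesize" policy are all hypothetical. The MathML and SSML namespace URIs are the standard ones. Parameters on the encoding value are stripped before comparison, so a parameterized encoding such as application/ssml+xml;param=value would still match.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical sample: the expression x^2 and its SSML wording are illustrative only.
SAMPLE = f"""<math xmlns="{MATHML_NS}">
  <semantics>
    <msup><mi>x</mi><mn>2</mn></msup>
    <annotation-xml encoding="application/ssml+xml">
      <speak xmlns="{SSML_NS}" version="1.0" xml:lang="en">x squared</speak>
    </annotation-xml>
  </semantics>
</math>"""


def media_type(encoding: str) -> str:
    """Drop content-type parameters: 'application/ssml+xml;p=v' -> 'application/ssml+xml'."""
    return encoding.split(";", 1)[0].strip().lower()


def ssml_annotations(math_xml: str) -> List[ET.Element]:
    """Collect SSML <speak> elements from every <annotation-xml> whose encoding is SSML."""
    root = ET.fromstring(math_xml)
    found: List[ET.Element] = []
    for ann in root.iter(f"{{{MATHML_NS}}}annotation-xml"):
        if media_type(ann.get("encoding", "")) == "application/ssml+xml":
            found.extend(ann.findall(f"{{{SSML_NS}}}speak"))
    return found


def authored_speech(math_xml: str) -> Optional[str]:
    """Return the text of the first SSML annotation, or None so the caller can
    fall back to generating speech from the MathML (assumed policy)."""
    speaks = ssml_annotations(math_xml)
    if not speaks:
        return None
    return "".join(speaks[0].itertext()).strip()


print(authored_speech(SAMPLE))  # prints: x squared
```

Flattening the <speak> element to plain text, as above, discards any prosody or phoneme markup; a real consumer would more likely pass the whole SSML element to its speech engine.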
I also indicated the possibility of extending, or more fully utilizing, speech recognition grammar techniques (SRGS/SISR) for scenarios in which recognition output includes XML, hypertext, and/or MathML.

I wanted to apprise the Math Working Group of these new developments and to welcome discussion, comments, and suggestions about the synthesis and recognition of speech containing mathematical and scientific formulas.

Kind regards,
Adam Sobieski
Received on Tuesday, 17 April 2012 23:31:33 UTC