- From: Olli Pettay <Olli.Pettay@helsinki.fi>
- Date: Thu, 10 Dec 2009 17:42:46 -0800
On 12/10/09 4:54 PM, Weston Ruter wrote:
> I've been working on a web app which reads text in a web page,
> highlighting each word as it is read. For this to be possible, a
> Text-To-Speech API is needed which is able to:
> (1) generate the speech audio from some text, and
> (2) include the time indices for when each of the words in the text is
> spoken.
>
> Microsoft has its Sapi.SpVoice API via ActiveXObject which does (1) but
> not (2) apparently. There are web services (usable in conjunction with
> HTML5 Audio) which also do (1) such as the iSpeech API
> <http://www.ispeech.org/api> and Google Translate's TTS
> <http://translate.google.com/translate_tts?q=Hello%2C+World&tl=en>, but
> none that I have found which do (2). In any case, web services
> aren't preferable since they require that the audio be transferred over
> the network which could take a significant amount of time.
>
> Is anyone aware of any work done to develop a standard TTS API for the
> Web? Operating systems already have this functionality built-in, and
> it's a shame that web apps can't make use of it. If Google Gears were
> alive, it would've been a good place to prototype this, but alas...

You probably want to ask the W3C multimodal working group.
There are specifications like XHTML+Voice and SALT (neither really W3C
specifications) and (old) proposals like MMI-CSS.

-Olli
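
For illustration only: requirement (2) in the quoted message amounts to having a per-word list of time offsets alongside the audio. The sketch below assumes such a list were available and shows how a page might look up the word to highlight at a given playback time. The `WordTiming` shape, the `wordIndexAt` helper, and the sample timings are all hypothetical; none of them come from SAPI, iSpeech, Google Translate, or any W3C specification.

```typescript
// Hypothetical shape: one entry per spoken word, offsets in seconds
// from the start of the generated audio.
interface WordTiming {
  word: string;
  start: number;
  end: number;
}

// Return the index of the word being spoken at time t, or -1 if t falls
// in a gap or outside the audio. Assumes timings are sorted by start
// time and do not overlap, so a binary search is enough.
function wordIndexAt(timings: WordTiming[], t: number): number {
  let lo = 0;
  let hi = timings.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const current = timings[mid]!;
    if (t < current.start) {
      hi = mid - 1;
    } else if (t >= current.end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}

// Made-up timings, for illustration only.
const timings: WordTiming[] = [
  { word: "Hello", start: 0.0, end: 0.4 },
  { word: "World", start: 0.5, end: 0.9 },
];
```

In a page, something like `wordIndexAt(timings, audio.currentTime)` could be called from the HTML5 audio element's `timeupdate` event to decide which word to highlight. The missing piece, as the quoted message says, is a TTS source that supplies the timings in the first place.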
Received on Thursday, 10 December 2009 17:42:46 UTC