- From: Dominic Mazzoni <dmazzoni@google.com>
- Date: Thu, 3 May 2012 23:29:13 -0700
- To: Gerardo Capiel <gerardoc@benetech.org>
- Cc: "<public-speech-api-contrib@w3.org>" <public-speech-api-contrib@w3.org>
- Message-ID: <CAFz-FYwXQ2TU52Be83ek=sJR8638ZxuXZX3evMTkucnjoD3Qtw@mail.gmail.com>
Hi Gerardo,

I'm glad you mentioned this issue. I agree: the TTS API really needs to include events that can be triggered at markers or before each word is spoken. Not only do a number of very important applications need this functionality, but most TTS engines and most other TTS APIs already have this feature or something equivalent. Leaving it out severely limits the potential applications and might hinder the adoption of web TTS.

In general, I think we should look carefully at both (1) the capabilities provided by the majority of speech engines, and (2) the features available in most other TTS APIs, when designing this API.

When designing Chrome's APIs, I looked at the native APIs in Windows and Mac OS X, and the external interface exposed by several free and commercial speech engines. If a feature was supported by almost all of them, I gave it high priority. If a feature was desired by users but not currently supported by most platforms or engines, I considered it low priority, because there'd be no point in adding an API that's unlikely to be widely implementable in practice.

Here are some examples of features that I found to be widely supported:

* Enqueueing an utterance to be spoken as soon as the current one finishes
* Controlling the rate, pitch, and volume at a high level (without needing markup)
* Getting callbacks when speech starts, finishes, and at markers and between words

In contrast, I found these features to be desired by some application developers but not widely available:

* Callbacks for phonemes
* Speak to a file or audio buffer
* Jump to a particular location in the output audio

In the middle:

* SSML. Relatively few engines support SSML natively, but the vast majority support at least some markup. I've been working on translating SSML to various engine-specific markup.
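To make the word-callback case discussed above concrete, here is a minimal TypeScript sketch of how a reader application might turn a word event into a highlight. It assumes the engine reports the character offset at which each word starts (Chrome's TTS events expose this as charIndex). The WordEvent type and the wordRangeAt and highlightWord helpers are hypothetical names used only for illustration; they are not part of any existing or proposed API.

```typescript
// Hypothetical event shape: a word-boundary callback that carries the
// character offset of the word about to be spoken.
interface WordEvent {
  type: "word";
  charIndex: number;
}

// Return the [start, end) character range of the word that begins at
// charIndex, scanning forward to the next whitespace or end of text.
function wordRangeAt(text: string, charIndex: number): [number, number] {
  let end = charIndex;
  while (end < text.length && !/\s/.test(text.charAt(end))) {
    end++;
  }
  return [charIndex, end];
}

// Mark the current word with brackets. A real reader would wrap the range
// in a styled element instead of inserting characters.
function highlightWord(text: string, ev: WordEvent): string {
  const [start, end] = wordRangeAt(text, ev.charIndex);
  return text.slice(0, start) + "[" + text.slice(start, end) + "]" + text.slice(end);
}
```

For example, calling highlightWord("Hello brave world", { type: "word", charIndex: 6 }) returns "Hello [brave] world". Without an event that carries such an offset, the application would have to guess timing from the speech rate, which is why word-level callbacks matter for synchronized highlighting.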
- Dominic

On Thu, May 3, 2012 at 4:53 PM, Gerardo Capiel <gerardoc@benetech.org> wrote:

> I'm the VP of Engineering at Benetech, the nonprofit behind Bookshare (
> http://bookshare.org) - the world's largest library of accessible ebooks
> for people with print disabilities (e.g. blind, dyslexic, cerebral palsy).
>
> Over 70% of our 200K users have learning disabilities, such as dyslexia,
> and need synchronized highlighting of words as they are being spoken by a
> TTS engine. We are planning to integrate the Google Chrome specific TTS
> APIs into the open source Readium (http://readium.org) EPUB 3 ebook
> reader to fulfill this use case in a web environment.
>
> To validate market acceptance of this use case, below are examples of
> vendors to the dyslexic community, which have implemented this synchronized
> word-level highlighting capability in their applications:
>
> Don Johnston: ReadOut:Loud -
> http://www.donjohnston.com/products/read_outloud/index.html
> Bookshare/Shinano: Read2Go - http://read2go.org/
> textHELP: Read&Write Gold -
> http://www.texthelp.com/North-America/our-products/readwrite
> Freedom Scientific: WYNN -
> http://www.freedomscientific.com/LSG/products/wynn_features.asp
> Levelware: InDAISY - http://levelware.com/
>
> To implement such features in a web application, the TTS engine needs to
> be able to support JS based synthesized event handlers. Google implemented
> a callback mechanism in their Chrome TTS APIs by supporting an event
> handler as part of the speak() method (
> http://code.google.com/chrome/extensions/tts.html#events). The callbacks
> tied to synthesis events are ideally at the word level or triggered off
> SSML markers.
>
> You can see some demo's of this capability at the following links:
> https://github.com/gcapiel/ChromeWebAppBookshareReader/downloads (install
> extension in Chrome via .crx download)
> https://chrome.google.com/webstore/detail/chhkejkkcghanjclmhhpncachhgejoel (the
> FLITE voice supports callbacks, so install that first
> https://chrome.google.com/webstore/detail/edimkjalobeaakbgjdeikeimmacjdppn
> )
>
> I would highly urge that the Speech API be extended with these
> capabilities, so that our dyslexic users are not limited to Google Chrome
> for web based reading and so that the general dyslexic community can
> benefit from this technology in other web based applications.
>
> Sincerely,
>
> Gerardo
>
> Gerardo Capiel
> VP of Engineering, Benetech <http://benetech.org>
> 650-644-3405
> http://twitter.com/gcapiel
> Fork, Code, Do Social Good: http://benetech.github.com/
Received on Friday, 4 May 2012 06:29:43 UTC