W3C home > Mailing lists > Public > public-speech-api-contrib@w3.org > May 2012

Re: The Speech API needs support for event listeners for synthesis events

From: Dominic Mazzoni <dmazzoni@google.com>
Date: Thu, 3 May 2012 23:29:13 -0700
Message-ID: <CAFz-FYwXQ2TU52Be83ek=sJR8638ZxuXZX3evMTkucnjoD3Qtw@mail.gmail.com>
To: Gerardo Capiel <gerardoc@benetech.org>
Cc: "<public-speech-api-contrib@w3.org>" <public-speech-api-contrib@w3.org>

Hi Gerardo,

I'm glad you mentioned this issue. I agree: the TTS API really needs to
include events that can be triggered at markers or before each word is
spoken. Not only do a number of very important applications need this
functionality, but most TTS engines and most other TTS APIs already offer
this feature or something equivalent. Leaving it out severely limits the
potential applications and might hinder the adoption of web TTS.

In general, I think we should look carefully at both (1) the capabilities
provided by the majority of speech engines, and (2) the features available
in most other TTS APIs, when designing this API. When designing Chrome's
APIs, I looked at the native APIs in Windows and Mac OS X, and the external
interface exposed by several free and commercial speech engines. If a
feature was supported by almost all of them, I gave it high priority. If a
feature was desired by users but not currently supported by most platforms
or engines, I considered it low priority, because there'd be no point in
adding an API that's unlikely to be widely implementable in practice.

Here are some examples of features that I found to be widely supported:

* Enqueueing an utterance to be spoken as soon as the next one finishes
* Controlling the rate, pitch, and volume at a high level (without needing
markup)
* Getting callbacks when speech starts, finishes, and at markers and
between words
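For illustration, word-boundary callbacks could surface to a page roughly
like this. All names below are hypothetical, not from any spec, and a tiny
mock stands in for the engine, which would normally fire these events as
audio is produced:

```javascript
// Hypothetical sketch: an utterance speak call that fires callbacks at
// start, at each word boundary, and at end. A real synthesizer would
// drive these events; here a mock simply walks the text.
function speakMock(text, listeners) {
  if (listeners.onstart) listeners.onstart({ type: 'start' });
  // Report a word-boundary event carrying the character index of each
  // word, mirroring what word-level callbacks typically provide.
  const wordRe = /\S+/g;
  let match;
  while ((match = wordRe.exec(text)) !== null) {
    if (listeners.onword) {
      listeners.onword({ type: 'word', charIndex: match.index });
    }
  }
  if (listeners.onend) listeners.onend({ type: 'end' });
}

const fired = [];
speakMock('Hello brave new world', {
  onstart: (e) => fired.push(e.type),
  onword: (e) => fired.push('word@' + e.charIndex),
  onend: (e) => fired.push(e.type),
});
console.log(fired);
// → [ 'start', 'word@0', 'word@6', 'word@12', 'word@16', 'end' ]
```

The character index in each word event is what lets an application map the
callback back to a position in the original text.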

In contrast, I found these features to be desired by some application
developers but not widely available:

* Callbacks for phonemes
* Speak to a file or audio buffer
* Jump to a particular location in the output audio

In the middle:

* SSML. Relatively few engines support SSML natively, but the vast majority
support at least some markup. I've been working on translating SSML into
various engine-specific markup languages.
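Such a translation layer might look roughly like the sketch below. The
target markup here ("\pause=NNN\", "\emph\...\emph0\") is invented purely
for illustration and is not any real engine's syntax:

```javascript
// Illustrative sketch: map a couple of common SSML elements onto a
// made-up engine-specific markup. A real translator would parse the XML
// properly; simple regexes suffice to show the idea.
function ssmlToEngineMarkup(ssml) {
  return ssml
    // <break time="500ms"/>  ->  \pause=500\
    .replace(/<break\s+time="(\d+)ms"\s*\/>/g, '\\pause=$1\\')
    // <emphasis>word</emphasis>  ->  \emph\word\emph0\
    .replace(/<emphasis>(.*?)<\/emphasis>/g, '\\emph\\$1\\emph0\\')
    // Drop the wrapping <speak> element and any unhandled tags, keeping
    // their text content.
    .replace(/<[^>]+>/g, '');
}

const out = ssmlToEngineMarkup(
  '<speak>Wait<break time="500ms"/> then <emphasis>go</emphasis></speak>');
console.log(out); // → Wait\pause=500\ then \emph\go\emph0\
```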

- Dominic

On Thu, May 3, 2012 at 4:53 PM, Gerardo Capiel <gerardoc@benetech.org> wrote:

>  I'm the VP of Engineering at Benetech, the nonprofit behind Bookshare (
> http://bookshare.org) - the world's largest library of accessible ebooks
> for people with print disabilities (e.g., blindness, dyslexia, cerebral
> palsy).
>
> Over 70% of our 200K users have learning disabilities, such as dyslexia,
> and need synchronized highlighting of words as they are being spoken by a
> TTS engine.  We are planning to integrate the Google Chrome specific TTS
> APIs into the open source Readium (http://readium.org) EPUB 3 ebook
> reader to fulfill this use case in a web environment.
>
>  To validate market acceptance of this use case, below are examples of
> vendors serving the dyslexic community that have implemented this
> synchronized word-level highlighting capability in their applications:
>
>  Don Johnston: Read:OutLoud -
> http://www.donjohnston.com/products/read_outloud/index.html
> Bookshare/Shinano: Read2Go - http://read2go.org/
> textHELP: Read&Write Gold -
> http://www.texthelp.com/North-America/our-products/readwrite
> Freedom Scientific: WYNN -
> http://www.freedomscientific.com/LSG/products/wynn_features.asp
> Levelware: InDAISY - http://levelware.com/
>
>  To implement such features in a web application, the TTS engine needs to
> be able to support JS-based synthesis event handlers.  Google implemented
> a callback mechanism in their Chrome TTS APIs by supporting an event
> handler as part of the speak() method (
> http://code.google.com/chrome/extensions/tts.html#events).  The callbacks
> tied to synthesis events are ideally at the word level or triggered off
> SSML markers.
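Since Chrome's word-level events report a character index into the
utterance text, a page can map that index back to the word to highlight.
A minimal sketch (the helper name is invented here for illustration):

```javascript
// Given the utterance text and the charIndex reported by a word-level
// synthesis event, return the [start, end) range of the word starting
// there, so the page can highlight it.
function wordRangeAt(text, charIndex) {
  let end = charIndex;
  while (end < text.length && !/\s/.test(text[end])) end++;
  return { start: charIndex, end, word: text.slice(charIndex, end) };
}

const range = wordRangeAt('Reading for everyone', 8);
console.log(range.word); // → for
```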
>
>  You can see some demos of this capability at the following links:
>  https://github.com/gcapiel/ChromeWebAppBookshareReader/downloads (install
> extension in Chrome via .crx download)
> https://chrome.google.com/webstore/detail/chhkejkkcghanjclmhhpncachhgejoel (the
> FLITE voice supports callbacks, so install that first
> https://chrome.google.com/webstore/detail/edimkjalobeaakbgjdeikeimmacjdppn
> )
>
>  I strongly urge that the Speech API be extended with these
> capabilities, so that our dyslexic users are not limited to Google Chrome
> for web based reading and so that the general dyslexic community can
> benefit from this technology in other web based applications.
>
>  Sincerely,
>
>  Gerardo
>
>     Gerardo Capiel
> VP of Engineering, Benetech <http://benetech.org>
> 650-644-3405
> http://twitter.com/gcapiel
> Fork, Code, Do Social Good: http://benetech.github.com/
>
>
Received on Friday, 4 May 2012 06:29:43 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Friday, 4 May 2012 06:29:44 GMT