TTS append / queueing from Dominic Mazzoni on 2012-04-24 (public-speech-api@w3.org from April 2012)

From: Dominic Mazzoni <dmazzoni@google.com>
Date: Tue, 24 Apr 2012 14:47:47 -0700
To: public-speech-api@w3.org
Message-ID: <CAFz-FYx2p_CRG_-RXL8OX2b6LgokxKszmEji4xOrg1K0zbD3SQ@mail.gmail.com>

Hi,

Let's pull the discussion of queueing / TTS append into its own thread;
see below:

On Tue, Apr 24, 2012 at 8:36 AM, Hans Wennborg <hwennborg@google.com> wrote:

> On Fri, Apr 13, 2012 at 14:58, Jim Barnett <Jim.Barnett@genesyslab.com>
> wrote:
> > A couple of quick comments:
> > 1) The current TTS API sets the string to be played as a single action.
> > Could it be useful to add an 'append' function, allowing the programmer
> > to append text to the string that is already playing?  I'm thinking of
> > the case where the page wants to play out a large document, possibly one
> > that's being streamed from another source.  We'd have to handle the edge
> > cases, of course.  For example, I think that doing an append after the
> > play had stopped (when 'ended'==true) would simply set the text field
> > and a new start() would be required before play would resume.
>
> I think the web page could achieve the same effect as having an
> 'append' function by using the 'onend' handler on the TTS object, and
> start playing the new text as soon as the old playback finishes.
>

I think that the need is real, but I don't like the idea of an "append"
method, for a few reasons:
* It's mismatched to the rest of the API. To play the first utterance, you
create an object and set the text as an attribute, then call the play
method. To play the second one, you call the append method but pass the
text as an argument.
* It's not clear how you could change other properties and have the changes
apply to the appended utterance only. For example, what if I want to speak
the second utterance with a different voice, different language, or with
different parameters?

So rather than an append method, how about an option on each speech
utterance to either interrupt or enqueue? This is a common feature of
platform speech APIs already, so it'll be familiar to developers.
In the Chrome TTS extension API, for example, there's just a single boolean
flag "enqueue". If it's set to true, then the utterance you want to speak
will run after all currently running or enqueued utterances finish. If it's
set to false, everything else is flushed and the new utterance speaks
immediately.
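
To make the two behaviours concrete, here is a minimal in-memory sketch of
the semantics described above. It is a toy model only, not the Chrome
implementation: SpeechQueue, speak(), finishCurrent(), and the "spoken" and
"flushed" logs are hypothetical names, and defaulting the flag to false is
an assumption.

```javascript
// Toy model of the enqueue / interrupt semantics. All names here are
// hypothetical and exist only for illustration.
class SpeechQueue {
  constructor() {
    this.current = null; // utterance being spoken, or null when idle
    this.pending = [];   // utterances waiting their turn
    this.spoken = [];    // utterances that ran to completion
    this.flushed = [];   // utterances dropped by an interrupting call
  }

  // enqueue === true: wait behind everything running or already queued.
  // enqueue === false (assumed default): flush everything, speak now.
  speak(text, enqueue = false) {
    if (enqueue && this.current !== null) {
      this.pending.push(text);
      return;
    }
    if (!enqueue) {
      if (this.current !== null) this.flushed.push(this.current);
      this.flushed.push(...this.pending);
      this.pending = [];
    }
    this.current = text;
  }

  // Simulates the engine reaching the end of the current utterance.
  finishCurrent() {
    if (this.current === null) return;
    this.spoken.push(this.current);
    this.current = this.pending.length > 0 ? this.pending.shift() : null;
  }
}

const q = new SpeechQueue();
q.speak('First');
q.speak('Deuxième', true); // waits behind 'First'
q.finishCurrent();         // 'First' ends, 'Deuxième' starts
q.speak('Urgent');         // no flag: interrupts 'Deuxième'
q.finishCurrent();
```

In this run 'First' and 'Urgent' are spoken to completion, while 'Deuxième'
is flushed because a non-enqueued utterance arrived while it was playing.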

So if I wanted to speak two things in sequence, it would look something
like this:

var tts1 = new TTS();
tts1.text = 'First';
tts1.lang = 'en';
tts1.play();

var tts2 = new TTS();
tts2.text = 'Deuxième';
tts2.lang = 'fr';
tts2.enqueue = true;    // <-- the proposed flag
tts2.play();

I believe that flag covers 90% of use cases, but if all else fails, you
should definitely be able to just add a handler that gets called when one
utterance finishes and implement your own queueing.
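
A rough sketch of that fallback, chaining utterances through an end
handler. The proposed TTS object is not available to run against, so
StubTTS below is a hypothetical stand-in that mimics only the text, lang,
play() and onend members mentioned in this thread; finish() and
playInSequence() are likewise made up for illustration.

```javascript
// StubTTS is a hypothetical stand-in for the proposed TTS object.
// It "finishes" speaking only when finish() is called explicitly.
class StubTTS {
  constructor() {
    this.text = '';
    this.lang = '';
    this.onend = null;
  }
  play() {
    StubTTS.log.push(this.text);
    StubTTS.active = this;
  }
  finish() {
    StubTTS.active = null;
    if (this.onend) this.onend();
  }
}
StubTTS.log = [];     // texts in the order playback started
StubTTS.active = null;

// Plays utterances one after another by chaining their onend handlers.
function playInSequence(utterances) {
  let index = 0;
  function playNext() {
    if (index >= utterances.length) return;
    const utterance = utterances[index++];
    utterance.onend = playNext; // start the next one when this one ends
    utterance.play();
  }
  playNext();
}

const first = new StubTTS();
first.text = 'First';
first.lang = 'en';

const second = new StubTTS();
second.text = 'Deuxième';
second.lang = 'fr';

playInSequence([first, second]);
first.finish(); // simulates the end of 'First', which starts 'Deuxième'
```

Each utterance keeps its own lang (or any other property), which is the
case the append method could not express.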

- Dominic

Received on Tuesday, 24 April 2012 21:48:16 UTC