RE: TTS append / queueing from Jim Barnett on 2012-04-24 (public-speech-api@w3.org from April 2012)

From: Jim Barnett <Jim.Barnett@genesyslab.com>
Date: Tue, 24 Apr 2012 15:27:36 -0700
To: "Dominic Mazzoni" <dmazzoni@google.com>, <public-speech-api@w3.org>
Message-ID: <E17CAD772E76C742B645BD4DC602CD810616EBD7@NAHALD.us.int.genesyslab.com>

So, just to be clear: when you create multiple instances of the TTS object, do they all represent the same underlying synthesis resource? If so, then your proposal looks reasonable to me. I was thinking that the TTS object represented the underlying synthesizer, so I didn't think of creating a separate one. It might be good to make this explicit in the name of the object, but I don't have a good suggestion; maybe 'utterance' or something like that?

- Jim

From: Dominic Mazzoni [mailto:dmazzoni@google.com] 
Sent: Tuesday, April 24, 2012 5:48 PM
To: public-speech-api@w3.org
Subject: TTS append / queueing

Hi,

Let's pull the discussion of queueing / tts append into its own thread - see below:

On Tue, Apr 24, 2012 at 8:36 AM, Hans Wennborg <hwennborg@google.com> wrote:

	On Fri, Apr 13, 2012 at 14:58, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:
	> A couple of quick comments:
	> 1) The current TTS API sets the string to be played as a single action.
	> Could it be useful to add an 'append' function, allowing the programmer
	> to append text to the string that is already playing?  I'm thinking of
	> the case where the page wants to play out a large document, possibly one
	> that's being streamed from another source.  We'd have to handle the edge
	> cases, of course.  For example, I think that doing an append after the
	> play had stopped (when 'ended'==true) would simply set the text field
	> and a new start() would be required before play would resume.

	I think the web page could achieve the same effect as having an
	'append' function by using the 'onend' handler on the TTS object to
	start playing the new text as soon as the old playback finishes.

I think that the need is real, but I don't like the idea of an "append" method, for a few reasons:

* It's mismatched to the rest of the API. To play the first utterance, you create an object and set the text as an attribute, then call the play method. To play the second one, you call the append method but pass the text as an argument.

* It's not clear how you could change other properties and have the changes apply to the appended utterance only. For example, what if I want to speak the second utterance with a different voice, different language, or with different parameters?

So rather than an append method, how about an option on each speech utterance to either interrupt or enqueue? This is a pretty common feature in most platform speech APIs already, so it'll be familiar to developers. In the Chrome TTS extension API, for example, there's just a single boolean flag "enqueue". If it's set to true, then the utterance you want to speak will run after all currently running or enqueued utterances finish. If it's set to false, everything else is flushed and the new utterance speaks immediately.

So if I wanted to speak two things in sequence, it would look something like this:

var tts1 = new TTS();
tts1.text = 'First';
tts1.lang = 'en';
tts1.play();

var tts2 = new TTS();
tts2.text = 'Deuxième';
tts2.lang = 'fr';
tts2.enqueue = true;  // <-- the proposed flag
tts2.play();
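
The interrupt-versus-enqueue behaviour described above can be modelled roughly as follows. This is only a sketch of the semantics, not a browser API: `SpeechQueueModel`, `finishCurrent`, `pending`, and `spoken` are hypothetical names invented for illustration, and the end of an utterance is simulated by hand rather than reported by a real synthesizer.

```javascript
// Hypothetical model of the proposed "enqueue" flag; not a real browser API.
class SpeechQueueModel {
  constructor() {
    this.current = null; // utterance being spoken, or null when idle
    this.pending = [];   // utterances waiting their turn
    this.spoken = [];    // log of the texts that have started speaking
  }

  // enqueue === true: wait behind everything already running or queued.
  // Otherwise: flush everything and speak the new utterance immediately.
  play(utterance) {
    if (!utterance.enqueue) {
      this.pending = [];
      this.current = null;
    }
    this.pending.push(utterance);
    this._startNextIfIdle();
  }

  // Simulates the synthesizer reporting that the current utterance ended.
  finishCurrent() {
    this.current = null;
    this._startNextIfIdle();
  }

  _startNextIfIdle() {
    if (this.current === null && this.pending.length > 0) {
      this.current = this.pending.shift();
      this.spoken.push(this.current.text);
    }
  }
}

const model = new SpeechQueueModel();
model.play({ text: 'First', lang: 'en', enqueue: false });
model.play({ text: 'Deuxième', lang: 'fr', enqueue: true });
model.finishCurrent(); // 'First' ends, so 'Deuxième' starts
```

In this model a later `play` with `enqueue` unset or false discards both the current and the pending utterances, which is the "flush" case.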

I believe that flag covers 90% of use cases, but if all else fails, you should definitely be able to just add a handler that gets called when one utterance finishes and implement your own queueing.
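
A minimal sketch of that fallback, assuming the object exposes the 'onend' handler mentioned earlier in the thread. `FakeTTS` is a hypothetical stand-in for the proposed TTS object (it only logs the text and reports completion straight away), and `playInSequence` is an illustrative helper, not part of any proposal.

```javascript
// Stand-in for the proposed TTS object (hypothetical; not a real browser API).
// play() records the text and immediately reports completion via onend.
class FakeTTS {
  constructor() {
    this.text = '';
    this.lang = 'en';
    this.onend = null;
  }

  play() {
    FakeTTS.log.push(`${this.lang}: ${this.text}`);
    if (typeof this.onend === 'function') this.onend();
  }
}
FakeTTS.log = [];

// Plays utterances one after another by chaining their onend handlers.
function playInSequence(utterances) {
  const queue = utterances.slice(); // copy so the caller's array is untouched

  function playNext() {
    const next = queue.shift();
    if (!next) return;     // queue drained
    next.onend = playNext; // start the following utterance when this one ends
    next.play();
  }

  playNext();
}

const first = new FakeTTS();
first.text = 'First';
first.lang = 'en';

const second = new FakeTTS();
second.text = 'Deuxième';
second.lang = 'fr';

playInSequence([first, second]);
```

Because each utterance is its own object, the voice, language, or other parameters can differ per item, which is the property an "append" method would make awkward.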

- Dominic

Received on Tuesday, 24 April 2012 22:28:19 UTC