Re: TTS append / queueing from Charles Pritchard on 2012-04-25 (public-speech-api@w3.org from April 2012)

From: Charles Pritchard <chuck@jumis.com>
Date: Wed, 25 Apr 2012 12:35:00 -0700
To: Dominic Mazzoni <dmazzoni@google.com>
CC: Jim Barnett <Jim.Barnett@genesyslab.com>, Hans Wennborg <hwennborg@google.com>, public-speech-api@w3.org
Message-ID: <4F9851E4.1080007@jumis.com>

On 4/25/2012 7:57 AM, Dominic Mazzoni wrote:
> On Wed, Apr 25, 2012 at 7:32 AM, Jim Barnett 
> <Jim.Barnett@genesyslab.com> wrote:
>
>     I could imagine a situation in which a page invoked multiple
>     distinct TTS engines (expertise in different languages being one
>     common use case), so I wouldn't want the TTS object to be unique,
>     but I think it would make sense to have a single TTS object for
>     each engine and then have a method like  'addUtterance' with the
>     kind of behavior that Dominic mentioned (queue vs abort, plus the
>     possibility for different voices/parameters for each utterance.)
>
>
> I agree about wanting to use multiple engines, but why not just make 
> that a parameter? Unless you wanted two engines talking *at the same 
> time*, I don't see any reason you need a separate instance per engine.
>
> I can see it working where there's a single global TTS object and 
> everything is done via method calls. That's what we did for the Chrome 
> TTS extension API. I can also see it working to create one object per 
> utterance, because a typed JavaScript object is a convenient container 
> for state. But somewhere in-between (multiple TTS objects per engine, 
> but not one object per utterance) seems overcomplicated.

I'd like to pursue this from a different perspective. Let's think about 
speakers (as agents) instead of "text-to-speech".

var a = new Speaker({id: 1, title: 'Alice'});
var b = new Speaker({id: 1, title: 'Bob'});
var c = new Speaker({id: 2, title: 'Chuck'});

Now we've got utterance groups, and titles.

In many cases, the speaker won't have a name or group, because it's just 
a simple notification service between the app and the user.
But, we've got the other side of things, where it could be a chat room 
or a complex service.

Consider the following pseudo-code:
a.speak("Hello everyone");
b.speak("Hi!");
c.speak("Hello");

a.speak("And");
a.speak("bonjour", {lang: 'fr'});
a.speak("to our audience");
a.onword = function(e) {
     if(e.data=='audience') b.speak("Perhaps you mean captives", {instant: true});
};
c.speak("I can not speak because you cut my mic.");
c.onword = function(e) {
     if(e.data=='because') c.clear();
};

Chuck can't interrupt anyone; Bob interrupts Alice while she's speaking 
the word "audience".
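
To make the interrupt and clear behavior concrete, here is a minimal sketch that simulates it. Everything in it is an assumption for illustration: Speaker is not an existing browser API, SpeechQueue is a hypothetical helper, "speaking" just appends words to a transcript, and treating {instant: true} as "cut in right now" is only one possible reading of the pseudo-code above.

```javascript
// Hypothetical sketch: neither Speaker nor SpeechQueue is a real browser API.
// "Speaking" is simulated by appending "<title>: <word>" to a transcript.
class SpeechQueue {
  constructor() {
    this.pending = [];     // utterances waiting their turn
    this.transcript = [];  // every word "spoken" so far, in order
    this.running = false;
  }

  // Speak one utterance word by word, firing the speaker's onword handler.
  sayNow(utterance) {
    utterance.speaker.current = utterance;
    for (const word of utterance.words) {
      if (utterance.cancelled) break;
      this.transcript.push(`${utterance.speaker.title}: ${word}`);
      const handler = utterance.speaker.onword;
      if (handler) handler({ data: word });
    }
  }

  add(utterance, instant) {
    if (instant && this.running) {
      this.sayNow(utterance);           // cut in while someone else is talking
    } else if (instant) {
      this.pending.unshift(utterance);  // nothing playing yet: jump the queue
    } else {
      this.pending.push(utterance);
    }
  }

  // Drain the queue in order and return the transcript.
  run() {
    this.running = true;
    while (this.pending.length > 0) this.sayNow(this.pending.shift());
    this.running = false;
    return this.transcript;
  }
}

class Speaker {
  constructor(queue, { id, title } = {}) {
    this.queue = queue;
    this.id = id;        // assumed to be the group id from the example above
    this.title = title;
    this.onword = null;  // callback lives on the object, not in an argument
    this.current = null; // utterance this speaker is in the middle of, if any
  }

  speak(text, { lang, instant = false } = {}) {
    const words = text.split(/\s+/);
    this.queue.add({ speaker: this, words, lang, cancelled: false }, instant);
  }

  // Stop the utterance in progress and drop this speaker's queued ones.
  clear() {
    if (this.current) this.current.cancelled = true;
    this.queue.pending = this.queue.pending.filter((u) => u.speaker !== this);
  }
}

const queue = new SpeechQueue();
const a = new Speaker(queue, { id: 1, title: 'Alice' });
const b = new Speaker(queue, { id: 1, title: 'Bob' });
const c = new Speaker(queue, { id: 2, title: 'Chuck' });

a.speak('to our audience');
a.onword = (e) => {
  if (e.data === 'audience') b.speak('Perhaps you mean captives', { instant: true });
};
c.speak('I can not speak because you cut my mic.');
c.onword = (e) => {
  if (e.data === 'because') c.clear();
};

const spoken = queue.run();
console.log(spoken.join('\n'));
```

In this simulation Bob's words land directly after Alice's "audience", and Chuck's utterance stops after "because". A real engine would be asynchronous, so the handlers could be attached after speak() just as in the pseudo-code.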

The "queue" concept is still a little flaky in this example.

Having multiple objects is closer to how other APIs have evolved.
For example, the BlobBuilder API has been replaced by a "Blob" 
constructor that takes an array literal.
That's where I base some of the concept.

Callbacks on the object are much preferred over callbacks passed via 
argument.

-Charles

Received on Wednesday, 25 April 2012 19:35:25 UTC