
Re: TTS proposal to split Utterance into its own interface

From: Dominic Mazzoni <dmazzoni@google.com>
Date: Thu, 13 Sep 2012 10:25:03 -0700
Message-ID: <CAFz-FYyy1BJPuOwdGeaRwQYdN9Zdq1k5_9ZozC3BqAwGB64ndA@mail.gmail.com>
To: Glen Shires <gshires@google.com>
Cc: Hans Wennborg <hwennborg@google.com>, olli@pettay.fi, public-speech-api@w3.org

Thanks for proposing definitions.

On Tue, Sep 11, 2012 at 3:02 AM, Glen Shires <gshires@google.com> wrote:
> I propose the following definitions for the SpeechSynthesis IDL:
>
> SpeechSynthesis Attributes
>
> pending attribute:
> This attribute is true if the queue contains any utterances which have not
> completed playback.

I was imagining: This attribute is true if the queue contains any
utterances which have not *started* speaking.

> speaking attribute:
> This attribute is true if playback is in progress.

I don't like the word "playback"; it doesn't fit when the speech is
generated dynamically. How about: This attribute is true if an
utterance is being spoken.

> paused attribute:
>   **** How is this different than (pending && !speaking) ? ****

This is true if the speech synthesis system is in a paused state,
independent of whether anything is speaking or queued.

paused && speaking -> it was paused in the middle of an utterance
paused && !speaking -> no utterance is speaking, and if you call
speak(), the utterance will only be enqueued, because the system is in
a paused state.
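To make that concrete, here's a rough mock of the state machine I have
in mind. The class and its internals are invented purely for
illustration (I'm using the proposed "continue" name for the resume
method); nothing here is meant as normative IDL:

```javascript
// Illustrative mock of the paused/speaking/pending semantics described
// above. Not a real implementation; names are for discussion only.
class MockSynthesis {
  constructor() {
    this.queue = [];      // utterances that have not started speaking
    this.current = null;  // utterance currently being spoken, if any
    this.paused = false;  // global paused state, independent of the queue
  }
  get pending()  { return this.queue.length > 0; }
  get speaking() { return this.current !== null; }
  speak(utterance) {
    this.queue.push(utterance);
    // Only start speaking if nothing is being spoken AND not paused.
    if (!this.speaking && !this.paused) this._startNext();
  }
  pause() { this.paused = true; }
  continue() {
    this.paused = false;
    // If paused mid-utterance, the current utterance just continues;
    // otherwise start the next queued one.
    if (!this.speaking) this._startNext();
  }
  _startNext() {
    this.current = this.queue.shift() ?? null;
  }
}
```

With this model, calling pause() then speak() leaves the system in the
paused && !speaking && pending state, and calling pause() while an
utterance is in progress gives paused && speaking.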

>
> SpeechSynthesis Methods
>
> The speak method
> This method appends the utterance to the end of a playback queue. If
> playback is not in progress, it also begins playback of the next item in the
> queue.

What do you think about rewriting to not use "playback"?

Also, my idea was that it would not begin speaking if the system is in
a paused state.

> The cancel method
> This method removes the first matching utterance (if any) from the playback
> queue. If playback is in progress and the utterance removed is being played,
> playback ceases for the utterance and the next utterance in the queue (if
> any) begins playing.

Do we need to say "first matching"? Each utterance should be a
specific object; it should either be in the queue or not.
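In other words, cancel could remove by object identity rather than by
matching. A sketch, with a standalone function and queue invented just
for illustration:

```javascript
// Sketch: cancel by object identity, not by "first matching" text.
function cancel(queue, utterance) {
  const i = queue.indexOf(utterance); // compares object identity
  if (i !== -1) queue.splice(i, 1);
}

const a = { text: "hello" };
const b = { text: "hello" }; // same text, different object
const queue = [a, b];
cancel(queue, a);
// Only a is removed, even though a and b have identical text.
```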

> The pause method
> This method pauses the playback mid-utterance. If playback is not in
> progress, it does nothing.

I was assuming that calling it would set the system into a paused
state, so that even a subsequent call to speak() would not do anything
other than enqueue.

> The continue method
> This method continues the playback at the point in the utterance and queue
> in which it was paused.  If playback is in progress, it does nothing.
>
> The stop method.
> This method stops playback mid-utterance and flushes the queue.
>
>
> SpeechSynthesisUtterance attributes
>
> text attribute:
> The text to be synthesized for this utterance. This attribute must not be
> changed after onstart fires.

I'd say: changes to this attribute after the utterance has been added
to the queue (by calling "speak") will be ignored. OR, we could throw
a DOMException when the attribute is modified while the utterance is
in the speech queue.
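The "ignored" option could amount to snapshotting the text at enqueue
time. A sketch, with invented names:

```javascript
// Sketch of the "changes are ignored" option: the queue stores a copy
// of the text taken when speak() is called. Names are illustrative.
function enqueue(queue, utterance) {
  queue.push({ text: String(utterance.text) }); // snapshot at enqueue time
}

const queue = [];
const u = { text: "hello" };
enqueue(queue, u);
u.text = "goodbye"; // ignored: the engine will speak the snapshot
```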

> paused attribute:
> This attribute is true if this specific utterance is in the queue and has
> not completed playback.

I think this should only be true if it has begun speaking but not completed.

- Dominic
Received on Thursday, 13 September 2012 17:25:31 UTC
