RE: TTS proposal to split Utterance into its own interface from Jim Barnett on 2012-09-14 (public-speech-api@w3.org from September 2012)

From: Jim Barnett <Jim.Barnett@genesyslab.com>
Date: Fri, 14 Sep 2012 13:15:33 -0700
To: "Glen Shires" <gshires@google.com>
Cc: "Nagesh Kharidi" <nagesh@openstream.com>, "Dominic Mazzoni" <dmazzoni@google.com>, "Hans Wennborg" <hwennborg@google.com>, <olli@pettay.fi>, <public-speech-api@w3.org>
Message-ID: <E17CAD772E76C742B645BD4DC602CD8106B5F695@NAHALD.us.int.genesyslab.com>
I agree that placing an utterance at the beginning of the queue seems
odd.  I think it would be cleaner to simply call stop() and then requeue
in the desired order.  

 

If speak() makes a copy of the object, then cancel(utterance) becomes
questionable, since it's a copy of 'utterance' that is queued. In
principle, code has no way of touching the utterance in the queue, since
it's a separate object.
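
To illustrate the concern, here is a minimal TypeScript sketch. It is not
the proposed API: Utterance, CopyingQueue, and their members are
hypothetical names, and cancel() is assumed to match by object identity.
Under those assumptions, the caller's object is never the one in the
queue, so cancel(utterance) has nothing to match.

```typescript
// Hypothetical stand-in for SpeechSynthesisUtterance; only the text matters here.
interface Utterance {
  text: string;
}

// Hypothetical queue that copies on speak(); not the proposed API.
class CopyingQueue {
  private queue: Utterance[] = [];

  speak(utterance: Utterance): void {
    // A copy is queued; the caller keeps the original object.
    this.queue.push({ text: utterance.text });
  }

  // Identity-based cancel: removes the entry that is the very same object.
  // Returns true if something was removed.
  cancel(utterance: Utterance): boolean {
    const index = this.queue.indexOf(utterance);
    if (index === -1) {
      return false;
    }
    this.queue.splice(index, 1);
    return true;
  }

  get length(): number {
    return this.queue.length;
  }
}

const copyingQueue = new CopyingQueue();
const greeting: Utterance = { text: "Hello" };
copyingQueue.speak(greeting);

// The queued entry is a different object, so cancel() finds nothing to remove.
const removed = copyingQueue.cancel(greeting);
console.log(removed, copyingQueue.length); // prints "false 1"
```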

 

-          Jim

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Friday, September 14, 2012 4:00 PM
To: Jim Barnett
Cc: Nagesh Kharidi; Dominic Mazzoni; Hans Wennborg; olli@pettay.fi;
public-speech-api@w3.org
Subject: Re: TTS proposal to split Utterance into its own interface

 

> Provide the ability to cancel all currently queued utterances.

 

The stop() method cancels all queued utterances. (Dominic proposed that
this method be named stopAndFlushQueue(). Would that name be clearer?)

 

Also, what is the use case for the current cancel(utterance) method?  In
all the use cases I envision, you'd want to cancel all queued
utterances. Can we eliminate cancel()?

 

 

> New speakNext SpeechSynthesis method - append the utterance to the
> beginning of the queue

 

I'd like more discussion on this. What are the use cases? What are the
edge cases? (For example, if there's a race condition, the current
utterance may finish and the second in the queue may begin speaking
before this new utterance is inserted.)

 

 

>  Question:  Can a cancelled utterance be re-queued?

 

Good question. Also, what is the lifetime of a
SpeechSynthesisUtterance object, and who owns it? There are at least
three possibilities:

 

1. The speak() method takes ownership when it adds it to the queue; the
object would then presumably be destroyed upon cancel or onend.

    (This raises the questions: what use is the
SpeechSynthesisUtterance object attribute "ended", since the object will
be destroyed when it turns true? It also makes it messy to use the other
readonly attributes because the object may be deleted suddenly. Also,
what if the author deletes the SpeechSynthesisUtterance object prior to
it being spoken?  One easy way to accidentally create this bug is to
define the SpeechSynthesisUtterance object in a method that goes out of
scope.)

 

2. The speak() method does not take ownership when it adds it directly
to the queue.

    (This raises the question: what if the author deletes the
SpeechSynthesisUtterance object prior to it being spoken?  One easy way
to accidentally create this bug is to define the
SpeechSynthesisUtterance object in a method that goes out of scope.)

 

3. The speak() method does not take ownership; it makes a copy of it
when it adds it to the queue.

    (This raises the question: how can the author's original
SpeechSynthesisUtterance object readonly attributes (speaking, paused,
ended) reflect the state of the copy on the queue?)

 

 

To resolve these issues, I propose the following, because I think it's
the cleanest solution and easiest for authors, since they can create and
destroy objects, and go out of scope, without worrying about the
speaking queue timing:

 

The speak() method does not take ownership of the
SpeechSynthesisUtterance object; it makes a copy of it when it adds it
to the queue.  We eliminate the SpeechSynthesisUtterance readonly
attributes, relying instead on events that indicate a change in state,
including new events: onpause and onresume.

 

Because it's a copy of the object, this clarifies that:

- changes to the original SpeechSynthesisUtterance object after calling
speak() do not affect the copy on the queue.

- the same SpeechSynthesisUtterance object can be used to call speak()
multiple times (even after an earlier copy of it was spoken or
cancelled).

 

The new IDL would be:

 

    interface SpeechSynthesisUtterance {
      attribute DOMString text;
      attribute DOMString lang;
      attribute DOMString serviceURI;

      attribute Function onstart;
      attribute Function onend;
      attribute Function onpause;
      attribute Function onresume;
    }
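
The copy-on-speak behavior can be sketched in TypeScript as follows. This
is only an illustration under stated assumptions: SketchUtterance,
SketchSynthesis, clone(), drain(), and spoken are hypothetical names, the
"speaking" is simulated synchronously, and the sketch assumes that
handler references are copied along with the attribute values so the
author's callbacks still run. The proposal itself does not spell that
detail out.

```typescript
// Hypothetical mirror of the proposed IDL; Function attributes become optional callbacks.
class SketchUtterance {
  text = "";
  lang = "";
  serviceURI = "";
  onstart?: () => void;
  onend?: () => void;
  onpause?: () => void;
  onresume?: () => void;

  // Copies attribute values and handler references into a new object.
  clone(): SketchUtterance {
    const copy = new SketchUtterance();
    copy.text = this.text;
    copy.lang = this.lang;
    copy.serviceURI = this.serviceURI;
    copy.onstart = this.onstart;
    copy.onend = this.onend;
    copy.onpause = this.onpause;
    copy.onresume = this.onresume;
    return copy;
  }
}

class SketchSynthesis {
  private queue: SketchUtterance[] = [];
  // Records what was "spoken", in order, so the effect is observable.
  readonly spoken: string[] = [];

  // Appends a copy, so later changes to the original do not reach the queue.
  speak(utterance: SketchUtterance): void {
    this.queue.push(utterance.clone());
  }

  // Hypothetical helper: synchronously "speaks" everything that is queued.
  drain(): void {
    let next = this.queue.shift();
    while (next !== undefined) {
      if (next.onstart) next.onstart();
      this.spoken.push(next.text);
      if (next.onend) next.onend();
      next = this.queue.shift();
    }
  }
}

const sketchSynthesis = new SketchSynthesis();
const reusable = new SketchUtterance();
const endEvents: string[] = [];
reusable.onend = () => { endEvents.push("end"); };

reusable.text = "first";
sketchSynthesis.speak(reusable);
reusable.text = "second";        // does not affect the copy already queued
sketchSynthesis.speak(reusable); // the same object is reused for a second call
sketchSynthesis.drain();

console.log(sketchSynthesis.spoken.join(","), endEvents.length); // prints "first,second 2"
```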

 

 

And the new definition:

 

The speak method

This method appends a copy of the utterance to the end of the queue for
this SpeechSynthesis object. It does not change the paused state of the
SpeechSynthesis object.  If the SpeechSynthesis object is paused, it
remains paused. If it is not paused, then this utterance is spoken if no
other utterances are in the queue, else this utterance is queued to
begin speaking after the other utterances in the queue have been spoken.
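
A rough TypeScript model of this definition, with plain strings standing
in for utterance copies. PausableQueue, resume(), finishCurrent(), and
speakingText are hypothetical names introduced for the sketch, and the
currently speaking utterance is tracked separately from the waiting ones,
which is a modeling choice rather than something the definition requires.

```typescript
// Hypothetical model of the speak() definition; strings stand in for utterance copies.
class PausableQueue {
  private waiting: string[] = [];
  private current: string | null = null;
  private pausedState = false;

  // Appends to the end of the queue; never changes the paused state.
  speak(text: string): void {
    this.waiting.push(text);
    this.startNextIfIdle();
  }

  pause(): void {
    this.pausedState = true;
  }

  resume(): void {
    this.pausedState = false;
    this.startNextIfIdle();
  }

  // Hypothetical hook a synthesizer would call when the current utterance ends.
  finishCurrent(): void {
    this.current = null;
    this.startNextIfIdle();
  }

  get speakingText(): string | null {
    return this.current;
  }

  get paused(): boolean {
    return this.pausedState;
  }

  // Starts the next waiting utterance only when not paused and nothing is speaking.
  private startNextIfIdle(): void {
    if (this.pausedState || this.current !== null) {
      return;
    }
    const next = this.waiting.shift();
    if (next !== undefined) {
      this.current = next;
    }
  }
}

const pausable = new PausableQueue();
pausable.pause();
pausable.speak("one");                            // paused: queued only
const startedWhilePaused = pausable.speakingText; // still nothing speaking
const pausedAfterSpeak = pausable.paused;         // speak() left the state alone
pausable.resume();                                // "one" starts
pausable.speak("two");                            // waits behind "one"
const afterSecondSpeak = pausable.speakingText;
pausable.finishCurrent();                         // "two" starts

console.log(startedWhilePaused, afterSecondSpeak, pausable.speakingText);
// prints "null one two"
```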

 

 

/Glen Shires

 

 

On Fri, Sep 14, 2012 at 6:05 AM, Jim Barnett
<Jim.Barnett@genesyslab.com> wrote:

I would  think that cancelling all utterances would be the more common
use case (so we ought to make it easy).  Question:  Can a cancelled
utterance be re-queued?

- Jim


-----Original Message-----
From: Nagesh Kharidi [mailto:nagesh@openstream.com]
Sent: Friday, September 14, 2012 8:58 AM
To: Glen Shires; Dominic Mazzoni
Cc: Hans Wennborg; olli@pettay.fi; public-speech-api@w3.org
Subject: Re: TTS proposal to split Utterance into its own interface

I would like to propose the following:
1. Provide the ability to cancel all currently queued utterances. A new
cancelAll method could be added. Alternately, invoking the cancel method
without the utterance parameter could imply cancel all utterances.

2. New speakNext SpeechSynthesis method
This method will append the utterance to the beginning of the queue.

3. New oncancel SpeechSynthesisUtterance event
Fired when the utterance is canceled.

4. New canceled SpeechSynthesisUtterance attribute
True if the utterance is canceled.
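
The four items above could look roughly like this in TypeScript. This is
a sketch under assumptions, not a specification: CancelableUtterance,
ProposalQueue, and texts() are hypothetical names, cancel() with no
argument is taken to mean "cancel all", and the sketch does not model an
utterance that is already being spoken when speakNext is called.

```typescript
// Hypothetical utterance carrying the proposed "canceled" attribute and "oncancel" event.
class CancelableUtterance {
  text: string;
  canceled = false;
  oncancel?: () => void;

  constructor(text: string) {
    this.text = text;
  }
}

class ProposalQueue {
  private queue: CancelableUtterance[] = [];

  speak(utterance: CancelableUtterance): void {
    this.queue.push(utterance);
  }

  // Item 2: place the utterance at the beginning of the queue.
  speakNext(utterance: CancelableUtterance): void {
    this.queue.unshift(utterance);
  }

  // Item 1: with no argument, every queued utterance is canceled.
  cancel(utterance?: CancelableUtterance): void {
    const targets = utterance === undefined
      ? this.queue.slice()
      : this.queue.filter((queued) => queued === utterance);
    this.queue = this.queue.filter((queued) => targets.indexOf(queued) === -1);
    for (const target of targets) {
      target.canceled = true;                  // item 4
      if (target.oncancel) target.oncancel();  // item 3
    }
  }

  // Hypothetical helper exposing the queue order for inspection.
  texts(): string[] {
    return this.queue.map((queued) => queued.text);
  }
}

const proposalQueue = new ProposalQueue();
const routine = new CancelableUtterance("routine");
const later = new CancelableUtterance("later");
const urgent = new CancelableUtterance("urgent");
let cancelEventCount = 0;
later.oncancel = () => { cancelEventCount += 1; };

proposalQueue.speak(routine);
proposalQueue.speak(later);
proposalQueue.speakNext(urgent);
const orderAfterSpeakNext = proposalQueue.texts().join(",");

proposalQueue.cancel(later);
const orderAfterCancelOne = proposalQueue.texts().join(",");

proposalQueue.cancel(); // cancel all
console.log(orderAfterSpeakNext, orderAfterCancelOne, proposalQueue.texts().length);
// prints "urgent,routine,later urgent,routine 0"
```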


I also had a question regarding the stop method: Is "flushes the queue"
equivalent to calling cancel on all utterances in the queue? If so, I
would like to suggest changing "flushes the queue" to "cancels all
utterances in the queue".

Regards,
Nagesh

On Thu, 13 Sep 2012 14:13:56 -0700
 Glen Shires <gshires@google.com> wrote:
>Yes, I like the way you've defined the "speak" method to not change the
>play/pause state. Also, I didn't particularly like the word "playback",
>so thanks for the alternative "spoken".  Here are updated definitions
>with your suggestions incorporated. If there's no disagreement, I'll
>add them to the spec on Monday.
>
>
>SpeechSynthesis Attributes
>
>pending attribute:
>This attribute is true if the queue for this SpeechSynthesis object
>contains any utterances which have not started speaking.
>
>speaking attribute:
>This attribute is true if an utterance is being spoken, that is, if
>an utterance has begun being spoken and has not completed being spoken.
>It is independent of whether this SpeechSynthesis object is in the
>paused state.
>
>paused attribute:
>This attribute is true when this SpeechSynthesis object is in the paused
>state. This state is independent of whether anything is in the queue.
>The default state of a new SpeechSynthesis object is the non-paused
>state.
>
>
>SpeechSynthesis Methods
>
>The speak method
>This method appends the utterance to the end of the queue for this
>SpeechSynthesis object. It does not change the paused state of the
>SpeechSynthesis object.  If the SpeechSynthesis object is paused, it
>remains paused. If it is not paused, then this utterance is spoken if
>no other utterances are in the queue, else this utterance is queued to
>begin speaking after the other utterances in the queue have been
>spoken.
>
>The cancel method
>This method removes the specified utterance from the queue. If it is
>not in the queue, no changes are made. If the utterance removed is
>being spoken, speaking ceases for that utterance and the next utterance
>in the queue (if any) begins to be spoken. This method does not change
>the paused state of the SpeechSynthesis object.
>
>The pause method
>This method puts the SpeechSynthesis object into the paused state. If
>an utterance was being spoken, it pauses mid-utterance. (If called when
>the SpeechSynthesis object was already in the paused state, it does
>nothing.)
>
>The continue method
>This method puts the SpeechSynthesis object into the non-paused state.
>If an utterance was speaking (that is, its speaking attribute is true),
>it continues speaking the utterance at the point at which it was paused,
>else it begins speaking the next utterance in the queue (if any). (If
>called when the SpeechSynthesis object was already in the non-paused
>state, it does nothing.)
>
>The stop method
>This method puts the SpeechSynthesis object into the paused state and
>flushes the queue. It sets the speaking attribute to false and the
>paused attribute to true.
>
>
>SpeechSynthesisUtterance attributes
>
>
>[[Note, I used SHOULD here because there may be some race-condition
>edge-cases where it might not be ignored.]]
>
>text attribute:
>The text to be synthesized for this utterance. Changes to this
>attribute after the utterance has been added to the queue (by calling
>the speak method) SHOULD be ignored.
>
>lang attribute:
>[no change except to append the following] Changes to this attribute
>after the utterance has been added to the queue (by calling the speak
>method) SHOULD be ignored.
>
>serviceURI attribute:
>[no change except to append the following] Changes to this attribute
>after the utterance has been added to the queue (by calling the speak
>method) SHOULD be ignored.
>
>speaking attribute:
>This attribute is true if this specific utterance is currently being
>spoken. Specifically if this utterance has begun being spoken and has
>not completed being spoken. This is independent of whether the
>SpeechSynthesis object is in a paused state.
>
>paused attribute:
>This attribute is true if this specific utterance has begun to be
>spoken, but has not completed and the SpeechSynthesis object is in the
>paused state.
>
>ended attribute:
>This attribute is true if this specific utterance has completed being
>spoken.
>
>SpeechSynthesisUtterance events
>
>onstart event:
>Fired when this utterance has begun to be spoken.
>
>onend event:
>Fired when this utterance has completed being spoken.
>
>
>
>On Thu, Sep 13, 2012 at 10:25 AM, Dominic Mazzoni
><dmazzoni@google.com> wrote:
>
>> Thanks for proposing definitions.
>>
>> On Tue, Sep 11, 2012 at 3:02 AM, Glen Shires <gshires@google.com> wrote:
>> > I propose the following definitions for the SpeechSynthesis IDL:
>> >
>> > SpeechSynthesis Attributes
>> >
>> > pending attribute:
>> > This attribute is true if the queue contains any utterances which
>> > have not completed playback.
>>
>> I was imagining: This attribute is true if the queue contains any
>> utterances which have not *started* speaking.
>>
>> > speaking attribute:
>> > This attribute is true if playback is in progress.
>>
>> I don't like the word "playback", it doesn't fit when the speech is
>> generated dynamically. How about: This attribute is true if an
>> utterance is being spoken.
>>
>> > paused attribute:
>> >   **** How is this different than (pending && !speaking) ? ****
>>
>> This is true if the speech synthesis system is in a paused state,
>> independent of whether anything is speaking or queued.
>>
>> paused && speaking -> it was paused in the middle of an utterance
>> paused && !speaking -> no utterance is speaking, but if you call
>> speak(), nothing will happen because it's in a paused state.
>>
>> >
>> > SpeechSynthesis Methods
>> >
>> > The speak method
>> > This method appends the utterance to the end of a playback queue.
>> > If playback is not in progress, it also begins playback of the next
>> > item in the queue.
>>
>> What do you think about rewriting to not use "playback"?
>>
>> Also, my idea was that it would not begin playback if the system is
>> in a paused state.
>>
>> > The cancel method
>> > This method removes the first matching utterance (if any) from the
>> > playback queue. If playback is in progress and the utterance removed
>> > is being played, playback ceases for the utterance and the next
>> > utterance in the queue (if any) begins playing.
>>
>> Do we need to say "first matching"? Each utterance should be a
>> specific object, it should be either in the queue or not.
>>
>> > The pause method
>> > This method pauses the playback mid-utterance. If playback is not
>> > in progress, it does nothing.
>>
>> I was assuming that calling it would set the system into a paused
>> state, so that even a subsequent call to speak() would not do
>> anything other than enqueue.
>>
>> > The continue method
>> > This method continues the playback at the point in the utterance
>> > and queue in which it was paused.  If playback is in progress, it
>> > does nothing.
>> >
>> > The stop method.
>> > This method stops playback mid-utterance and flushes the queue.
>> >
>> >
>> > SpeechSynthesisUtterance attributes
>> >
>> > text attribute:
>> > The text to be synthesized for this utterance. This attribute must
>> > not be changed after onstart fires.
>>
>> I'd say: changes to this attribute after the utterance has been
>> added to the queue (by calling "speak") will be ignored. OR, we
>> should make it a DOM exception to modify it when it's in the speech
>> queue.
>>
>> > paused attribute:
>> > This attribute is true if this specific utterance is in the queue
>> > and has not completed playback.
>>
>> I think this should only be true if it has begun speaking but not
>> completed.
>>
>> - Dominic
>>
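
The pending, speaking, and paused attributes quoted above, together with
the pause, continue, and stop methods, can be read as a small state
machine. The TypeScript below is one possible reading, offered as a
sketch: SynthesisState, continueSpeaking(), and describeState() are
hypothetical names, strings stand in for utterances, and cancel() is left
out to keep the example short.

```typescript
// Hypothetical model of the quoted SpeechSynthesis attributes and methods.
class SynthesisState {
  private waiting: string[] = [];        // utterances that have not started speaking
  private current: string | null = null; // utterance that has begun but not completed
  private pausedState = false;           // a new object starts in the non-paused state

  get pending(): boolean {
    return this.waiting.length > 0;
  }

  get speaking(): boolean {
    return this.current !== null;
  }

  get paused(): boolean {
    return this.pausedState;
  }

  speak(text: string): void {
    this.waiting.push(text);
    this.advance();
  }

  // Enters the paused state; an utterance in progress stays "speaking".
  pause(): void {
    this.pausedState = true;
  }

  // Stands in for the quoted "continue" method.
  continueSpeaking(): void {
    this.pausedState = false;
    this.advance();
  }

  // Flushes the queue, sets speaking to false and paused to true.
  stop(): void {
    this.waiting = [];
    this.current = null;
    this.pausedState = true;
  }

  private advance(): void {
    if (this.pausedState || this.current !== null) {
      return;
    }
    const next = this.waiting.shift();
    if (next !== undefined) {
      this.current = next;
    }
  }
}

function describeState(state: SynthesisState): string {
  return `pending=${state.pending} speaking=${state.speaking} paused=${state.paused}`;
}

const synthesisState = new SynthesisState();
synthesisState.speak("first");  // starts at once
synthesisState.speak("second"); // waits behind "first"
synthesisState.pause();         // paused && speaking: paused mid-utterance
const midUtterance = describeState(synthesisState);

synthesisState.stop();
const afterStop = describeState(synthesisState);

synthesisState.speak("third");  // paused && !speaking: speak() only enqueues
const queuedWhilePaused = describeState(synthesisState);

synthesisState.continueSpeaking();
const afterContinue = describeState(synthesisState);

console.log(midUtterance);      // prints "pending=true speaking=true paused=true"
console.log(afterStop);         // prints "pending=false speaking=false paused=true"
console.log(queuedWhilePaused); // prints "pending=true speaking=false paused=true"
console.log(afterContinue);     // prints "pending=false speaking=true paused=false"
```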

Received on Friday, 14 September 2012 20:15:09 UTC