Re: TTS proposal to split Utterance into its own interface from Nagesh Kharidi on 2012-09-18 (public-speech-api@w3.org from September 2012)

From: Nagesh Kharidi <nagesh@openstream.com>
Date: Tue, 18 Sep 2012 14:26:19 -0400
To: Glen Shires <gshires@google.com>
Cc: Jim Barnett <Jim.Barnett@genesyslab.com>, Dominic Mazzoni <dmazzoni@google.com>, Hans Wennborg <hwennborg@google.com>, olli@pettay.fi, public-speech-api@w3.org
Message-Id: <3F7F63F5-786D-4B6A-ACFC-F1B96B68828B@openstream.com>

Glen,

Looks good. I propose that we enhance SpeechSynthesisUtterance by adding a continue event (fired when a paused utterance is resumed) and a corresponding oncontinue event handler.
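
As a rough sketch of the proposed behavior (a hypothetical in-memory model, not the browser API: ModelSynthesis, ModelUtterance and the synchronous handler calls are assumptions made only for illustration), the continue event would fire only when an utterance that was paused mid-speech is resumed:

```javascript
// Hypothetical model of the proposal; names are illustrative, not spec text.
class ModelUtterance {
  constructor(text) {
    this.text = text;
    this.onstart = null;
    this.onpause = null;
    this.oncontinue = null; // the proposed handler
  }
}

class ModelSynthesis {
  constructor() {
    this.queue = [];     // utterances that have not started
    this.current = null; // utterance that has started but not finished
    this.paused = false;
  }
  speak(utterance) {
    this.queue.push(utterance);
    this._startNext();
  }
  pause() {
    if (this.paused) return;
    this.paused = true;
    if (this.current && this.current.onpause) this.current.onpause();
  }
  continue() {
    if (!this.paused) return; // already non-paused: do nothing
    this.paused = false;
    if (this.current) {
      // A paused utterance is resumed: fire the proposed continue event.
      if (this.current.oncontinue) this.current.oncontinue();
    } else {
      this._startNext();
    }
  }
  _startNext() {
    if (this.paused || this.current || this.queue.length === 0) return;
    this.current = this.queue.shift();
    if (this.current.onstart) this.current.onstart();
  }
}

const log = [];
const synth = new ModelSynthesis();
const utterance = new ModelUtterance("Hello");
utterance.onstart = () => log.push("start");
utterance.onpause = () => log.push("pause");
utterance.oncontinue = () => log.push("continue");

synth.speak(utterance);
synth.pause();
synth.continue();
synth.continue(); // no second event, the object is no longer paused
console.log(log.join(","));
```

In this model a second continue() call is a no-op, matching the "does nothing" wording of the continue method definition quoted below.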

Regards,
Nagesh

On Sep 18, 2012, at 1:25 AM, Glen Shires wrote:

> I've updated the spec with the above SpeechSynthesis and SpeechSynthesisUtterance IDL and definitions:
> https://dvcs.w3.org/hg/speech-api/rev/b036c78e9445
> 
> As always, the current draft spec is at:
> http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
> 
> On Sat, Sep 15, 2012 at 1:31 PM, Glen Shires <gshires@google.com> wrote:
> Nagesh,
> I agree that cancelAll() is useful and can make code simpler because it doesn't affect the paused state.  In fact, I propose that we add cancelAll() and remove stop() -- because the stop function is probably less common and can easily be accomplished with two calls: cancelAll() and pause().
> 
> Also, since canceling a specific utterance is not very useful, and questionable as Jerry states, I propose eliminating cancel(utterance). If we do that, then we could rename cancelAll() more simply as cancel().
> 
> Thus, I propose this IDL:
> 
>     interface SpeechSynthesis {
>       static readonly attribute boolean pending;
>       static readonly attribute boolean speaking;
>       static readonly attribute boolean paused;
> 
>       static void speak(SpeechSynthesisUtterance utterance);
>       static void cancel();
>       static void pause();
>       static void continue();
>     }
> 
> and I propose this new definition of cancel:
> 
> The cancel method
> This method removes all utterances from the queue. If an utterance is being spoken, speaking ceases immediately. This method does not change the paused state of the SpeechSynthesis object.
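
To make the definition above concrete, here is a minimal sketch under stated assumptions: `synth` is a hypothetical plain-object model (not the real SpeechSynthesis object), utterances are plain strings, and "being spoken" is modelled as a `current` slot. It illustrates that cancel() empties the queue and stops speech without touching the paused flag, and that the removed stop() can be reproduced with cancel() followed by pause().

```javascript
// Hypothetical model; field and method names are assumptions for illustration.
const synth = {
  queue: [],      // utterances that have not started speaking
  current: null,  // utterance being spoken, if any
  paused: false,

  get pending() { return this.queue.length > 0; },
  get speaking() { return this.current !== null; },

  speak(text) {
    this.queue.push(text);
    // Start speaking only if not paused and nothing is being spoken.
    if (!this.paused && this.current === null) {
      this.current = this.queue.shift();
    }
  },
  pause() { this.paused = true; },
  cancel() {
    // Remove everything and stop the current utterance; leave `paused` alone.
    this.queue.length = 0;
    this.current = null;
  },
};

synth.speak("one");
synth.speak("two");
synth.cancel();
const afterCancel = {
  pending: synth.pending, speaking: synth.speaking, paused: synth.paused,
};

// The old stop() behavior, expressed as the two calls described above.
synth.speak("three");
synth.cancel();
synth.pause();
synth.speak("four"); // only queued, because the object is now paused
const afterStop = {
  pending: synth.pending, speaking: synth.speaking, paused: synth.paused,
};

console.log(afterCancel, afterStop);
```

After the first cancel() all three flags are false; after cancel() plus pause() the object is paused, and the later speak() call only enqueues.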
> 
> /Glen Shires
> 
> 
> On Sat, Sep 15, 2012 at 3:25 AM, Nagesh Kharidi <nagesh@openstream.com> wrote:
> Please see inline.
> 
> Regards,
> Nagesh
> 
> On Fri, 14 Sep 2012 12:59:41 -0700
>  Glen Shires <gshires@google.com> wrote:
> >> Provide the ability to cancel all currently queued utterances.
> >
> >The stop() method cancels all queued utterances. (Dominic proposed that
> >this method be named stopAndFlushQueue(); would that name be clearer?)
> 
> In addition to canceling all queued utterances, the stop() method also
> pauses the SpeechSynthesis object. A separate cancelAll() method would
> be useful; without it, speaking a new utterance immediately would
> require:
> speechSynthesis.stop();
> speechSynthesis.continue();
> speechSynthesis.speak(utterance);
> 
> With a cancelAll() method, this would be:
> speechSynthesis.cancelAll();
> speechSynthesis.speak(utterance);
> 
> Since this would be such a common usage, we could make it even easier
> for developers by either:
> - providing a speakImmediate(utterance) method that cancels all queued
> utterances and then starts speaking the new utterance
> or
> - adding a second parameter as follows to the speak() method:
> speechSynthesis.speak(utterance, speakImmediately);
> If speakImmediately is true, all currently queued utterances will be
> canceled and the new utterance will be spoken.
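
A minimal sketch of the second option, assuming a hypothetical Synth class with string utterances (neither the class nor the default value of the flag comes from the draft spec):

```javascript
// Hypothetical model of speak(utterance, speakImmediately).
class Synth {
  constructor() {
    this.queue = [];     // utterances waiting to be spoken
    this.current = null; // utterance being spoken, if any
  }
  cancelAll() {
    this.queue = [];
    this.current = null;
  }
  speak(text, speakImmediately = false) {
    // With the flag set, drop everything queued or speaking first.
    if (speakImmediately) this.cancelAll();
    this.queue.push(text);
    if (this.current === null) this.current = this.queue.shift();
  }
}

const s = new Synth();
s.speak("first");
s.speak("second");
s.speak("urgent", true);
console.log(s.current, s.queue.length);
```

Here the third call replaces both the utterance being spoken and the queued one, which is the same result as calling cancelAll() and then speak().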
> 
> >
> >Also, what is the use case for the current cancel(utterance) method? In
> >all the use cases I envision, you'd want to cancel all queued utterances.
> >Can we eliminate cancel()?
> 
> I also agree that canceling a specific utterance is not very useful.
> Canceling all queued utterances would be more common than canceling a
> specific utterance.
> 
> >
> >
> >> New speakNext SpeechSynthesis method - insert the utterance at the
> >> beginning of the queue
> >
> >I'd like more discussion on this. What are the use cases? What are the
> >edge cases (e.g. if there's a race condition, the current utterance may
> >finish and the second in the queue may begin speaking before this new
> >utterance is inserted)?
> 
> Use case for speakNext() method: Consider a news application that plays
> the latest news items. It queues all news items to be played. Now if
> there is a new "breaking news" item that comes in, the speakNext()
> method can be used to play it as soon as possible without canceling the
> already queued items.
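
The use case above can be sketched as follows. This is only an illustration: `current`, `pending`, and the two helper functions are hypothetical stand-ins for the SpeechSynthesis queue, and the item texts are made up.

```javascript
// Hypothetical model: `current` is the item being spoken, `pending` the rest.
const pending = ["Item A", "Item B", "Item C"];
const current = pending.shift(); // "Item A" has started speaking

// speak(): append to the end of the queue.
function speak(text) {
  pending.push(text);
}

// speakNext(): insert at the front of the queue without interrupting
// the item that is already being spoken.
function speakNext(text) {
  pending.unshift(text);
}

speak("Item D");
speakNext("Breaking news");

const playOrder = [current, ...pending];
console.log(playOrder.join(" | "));
```

The breaking item plays right after the item in progress, and nothing already queued is canceled.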
> 
> 
> >
> >
> >>  Question:  Can a cancelled utterance be re-queued?
> >
> >Good question, and also, what is the lifetime of a SpeechSynthesisUtterance
> >object and who owns it? There are at least three possibilities:
> >
> >1. The speak() method takes ownership when it adds it to the queue, then it
> >would presumably be destroyed upon cancel or onend.
> >    (This raises the questions: how useful is the SpeechSynthesisUtterance
> >object attribute "ended", since the object will be destroyed when it turns
> >true? It also makes it messy to use the other readonly attributes because
> >the object may be deleted suddenly. Also, what if the author deletes the
> >SpeechSynthesisUtterance object prior to it being spoken? One easy way to
> >accidentally create this bug is to define the SpeechSynthesisUtterance
> >object in a method that goes out of scope.)
> >
> >2. The speak() method does not take ownership when it adds it directly to
> >the queue.
> >    (This raises the question: what if the author deletes the
> >SpeechSynthesisUtterance object prior to it being spoken? One easy way to
> >accidentally create this bug is to define the SpeechSynthesisUtterance
> >object in a method that goes out of scope.)
> >
> >3. The speak() method does not take ownership; it makes a copy of it when
> >it adds it to the queue.
> >    (This raises the question: how can the author's original
> >SpeechSynthesisUtterance object readonly attributes (speaking, paused,
> >ended) reflect the state of the copy on the queue?)
> >
> >
> >To resolve these issues, I propose the following, because I think it's the
> >cleanest solution and easiest for authors, since they can create and
> >destroy objects, and go out of scope, without worrying about the speaking
> >queue timing:
> >
> >The speak() method does not take ownership of the SpeechSynthesisUtterance
> >object; it makes a copy of it when it adds it to the queue.  We eliminate
> >the SpeechSynthesisUtterance readonly attributes, relying instead on events
> >that indicate change in state, including new events for: onpause, onresume.
> >
> >Because it's a copy of the object, this clarifies that:
> >- changes to the original SpeechSynthesisUtterance object after calling
> >speak() do not affect the copy on the queue.
> >- the same SpeechSynthesisUtterance object can be used to call speak()
> >multiple times (even after an earlier copy has been spoken or cancelled).
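
A minimal sketch of the copy semantics described above, assuming plain objects in place of SpeechSynthesisUtterance and a module-level array in place of the SpeechSynthesis queue (both are stand-ins, not spec behavior):

```javascript
// Hypothetical model of "speak() enqueues a copy".
const queue = [];

function speak(utterance) {
  // A shallow copy is enough here because the modelled fields are strings
  // (handler attributes, if present, would be copied by reference).
  queue.push({ ...utterance });
}

const utterance = { text: "Hello", lang: "en-US" };
speak(utterance);

// Changing the original after speak() does not affect the queued copy.
utterance.text = "Goodbye";

// The same object can be passed to speak() again.
speak(utterance);

const queuedTexts = queue.map((u) => u.text);
console.log(queuedTexts);
```

The queue ends up holding two independent copies, one per speak() call, each reflecting the object's state at the time of the call.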
> >
> >The new IDL would be:
> >
> >    interface SpeechSynthesisUtterance {
> >      attribute DOMString text;
> >      attribute DOMString lang;
> >      attribute DOMString serviceURI;
> >
> >      attribute Function onstart;
> >      attribute Function onend;
> >*      attribute Function onpause;*
> >*      attribute Function onresume;*
> >    }
> >
> >
> >And the new definition:
> >
> >The speak method
> >This method appends *a copy of* the utterance to the end of the queue for
> >this SpeechSynthesis object. It does not change the paused state of the
> >SpeechSynthesis object.  If the SpeechSynthesis object is paused, it
> >remains paused. If it is not paused, then this utterance is spoken if no
> >other utterances are in the queue, else this utterance is queued to begin
> >speaking after the other utterances in the queue have been spoken.
> >
> >
> >/Glen Shires
> >
> >
> >On Fri, Sep 14, 2012 at 6:05 AM, Jim Barnett
> ><Jim.Barnett@genesyslab.com> wrote:
> >
> >> I would think that cancelling all utterances would be the more common use
> >> case (so we ought to make it easy).  Question:  Can a cancelled utterance
> >> be re-queued?
> >>
> >> - Jim
> >>
> >> -----Original Message-----
> >> From: Nagesh Kharidi [mailto:nagesh@openstream.com]
> >> Sent: Friday, September 14, 2012 8:58 AM
> >> To: Glen Shires; Dominic Mazzoni
> >> Cc: Hans Wennborg; olli@pettay.fi; public-speech-api@w3.org
> >> Subject: Re: TTS proposal to split Utterance into its own interface
> >>
> >> I would like to propose the following:
> >> 1. Provide the ability to cancel all currently queued utterances. A new
> >> cancelAll method could be added. Alternatively, invoking the cancel method
> >> without the utterance parameter could imply cancel all utterances.
> >>
> >> 2. New speakNext SpeechSynthesis method
> >> This method will insert the utterance at the beginning of the queue.
> >>
> >> 3. New oncancel SpeechSynthesisUtterance event
> >> Fired when the utterance is canceled.
> >>
> >> 4. New canceled SpeechSynthesisUtterance attribute
> >> true if the utterance is canceled.
> >>
> >>
> >> I also had a question regarding the stop method: Is "flushes the queue"
> >> equivalent to calling cancel on all utterances in the queue? If so, I
> >> would like to suggest changing "flushes the queue" to "cancels all
> >> utterances in the queue".
> >>
> >> Regards,
> >> Nagesh
> >>
> >> On Thu, 13 Sep 2012 14:13:56 -0700
> >>  Glen Shires <gshires@google.com> wrote:
> >> >Yes, I like the way you've defined the "speak" method to not change the
> >> >play/pause state. Also, I didn't particularly like the word "playback",
> >> >so thanks for the alternative "spoken".  Here are updated definitions
> >> >with your suggestions incorporated. If there's no disagreement, I'll
> >> >add them to the spec on Monday.
> >> >
> >> >
> >> >SpeechSynthesis Attributes
> >> >
> >> >pending attribute:
> >> >This attribute is true if the queue for this SpeechSynthesis object
> >> >contains any utterances which have not started speaking.
> >> >
> >> >speaking attribute:
> >> >This attribute is true if an utterance is being spoken. Specifically, if
> >> >an utterance has begun being spoken and has not completed being spoken,
> >> >and is independent of whether this SpeechSynthesis object is in the
> >> >paused state.
> >> >
> >> >paused attribute:
> >> >This attribute is true when this SpeechSynthesis object is in the paused
> >> >state. This state is independent of whether anything is in the queue.
> >> >The default state of a new SpeechSynthesis object is the non-paused
> >> >state.
> >> >
> >> >
> >> >SpeechSynthesis Methods
> >> >
> >> >The speak method
> >> >This method appends the utterance to the end of the queue for this
> >> >SpeechSynthesis object. It does not change the paused state of the
> >> >SpeechSynthesis object.  If the SpeechSynthesis object is paused, it
> >> >remains paused. If it is not paused, then this utterance is spoken if
> >> >no other utterances are in the queue, else this utterance is queued to
> >> >begin speaking after the other utterances in the queue have been
> >> >spoken.
> >> >
> >> >The cancel method
> >> >This method removes the specified utterance from the queue. If it is
> >> >not in the queue, no changes are made. If the utterance removed is
> >> >being spoken, speaking ceases for that utterance and the next utterance
> >> >in the queue (if any) begins to be spoken. This method does not change
> >> >the paused state of the SpeechSynthesis object.
> >> >
> >> >The pause method
> >> >This method puts the SpeechSynthesis object into the paused state. If
> >> >an utterance was being spoken, it pauses mid-utterance. (If called when
> >> >the SpeechSynthesis object was already in the paused state, it does
> >> >nothing.)
> >> >
> >> >The continue method
> >> >This method puts the SpeechSynthesis object into the non-paused state.
> >> >If an utterance was speaking (that is, its speaking attribute is true),
> >> >it continues speaking the utterance at the point at which it was paused,
> >> >else it begins speaking the next utterance in the queue (if any). (If
> >> >called when the SpeechSynthesis object was already in the non-paused
> >> >state, it does nothing.)
> >> >
> >> >The stop method
> >> >This method puts the SpeechSynthesis object into the paused state and
> >> >flushes the queue. It sets the speaking attribute to false and the
> >> >paused attribute to true.
> >> >
> >> >
> >> >SpeechSynthesisUtterance attributes
> >> >
> >> >
> >> >[[Note, I used SHOULD here because there may be some race-condition
> >> >edge-cases where it might not be ignored.]]
> >> >
> >> >text attribute:
> >> >The text to be synthesized for this utterance. Changes to this
> >> >attribute after the utterance has been added to the queue (by calling
> >> >the speak method) SHOULD be ignored.
> >> >
> >> >lang attribute:
> >> >[no change except to append the following] Changes to this attribute
> >> >after the utterance has been added to the queue (by calling the speak
> >> >method) SHOULD be ignored.
> >> >
> >> >serviceURI attribute:
> >> >[no change except to append the following] Changes to this attribute
> >> >after the utterance has been added to the queue (by calling the speak
> >> >method) SHOULD be ignored.
> >> >
> >> >speaking attribute:
> >> >This attribute is true if this specific utterance is currently being
> >> >spoken. Specifically, if this utterance has begun being spoken and has
> >> >not completed being spoken. This is independent of whether the
> >> >SpeechSynthesis object is in a paused state.
> >> >
> >> >paused attribute:
> >> >This attribute is true if this specific utterance has begun to be
> >> >spoken, but has not completed and the SpeechSynthesis object is in the
> >> >paused state.
> >> >
> >> >ended attribute:
> >> >This attribute is true if this specific utterance has completed being
> >> >spoken.
> >> >
> >> >SpeechSynthesisUtterance events
> >> >
> >> >onstart event:
> >> >Fired when this utterance has begun to be spoken.
> >> >
> >> >onend event:
> >> >Fired when this utterance has completed being spoken.
> >> >
> >> >
> >> >
> >> >On Thu, Sep 13, 2012 at 10:25 AM, Dominic Mazzoni
> >> ><dmazzoni@google.com> wrote:
> >> >
> >> >> Thanks for proposing definitions.
> >> >>
> >> >> On Tue, Sep 11, 2012 at 3:02 AM, Glen Shires <gshires@google.com>
> >> >> wrote:
> >> >> > I propose the following definitions for the SpeechSynthesis IDL:
> >> >> >
> >> >> > SpeechSynthesis Attributes
> >> >> >
> >> >> > pending attribute:
> >> >> > This attribute is true if the queue contains any utterances which
> >> >> > have not completed playback.
> >> >>
> >> >> I was imagining: This attribute is true if the queue contains any
> >> >> utterances which have not *started* speaking.
> >> >>
> >> >> > speaking attribute:
> >> >> > This attribute is true if playback is in progress.
> >> >>
> >> >> I don't like the word "playback", it doesn't fit when the speech is
> >> >> generated dynamically. How about: This attribute is true if an
> >> >> utterance is being spoken.
> >> >>
> >> >> > paused attribute:
> >> >> >   **** How is this different than (pending && !speaking) ? ****
> >> >>
> >> >> This is true if the speech synthesis system is in a paused state,
> >> >> independent of whether anything is speaking or queued.
> >> >>
> >> >> paused && speaking -> it was paused in the middle of an utterance
> >> >> paused && !speaking -> no utterance is speaking, but if you call
> >> >> speak(), nothing will happen because it's in a paused state.
> >> >>
> >> >> >
> >> >> > SpeechSynthesis Methods
> >> >> >
> >> >> > The speak method
> >> >> > This method appends the utterance to the end of a playback queue.
> >> >> > If playback is not in progress, it also begins playback of the next
> >> >> > item in the queue.
> >> >>
> >> >> What do you think about rewriting to not use "playback"?
> >> >>
> >> >> Also, my idea was that it would not begin playback if the system is
> >> >> in a paused state.
> >> >>
> >> >> > The cancel method
> >> >> > This method removes the first matching utterance (if any) from the
> >> >> > playback queue. If playback is in progress and the utterance removed
> >> >> > is being played, playback ceases for the utterance and the next
> >> >> > utterance in the queue (if any) begins playing.
> >> >>
> >> >> Do we need to say "first matching"? Each utterance should be a
> >> >> specific object, it should be either in the queue or not.
> >> >>
> >> >> > The pause method
> >> >> > This method pauses the playback mid-utterance. If playback is not
> >> >> > in progress, it does nothing.
> >> >>
> >> >> I was assuming that calling it would set the system into a paused
> >> >> state, so that even a subsequent call to speak() would not do
> >> >> anything other than enqueue.
> >> >>
> >> >> > The continue method
> >> >> > This method continues the playback at the point in the utterance
> >> >> > and queue in which it was paused.  If playback is in progress, it
> >> >> > does nothing.
> >> >> >
> >> >> > The stop method.
> >> >> > This method stops playback mid-utterance and flushes the queue.
> >> >> >
> >> >> >
> >> >> > SpeechSynthesisUtterance attributes
> >> >> >
> >> >> > text attribute:
> >> >> > The text to be synthesized for this utterance. This attribute must
> >> >> > not be changed after onstart fires.
> >> >>
> >> >> I'd say: changes to this attribute after the utterance has been
> >> >> added to the queue (by calling "speak") will be ignored. OR, we
> >> >> should make it a DOM exception to modify it when it's in the speech
> >> >> queue.
> >> >>
> >> >> > paused attribute:
> >> >> > This attribute is true if this specific utterance is in the queue
> >> >> > and has not completed playback.
> >> >>
> >> >> I think this should only be true if it has begun speaking but not
> >> >> completed.
> >> >>
> >> >> - Dominic
> >> >>
> >>
> >> --
> >> NOTICE TO RECIPIENT:
> >> THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION,
> >> AND MAY BE A COMMUNICATION PRIVILEGED BY LAW.  IF YOU RECEIVED THIS E-MAIL
> >> IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS
> >> E-MAIL IS STRICTLY PROHIBITED.  PLEASE NOTIFY US IMMEDIATELY OF THE ERROR
> >> BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM. THANK YOU
> >> IN ADVANCE FOR YOUR COOPERATION.
> >> Reply to : legal@openstream.com
> >>
> >>
> >>
> >>
> 
> 
> 
> 



Received on Tuesday, 18 September 2012 18:26:49 UTC