
Re: TTS proposal to split Utterance into its own interface

From: Nagesh Kharidi <nagesh@openstream.com>
Date: Sat, 15 Sep 2012 06:25:30 -0400
To: "Glen Shires" <gshires@google.com>, "Jim Barnett" <Jim.Barnett@genesyslab.com>
Cc: "Dominic Mazzoni" <dmazzoni@google.com>, "Hans Wennborg" <hwennborg@google.com>, <olli@pettay.fi>, <public-speech-api@w3.org>
Message-ID: <web-1762642@smartmessenger.com>

Please see inline.

Regards,
Nagesh

On Fri, 14 Sep 2012 12:59:41 -0700
 Glen Shires <gshires@google.com> wrote:
>> Provide the ability to cancel all currently queued utterances.
>
>The stop() method cancels all queued utterances. (Dominic proposed that
>this method be named stopAndFlushQueue(); would that name be more clear?)

In addition to canceling all queued utterances, the stop() method also
pauses the SpeechSynthesis object. A separate cancelAll() method would
be useful. Without it, speaking a new utterance immediately requires
three calls:
speechSynthesis.stop();
speechSynthesis.continue();
speechSynthesis.speak(utterance);

With a cancelAll() method, this would be:
speechSynthesis.cancelAll();
speechSynthesis.speak(utterance);

Since this would be such a common usage, we could make it even easier
for developers by either:
- providing a speakImmediate(utterance) method that cancels all queued
utterances and then starts speaking the new utterance
or
- adding a second parameter as follows to the speak() method:
speechSynthesis.speak(utterance, speakImmediately);
If speakImmediately is true, all currently queued utterances will be
canceled and the new utterance will be spoken.
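
To make the comparison concrete, here is a minimal sketch of the two
call sequences against a small in-memory model. QueueModel and resume()
are hypothetical names used only for illustration (resume() stands in
for the draft's continue() method); this is not the browser's
speechSynthesis object. The sketch also assumes that cancelAll() and
speakImmediate() leave the paused state unchanged, which the proposal
above does not settle.

```javascript
// Hypothetical in-memory model of the queue semantics discussed in this
// thread. It speaks nothing and is not the browser's speechSynthesis object.
class QueueModel {
  constructor() {
    this.queue = [];     // utterances being spoken or waiting, in order
    this.paused = false; // a new object starts in the non-paused state
  }
  // Append to the end of the queue; does not change the paused state.
  speak(utterance) { this.queue.push(utterance); }
  // stop() as currently drafted: flush the queue and enter the paused state.
  stop() { this.queue.length = 0; this.paused = true; }
  // Stand-in for the draft's continue(): leave the paused state.
  resume() { this.paused = false; }
  // Proposed: cancel everything queued (assumed not to touch paused).
  cancelAll() { this.queue.length = 0; }
  // Proposed: cancel everything queued, then queue the new utterance.
  speakImmediate(utterance) { this.cancelAll(); this.speak(utterance); }
}

// Today's sequence: three calls.
const viaStop = new QueueModel();
viaStop.speak("old item");
viaStop.stop();
viaStop.resume();
viaStop.speak("urgent");

// Proposed sequence: one call.
const viaImmediate = new QueueModel();
viaImmediate.speak("old item");
viaImmediate.speakImmediate("urgent");
```

In this model both sequences end with only the new utterance in the
queue and the object in the non-paused state.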

>
>Also, what is the use case for the current cancel(utterance) method? In
>all the use cases I envision, you'd want to cancel all queued utterances.
>Can we eliminate cancel() ?

I agree that canceling a specific utterance is not very useful.
Canceling all queued utterances would be the more common case.

>
>
>> New speakNext SpeechSynthesis method - append the utterance to the
>> beginning of the queue
>
>I'd like more discussion on this. What are the use cases? What are the
>edge cases (e.g. If there's a race-condition, the current utterance may
>finish and the second in the queue may begin speaking before this new
>utterance is inserted).

Use case for speakNext() method: Consider a news application that plays
the latest news items. It queues all news items to be played. Now if
there is a new "breaking news" item that comes in, the speakNext()
method can be used to play it as soon as possible without canceling the
already queued items.
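
A rough sketch of the ordering I have in mind, using a hypothetical
NewsQueueModel (not part of the draft API). It assumes that the
utterance currently being spoken is never interrupted, that speakNext()
places the new utterance ahead of everything that has not yet started,
and it ignores the paused state for brevity.

```javascript
// Hypothetical model: "current" is the utterance being spoken,
// "pending" holds utterances that have not started yet.
class NewsQueueModel {
  constructor() {
    this.current = null;
    this.pending = [];
  }
  // Regular speak(): start at once if idle, else go to the end of the queue.
  speak(utterance) {
    if (this.current === null) this.current = utterance;
    else this.pending.push(utterance);
  }
  // Proposed speakNext(): start at once if idle, else go to the front
  // of the not-yet-started utterances.
  speakNext(utterance) {
    if (this.current === null) this.current = utterance;
    else this.pending.unshift(utterance);
  }
  // Simulates the end of the current utterance; the next one (if any) starts.
  finishCurrent() {
    this.current = this.pending.length > 0 ? this.pending.shift() : null;
  }
}

const news = new NewsQueueModel();
news.speak("news 1");
news.speak("news 2");
news.speak("news 3");
news.speakNext("breaking news");
// "news 1" keeps speaking; "breaking news" is now ahead of "news 2".
```

Because the insertion only ever touches the not-yet-started part of the
queue, the race Glen describes changes which item the breaking news
precedes, but it cannot interrupt an utterance that has already begun.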


>
>
>>  Question:  Can a cancelled utterance be re-queued?
>
>Good question, and also, what is the lifetime of a SpeechSynthesisUtterance
>object and who owns it? There are at least 3 possibilities:
>
>1. The speak() method takes ownership when it adds it to the queue, then it
>would presumably be destroyed upon cancel or onend.
>    (This raises the questions: what usefulness is the
>SpeechSynthesisUtterance object attribute "ended", since the object will be
>destroyed when it turns true? It also makes it messy to use the other
>readonly attributes because the object may be deleted suddenly. Also, what
>if the author deletes the SpeechSynthesisUtterance object prior to it being
>spoken?  One easy way to accidentally create this bug is to define the
>SpeechSynthesisUtterance object in a method that goes out of scope.)
>
>2. The speak() method does not take ownership when it adds it directly to
>the queue.
>    (This raises the question: what if the author deletes the
>SpeechSynthesisUtterance object prior to it being spoken?  One easy way to
>accidentally create this bug is to define the SpeechSynthesisUtterance
>object in a method that goes out of scope.)
>
>3. The speak() method does not take ownership, it makes a copy of it when
>it adds it to the queue.
>    (This raises the question: how can the author's original
>SpeechSynthesisUtterance object readonly attributes (speaking, paused,
>ended) reflect the state of the copy on the queue?)
>
>
>To resolve these issues, I propose the following, because I think it's the
>cleanest solution and easiest for authors, since they can create and
>destroy objects, and go out of scope, without worrying about the speaking
>queue timing:
>
>The speak() method does not take ownership of the SpeechSynthesisUtterance
>object, it makes a copy of it when it adds it to the queue.  We eliminate
>the SpeechSynthesisUtterance readonly attributes, relying instead on events
>that indicate change in state, including new events for: onpause, onresume.
>
>Because it's a copy of the object, this clarifies that:
>- changes to the original SpeechSynthesisUtterance object after calling
>speak() do not affect the copy on the queue.
>- the same SpeechSynthesisUtterance object can be used to call speak()
>multiple times (even after a copy of it was spoken or cancelled).
>
>The new IDL would be:
>
>    interface SpeechSynthesisUtterance {
>      attribute DOMString text;
>      attribute DOMString lang;
>      attribute DOMString serviceURI;
>
>      attribute Function onstart;
>      attribute Function onend;
>*      attribute Function onpause;*
>*      attribute Function onresume;*
>    }
>
>
>And the new definition:
>
>The speak method
>This method appends *a copy of* the utterance to the end of the queue for
>this SpeechSynthesis object. It does not change the paused state of the
>SpeechSynthesis object.  If the SpeechSynthesis object is paused, it
>remains paused. If it is not paused, then this utterance is spoken if no
>other utterances are in the queue, else this utterance is queued to begin
>speaking after the other utterances in the queue have been spoken.
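
A small sketch of what the copy-on-speak definition above would mean for
authors. CopyingQueueModel is a hypothetical name, and a shallow copy of
a plain object is assumed here purely for illustration; the proposal
does not say how the copy would be made.

```javascript
// Hypothetical model of "speak() appends a copy of the utterance".
class CopyingQueueModel {
  constructor() {
    this.queue = [];
  }
  speak(utterance) {
    // Shallow copy: later changes to the author's object do not
    // reach the entry that is already on the queue.
    this.queue.push({ ...utterance });
  }
}

const copyQueue = new CopyingQueueModel();
const greeting = { text: "Hello", lang: "en-US" };

copyQueue.speak(greeting);   // queues a copy with text "Hello"
greeting.text = "Goodbye";   // does not affect the queued copy
copyQueue.speak(greeting);   // the same object can be spoken again

const queuedTexts = copyQueue.queue.map((entry) => entry.text);
// queuedTexts is ["Hello", "Goodbye"]
```
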
>
>
>/Glen Shires
>
>
>On Fri, Sep 14, 2012 at 6:05 AM, Jim Barnett
><Jim.Barnett@genesyslab.com> wrote:
>
>> I would think that cancelling all utterances would be the more common use
>> case (so we ought to make it easy).  Question:  Can a cancelled utterance
>> be re-queued?
>>
>> - Jim
>>
>> -----Original Message-----
>> From: Nagesh Kharidi [mailto:nagesh@openstream.com]
>> Sent: Friday, September 14, 2012 8:58 AM
>> To: Glen Shires; Dominic Mazzoni
>> Cc: Hans Wennborg; olli@pettay.fi; public-speech-api@w3.org
>> Subject: Re: TTS proposal to split Utterance into its own interface
>>
>> I would like to propose the following:
>> 1. Provide the ability to cancel all currently queued utterances. A new
>> cancelAll method could be added. Alternately, invoking the cancel method
>> without the utterance parameter could imply cancel all utterances.
>>
>> 2. New speakNext SpeechSynthesis method
>> This method will append the utterance to the beginning of the queue.
>>
>> 3. New oncancel SpeechSynthesisUtterance event
>> Fired when the utterance is canceled.
>>
>> 4. New canceled SpeechSynthesisUtterance attribute
>> true if the utterance is canceled.
>>
>>
>> I also had a question regarding the stop method: Is "flushes the queue"
>> equivalent to calling cancel on all utterances in the queue? If so, I
>> would like to suggest changing "flushes the queue" to "cancels all
>> utterances in the queue".
>>
>> Regards,
>> Nagesh
>>
>> On Thu, 13 Sep 2012 14:13:56 -0700
>>  Glen Shires <gshires@google.com> wrote:
>> >Yes, I like the way you've defined the "speak" method to not change the
>> >play/pause state. Also, I didn't particularly like the word "playback",
>> >so thanks for the alternative "spoken".  Here are updated definitions
>> >with your suggestions incorporated. If there's no disagreement, I'll
>> >add them to the spec on Monday.
>> >
>> >
>> >SpeechSynthesis Attributes
>> >
>> >pending attribute:
>> >This attribute is true if the queue for this SpeechSynthesis object
>> >contains any utterances which have not started speaking.
>> >
>> >speaking attribute:
>> >This attribute is true if an utterance is being spoken. Specifically if
>> >an utterance has begun being spoken and has not completed being spoken,
>> >and is independent of whether this SpeechSynthesis object is in the
>> >paused state.
>> >
>> >paused attribute:
>> >The attribute is true when this SpeechSynthesis object is in the paused
>> >state. This state is independent of whether anything is in the queue.
>> >The default state of a new SpeechSynthesis object is the non-paused state.
>> >
>> >
>> >SpeechSynthesis Methods
>> >
>> >The speak method
>> >This method appends the utterance to the end of the queue for this
>> >SpeechSynthesis object. It does not change the paused state of the
>> >SpeechSynthesis object.  If the SpeechSynthesis object is paused, it
>> >remains paused. If it is not paused, then this utterance is spoken if
>> >no other utterances are in the queue, else this utterance is queued to
>> >begin speaking after the other utterances in the queue have been
>> >spoken.
>> >
>> >The cancel method
>> >This method removes the specified utterance from the queue. If it is
>> >not in the queue, no changes are made. If the utterance removed is
>> >being spoken, speaking ceases for that utterance and the next utterance
>> >in the queue (if any) begins to be spoken. This method does not change
>> >the paused state of the SpeechSynthesis object.
>> >
>> >The pause method
>> >This method puts the SpeechSynthesis object into the paused state. If
>> >an utterance was being spoken, it pauses mid-utterance. (If called when
>> >the SpeechSynthesis object was already in the paused state, it does
>> >nothing.)
>> >
>> >The continue method
>> >This method puts the SpeechSynthesis object into the non-paused state.
>> >If an utterance was speaking (that is, its speaking attribute is true),
>> >it continues speaking the utterance at the point at which it was paused,
>> >else it begins speaking the next utterance in the queue (if any). (If
>> >called when the SpeechSynthesis object was already in the non-paused
>> >state, it does nothing.)
>> >
>> >The stop method.
>> >This method puts the SpeechSynthesis object into the paused state and
>> >flushes the queue. It sets the speaking attribute to false and the
>> >paused attribute to true.
>> >
>> >
>> >SpeechSynthesisUtterance attributes
>> >
>> >
>> >[[Note, I used SHOULD here because there may be some race-condition
>> >edge-cases where it might not be ignored.]]
>> >
>> >text attribute:
>> >The text to be synthesized for this utterance. Changes to this
>> >attribute after the utterance has been added to the queue (by calling
>> >the speak method) SHOULD be ignored.
>> >
>> >lang attribute:
>> >[no change except to append the following] Changes to this attribute
>> >after the utterance has been added to the queue (by calling the speak
>> >method) SHOULD be ignored.
>> >
>> >serviceURI attribute:
>> >[no change except to append the following] Changes to this attribute
>> >after the utterance has been added to the queue (by calling the speak
>> >method) SHOULD be ignored.
>> >
>> >speaking attribute:
>> >This attribute is true if this specific utterance is currently being
>> >spoken. Specifically if this utterance has begun being spoken and has
>> >not completed being spoken. This is independent of whether the
>> >SpeechSynthesis object is in a paused state.
>> >
>> >paused attribute:
>> >This attribute is true if this specific utterance has begun to be
>> >spoken, but has not completed and the SpeechSynthesis object is in the
>> >paused state.
>> >
>> >ended attribute:
>> >This attribute is true if this specific utterance has completed being
>> >spoken.
>> >
>> >SpeechSynthesisUtterance events
>> >
>> >onstart event:
>> >Fired when this utterance has begun to be spoken.
>> >
>> >onend event:
>> >Fired when this utterance has completed being spoken.
>> >
>> >
>> >
>> >On Thu, Sep 13, 2012 at 10:25 AM, Dominic Mazzoni
>> ><dmazzoni@google.com> wrote:
>> >
>> >> Thanks for proposing definitions.
>> >>
>> >> On Tue, Sep 11, 2012 at 3:02 AM, Glen Shires <gshires@google.com> wrote:
>> >> > I propose the following definitions for the SpeechSynthesis IDL:
>> >> >
>> >> > SpeechSynthesis Attributes
>> >> >
>> >> > pending attribute:
>> >> > This attribute is true if the queue contains any utterances which have
>> >> > not completed playback.
>> >>
>> >> I was imagining: This attribute is true if the queue contains any
>> >> utterances which have not *started* speaking.
>> >>
>> >> > speaking attribute:
>> >> > This attribute is true if playback is in progress.
>> >>
>> >> I don't like the word "playback", it doesn't fit when the speech is
>> >> generated dynamically. How about: This attribute is true if an
>> >> utterance is being spoken.
>> >>
>> >> > paused attribute:
>> >> >   **** How is this different than (pending && !speaking) ? ****
>> >>
>> >> This is true if the speech synthesis system is in a paused state,
>> >> independent of whether anything is speaking or queued.
>> >>
>> >> paused && speaking -> it was paused in the middle of an utterance
>> >> paused && !speaking -> no utterance is speaking, but if you call
>> >> speak(), nothing will happen because it's in a paused state.
>> >>
>> >> >
>> >> > SpeechSynthesis Methods
>> >> >
>> >> > The speak method
>> >> > This method appends the utterance to the end of a playback queue. If
>> >> > playback is not in progress, it also begins playback of the next item
>> >> > in the queue.
>> >>
>> >> What do you think about rewriting to not use "playback"?
>> >>
>> >> Also, my idea was that it would not begin playback if the system is in
>> >> a paused state.
>> >>
>> >> > The cancel method
>> >> > This method removes the first matching utterance (if any) from the
>> >> > playback queue. If playback is in progress and the utterance removed is
>> >> > being played, playback ceases for the utterance and the next utterance
>> >> > in the queue (if any) begins playing.
>> >>
>> >> Do we need to say "first matching"? Each utterance should be a
>> >> specific object, it should be either in the queue or not.
>> >>
>> >> > The pause method
>> >> > This method pauses the playback mid-utterance. If playback is not in
>> >> > progress, it does nothing.
>> >>
>> >> I was assuming that calling it would set the system into a paused
>> >> state, so that even a subsequent call to speak() would not do anything
>> >> other than enqueue.
>> >>
>> >> > The continue method
>> >> > This method continues the playback at the point in the utterance and
>> >> > queue in which it was paused.  If playback is in progress, it does
>> >> > nothing.
>> >> >
>> >> > The stop method.
>> >> > This method stops playback mid-utterance and flushes the queue.
>> >> >
>> >> > SpeechSynthesisUtterance attributes
>> >> >
>> >> > text attribute:
>> >> > The text to be synthesized for this utterance. This attribute must not
>> >> > be changed after onstart fires.
>> >>
>> >> I'd say: changes to this attribute after the utterance has been added
>> >> to the queue (by calling "speak") will be ignored. OR, we should make
>> >> it a DOM exception to modify it when it's in the speech queue.
>> >>
>> >> > paused attribute:
>> >> > This attribute is true if this specific utterance is in the queue and
>> >> > has not completed playback.
>> >>
>> >> I think this should only be true if it has begun speaking but not
>> >> completed.
>> >>
>> >> - Dominic
>> >>

--
NOTICE TO RECIPIENT:  
THIS E-MAIL IS  MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGED BY LAW.  IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS E-MAIL IS STRICTLY PROHIBITED.  PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM. THANK YOU IN ADVANCE FOR YOUR COOPERATION. 
Reply to : legal@openstream.com
Received on Saturday, 15 September 2012 10:09:30 UTC
