Re: TTS proposal to split Utterance into its own interface

From: Glen Shires <gshires@google.com>
Date: Mon, 17 Sep 2012 22:25:42 -0700
Message-ID: <CAEE5bcig0+Na_h0ZSYs6Xvj1j4Y1ezqoFfOh0qr8ieuvyaCJaQ@mail.gmail.com>
To: Nagesh Kharidi <nagesh@openstream.com>
Cc: Jim Barnett <Jim.Barnett@genesyslab.com>, Dominic Mazzoni <dmazzoni@google.com>, Hans Wennborg <hwennborg@google.com>, olli@pettay.fi, public-speech-api@w3.org
I've updated the spec with the above SpeechSynthesis and
SpeechSynthesisUtterance IDL and definitions:
https://dvcs.w3.org/hg/speech-api/rev/b036c78e9445

As always, the current draft spec is at:
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

On Sat, Sep 15, 2012 at 1:31 PM, Glen Shires <gshires@google.com> wrote:

> Nagesh,
> I agree that cancelAll() is useful and can make code simpler because it
> doesn't affect the paused state.  In fact, I propose that we add
> cancelAll() and remove stop() -- because the stop function is probably less
> common and can easily be accomplished with two calls: cancelAll() and
> pause().
>
> Also, since canceling a specific utterance is not very useful, and questionable
> as Jerry states, I propose eliminating cancel(utterance). If we do that,
> then we could rename cancelAll() more simply as cancel().
>
> Thus, I propose this IDL:
>
>     interface SpeechSynthesis {
>       static readonly attribute boolean pending;
>       static readonly attribute boolean speaking;
>       static readonly attribute boolean paused;
>
>       static void speak(SpeechSynthesisUtterance utterance);
>       static void cancel();
>       static void pause();
>       static void continue();
>     }
>
> and I propose this new definition of cancel:
>
> The cancel method
> This method removes all utterances from the queue. If an utterance is
> being spoken, speaking ceases immediately. This method does not change the
> paused state of the SpeechSynthesis object.
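
To make the proposed cancel() semantics concrete, here is a minimal sketch of the queue as a plain JavaScript model. It is not a browser implementation: the class name `SynthesisQueueModel`, its `started` flag, and the helper `startNextIfPossible` are assumptions made up for illustration, and utterances are just strings. It only shows that cancel() empties the queue and leaves the paused state alone, so the old stop() behaviour becomes cancel() followed by pause().

```javascript
// Hypothetical in-memory model of the proposed SpeechSynthesis queue.
// Nothing is actually spoken; index 0 of `queue` stands for the utterance
// that is (or would be) speaking.
class SynthesisQueueModel {
  constructor() {
    this.queue = [];
    this.paused = false; // a new object starts in the non-paused state
    this.started = false; // whether queue[0] has begun being spoken
  }

  // True once the head utterance has begun and not completed, even if paused.
  get speaking() {
    return this.started;
  }

  // True if any queued utterance has not started speaking.
  get pending() {
    return this.queue.length > (this.started ? 1 : 0);
  }

  speak(utterance) {
    this.queue.push(utterance);
    this.startNextIfPossible();
  }

  cancel() {
    this.queue = [];
    this.started = false; // speaking ceases immediately
    // Deliberately no change to this.paused.
  }

  pause() {
    this.paused = true;
  }

  continue() {
    this.paused = false;
    this.startNextIfPossible();
  }

  startNextIfPossible() {
    if (!this.paused && !this.started && this.queue.length > 0) {
      this.started = true;
    }
  }
}

const model = new SynthesisQueueModel();
model.speak("first");
model.speak("second");
model.cancel();
console.log(model.paused, model.speaking, model.pending); // false false false

// The old stop() behaviour, expressed with the two remaining calls.
model.speak("third");
model.cancel();
model.pause();
console.log(model.paused, model.speaking); // true false
```
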
>
>  /Glen Shires
>
>
> On Sat, Sep 15, 2012 at 3:25 AM, Nagesh Kharidi <nagesh@openstream.com> wrote:
>
>> Please see inline.
>>
>> Regards,
>> Nagesh
>>
>> On Fri, 14 Sep 2012 12:59:41 -0700
>>  Glen Shires <gshires@google.com> wrote:
>> >> Provide the ability to cancel all currently queued utterances.
>> >
>> >The stop() method cancels all queued utterances. (Dominic proposed that
>> >this method be named stopAndFlushQueue(); would that name be clearer?)
>>
>> In addition to canceling all queued utterances, the stop() method also
>> pauses the SpeechSynthesis object. A separate cancelAll() method would
>> be useful. Without it, speaking a new utterance immediately would
>> require:
>> speechSynthesis.stop();
>> speechSynthesis.continue();
>> speechSynthesis.speak(utterance);
>>
>> With a cancelAll() method, this would be:
>> speechSynthesis.cancelAll();
>> speechSynthesis.speak(utterance);
>>
>> Since this would be such a common usage, we could make it even easier
>> for developers by either:
>> - providing a speakImmediate(utterance) method that cancels all queued
>> utterances and then starts speaking the new utterance
>> or
>> - adding a second parameter as follows to the speak() method:
>> speechSynthesis.speak(utterance, speakImmediately);
>> If speakImmediately is true, all currently queued utterances will be
>> canceled and the new utterance will be spoken.
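
Either convenience could be layered on top of the two-call version. A rough sketch, assuming only an object that exposes the proposed cancelAll() and speak() methods; the `fakeSynth` stand-in and the `speakImmediate` helper are illustrative names, not part of any draft:

```javascript
// Hypothetical stand-in for a synthesizer with the proposed cancelAll() and
// speak() methods; it only records what is queued.
const fakeSynth = {
  queue: [],
  cancelAll() {
    this.queue.length = 0;
  },
  speak(utterance) {
    this.queue.push(utterance);
  },
};

// Illustrative helper: drop everything queued, then speak the new utterance.
function speakImmediate(synth, utterance) {
  synth.cancelAll();
  synth.speak(utterance);
}

fakeSynth.speak("item 1");
fakeSynth.speak("item 2");
speakImmediate(fakeSynth, "urgent item");
console.log(fakeSynth.queue.join(", ")); // urgent item
```
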
>>
>> >
>> >Also, what is the use case for the current cancel(utterance) method? In
>> >all the use cases I envision, you'd want to cancel all queued utterances.
>> >Can we eliminate cancel()?
>>
>> I also agree that canceling a specific utterance is not very useful.
>> Canceling all queued utterances would be more common than canceling a
>> specific utterance.
>>
>> >
>> >
>> >> New speakNext SpeechSynthesis method - append the utterance to the
>> >> beginning of the queue
>> >
>> >I'd like more discussion on this. What are the use cases? What are the
>> >edge cases (e.g. If there's a race-condition, the current utterance may
>> >finish and the second in the queue may begin speaking before this new
>> >utterance is inserted).
>>
>> Use case for speakNext() method: Consider a news application that plays
>> the latest news items. It queues all news items to be played. Now if
>> there is a new "breaking news" item that comes in, the speakNext()
>> method can be used to play it as soon as possible without canceling the
>> already queued items.
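
One way to picture speakNext() is as an insert just behind whatever is currently being spoken. The sketch below assumes that reading, which the thread has not settled; `NewsQueue` is a made-up name, and index 0 stands for the item being spoken.

```javascript
// Hypothetical queue model: index 0 is the utterance currently being spoken.
class NewsQueue {
  constructor() {
    this.items = [];
  }

  // Append to the end of the queue.
  speak(utterance) {
    this.items.push(utterance);
  }

  // Assumed semantics: play as soon as possible without interrupting the
  // current utterance and without canceling anything already queued.
  speakNext(utterance) {
    const insertAt = this.items.length > 0 ? 1 : 0;
    this.items.splice(insertAt, 0, utterance);
  }
}

const news = new NewsQueue();
news.speak("story A"); // being spoken
news.speak("story B");
news.speak("story C");
news.speakNext("breaking news");
console.log(news.items.join(" | ")); // story A | breaking news | story B | story C
```
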
>>
>>
>> >
>> >
>> >>  Question:  Can a cancelled utterance be re-queued?
>> >
>> >Good question, and also, what is the lifetime of a SpeechSynthesisUtterance
>> >object, and who owns it? There are at least three possibilities:
>> >
>> >1. The speak() method takes ownership when it adds it to the queue, then
>> >it would presumably be destroyed upon cancel or onend.
>> >    (This raises the questions: how useful is the SpeechSynthesisUtterance
>> >object attribute "ended", since the object will be destroyed when it turns
>> >true? It also makes it messy to use the other readonly attributes because
>> >the object may be deleted suddenly. Also, what if the author deletes the
>> >SpeechSynthesisUtterance object prior to it being spoken? One easy way to
>> >accidentally create this bug is to define the SpeechSynthesisUtterance
>> >object in a method that goes out of scope.)
>> >
>> >2. The speak() method does not take ownership when it adds it directly to
>> >the queue.
>> >    (This raises the question: what if the author deletes the
>> >SpeechSynthesisUtterance object prior to it being spoken? One easy way to
>> >accidentally create this bug is to define the SpeechSynthesisUtterance
>> >object in a method that goes out of scope.)
>> >
>> >3. The speak() method does not take ownership; it makes a copy of it when
>> >it adds it to the queue.
>> >    (This raises the question: how can the readonly attributes (speaking,
>> >paused, ended) of the author's original SpeechSynthesisUtterance object
>> >reflect the state of the copy on the queue?)
>> >
>> >
>> >To resolve these issues, I propose the following, because I think it's the
>> >cleanest solution and easiest for authors, since they can create and
>> >destroy objects, and go out of scope, without worrying about the speaking
>> >queue timing:
>> >
>> >The speak() method does not take ownership of the SpeechSynthesisUtterance
>> >object; it makes a copy of it when it adds it to the queue.  We eliminate
>> >the SpeechSynthesisUtterance readonly attributes, relying instead on events
>> >that indicate a change in state, including new events: onpause, onresume.
>> >
>> >Because it's a copy of the object, this clarifies that:
>> >- changes to the original SpeechSynthesisUtterance object after calling
>> >speak() do not affect the copy on the queue.
>> >- the same SpeechSynthesisUtterance object can be used to call speak()
>> >multiple times (even after a copy of it was spoken or cancelled).
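
The copy semantics can be pictured with a small model. This is only a sketch under stated assumptions: `CopyingQueue` is a made-up name, utterances are plain objects carrying the text and lang attributes, and a shallow copy stands in for whatever copying an implementation would really do.

```javascript
// Illustrative model of copy-on-speak; nothing here is a real browser API.
class CopyingQueue {
  constructor() {
    this.queue = [];
  }

  speak(utterance) {
    // Shallow copy, so later edits to the caller's object are not seen.
    this.queue.push({ ...utterance });
  }
}

const synth = new CopyingQueue();
const utterance = { text: "Hello", lang: "en-US" };

synth.speak(utterance);
utterance.text = "Goodbye"; // does not affect the copy already queued
synth.speak(utterance); // the same object can be queued again

console.log(synth.queue.map((u) => u.text).join(", ")); // Hello, Goodbye
```
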
>> >
>> >The new IDL would be:
>> >
>> >    interface SpeechSynthesisUtterance {
>> >      attribute DOMString text;
>> >      attribute DOMString lang;
>> >      attribute DOMString serviceURI;
>> >
>> >      attribute Function onstart;
>> >      attribute Function onend;
>> >      attribute Function onpause;   // new
>> >      attribute Function onresume;  // new
>> >    }
>> >
>> >
>> >And the new definition:
>> >
>> >The speak method
>> >This method appends *a copy of* the utterance to the end of the queue for
>> >this SpeechSynthesis object. It does not change the paused state of the
>> >SpeechSynthesis object.  If the SpeechSynthesis object is paused, it
>> >remains paused. If it is not paused, then this utterance is spoken if no
>> >other utterances are in the queue, else this utterance is queued to begin
>> >speaking after the other utterances in the queue have been spoken.
>> >
>> >
>> >/Glen Shires
>> >
>> >
>> >On Fri, Sep 14, 2012 at 6:05 AM, Jim Barnett <Jim.Barnett@genesyslab.com>
>> >wrote:
>> >
>> >> I would think that cancelling all utterances would be the more common
>> >> use case (so we ought to make it easy).  Question:  Can a cancelled
>> >> utterance be re-queued?
>> >>
>> >> - Jim
>> >>
>> >> -----Original Message-----
>> >> From: Nagesh Kharidi [mailto:nagesh@openstream.com]
>> >> Sent: Friday, September 14, 2012 8:58 AM
>> >> To: Glen Shires; Dominic Mazzoni
>> >> Cc: Hans Wennborg; olli@pettay.fi; public-speech-api@w3.org
>> >> Subject: Re: TTS proposal to split Utterance into its own interface
>> >>
>> >> I would like to propose the following:
>> >> 1. Provide the ability to cancel all currently queued utterances. A new
>> >> cancelAll method could be added. Alternately, invoking the cancel method
>> >> without the utterance parameter could imply cancel all utterances.
>> >>
>> >> 2. New speakNext SpeechSynthesis method
>> >> This method will append the utterance to the beginning of the queue.
>> >>
>> >> 3. New oncancel SpeechSynthesisUtterance event
>> >> Fired when the utterance is canceled.
>> >>
>> >> 4. New canceled SpeechSynthesisUtterance attribute
>> >> True if the utterance is canceled.
>> >>
>> >>
>> >> I also had a question regarding the stop method: Is "flushes the queue"
>> >> equivalent to calling cancel on all utterances in the queue? If so, I
>> >> would like to suggest changing "flushes the queue" to "cancels all
>> >> utterances in the queue".
>> >>
>> >> Regards,
>> >> Nagesh
>> >>
>> >> On Thu, 13 Sep 2012 14:13:56 -0700
>> >>  Glen Shires <gshires@google.com> wrote:
>> >> >Yes, I like the way you've defined the "speak" method to not change the
>> >> >play/pause state. Also, I didn't particularly like the word "playback",
>> >> >so thanks for the alternative "spoken".  Here are updated definitions
>> >> >with your suggestions incorporated. If there's no disagreement, I'll
>> >> >add them to the spec on Monday.
>> >> >
>> >> >
>> >> >SpeechSynthesis Attributes
>> >> >
>> >> >pending attribute:
>> >> >This attribute is true if the queue for this SpeechSynthesis object
>> >> >contains any utterances which have not started speaking.
>> >> >
>> >> >speaking attribute:
>> >> >This attribute is true if an utterance is being spoken, specifically if
>> >> >an utterance has begun being spoken and has not completed being spoken.
>> >> >This is independent of whether this SpeechSynthesis object is in the
>> >> >paused state.
>> >> >
>> >> >paused attribute:
>> >> >This attribute is true when this SpeechSynthesis object is in the paused
>> >> >state. This state is independent of whether anything is in the queue.
>> >> >The default state of a new SpeechSynthesis object is the non-paused
>> >> >state.
>> >> >
>> >> >
>> >> >SpeechSynthesis Methods
>> >> >
>> >> >The speak method
>> >> >This method appends the utterance to the end of the queue for this
>> >> >SpeechSynthesis object. It does not change the paused state of the
>> >> >SpeechSynthesis object.  If the SpeechSynthesis object is paused, it
>> >> >remains paused. If it is not paused, then this utterance is spoken if
>> >> >no other utterances are in the queue, else this utterance is queued to
>> >> >begin speaking after the other utterances in the queue have been
>> >> >spoken.
>> >> >
>> >> >The cancel method
>> >> >This method removes the specified utterance from the queue. If it is
>> >> >not in the queue, no changes are made. If the utterance removed is
>> >> >being spoken, speaking ceases for that utterance and the next utterance
>> >> >in the queue (if any) begins to be spoken. This method does not change
>> >> >the paused state of the SpeechSynthesis object.
>> >> >
>> >> >The pause method
>> >> >This method puts the SpeechSynthesis object into the paused state. If
>> >> >an utterance was being spoken, it pauses mid-utterance. (If called when
>> >> >the SpeechSynthesis object was already in the paused state, it does
>> >> >nothing.)
>> >> >
>> >> >The continue method
>> >> >This method puts the SpeechSynthesis object into the non-paused state.
>> >> >If an utterance was speaking (that is, its speaking attribute is true),
>> >> >it continues speaking the utterance at the point at which it was paused,
>> >> >else it begins speaking the next utterance in the queue (if any). (If
>> >> >called when the SpeechSynthesis object was already in the non-paused
>> >> >state, it does nothing.)
>> >> >
>> >> >The stop method
>> >> >This method puts the SpeechSynthesis object into the paused state and
>> >> >flushes the queue. It sets the speaking attribute to false and the
>> >> >paused attribute to true.
>> >> >
>> >> >
>> >> >SpeechSynthesisUtterance attributes
>> >> >
>> >> >
>> >> >[[Note, I used SHOULD here because there may be some race-condition
>> >> >edge-cases where it might not be ignored.]]
>> >> >
>> >> >text attribute:
>> >> >The text to be synthesized for this utterance. Changes to this
>> >> >attribute after the utterance has been added to the queue (by calling
>> >> >the speak method) SHOULD be ignored.
>> >> >
>> >> >lang attribute:
>> >> >[no change except to append the following] Changes to this attribute
>> >> >after the utterance has been added to the queue (by calling the speak
>> >> >method) SHOULD be ignored.
>> >> >
>> >> >serviceURI attribute:
>> >> >[no change except to append the following] Changes to this attribute
>> >> >after the utterance has been added to the queue (by calling the speak
>> >> >method) SHOULD be ignored.
>> >> >
>> >> >speaking attribute:
>> >> >This attribute is true if this specific utterance is currently being
>> >> >spoken, specifically if this utterance has begun being spoken and has
>> >> >not completed being spoken. This is independent of whether the
>> >> >SpeechSynthesis object is in a paused state.
>> >> >
>> >> >paused attribute:
>> >> >This attribute is true if this specific utterance has begun to be
>> >> >spoken, but has not completed and the SpeechSynthesis object is in the
>> >> >paused state.
>> >> >
>> >> >ended attribute:
>> >> >This attribute is true if this specific utterance has completed being
>> >> >spoken.
>> >> >
>> >> >SpeechSynthesisUtterance events
>> >> >
>> >> >onstart event:
>> >> >Fired when this utterance has begun to be spoken.
>> >> >
>> >> >onend event:
>> >> >Fired when this utterance has completed being spoken.
>> >> >
>> >> >
>> >> >
>> >> >On Thu, Sep 13, 2012 at 10:25 AM, Dominic Mazzoni <dmazzoni@google.com>
>> >> >wrote:
>> >> >
>> >> >> Thanks for proposing definitions.
>> >> >>
>> >> >> On Tue, Sep 11, 2012 at 3:02 AM, Glen Shires <gshires@google.com>
>> >> >> wrote:
>> >> >> > I propose the following definitions for the SpeechSynthesis IDL:
>> >> >> >
>> >> >> > SpeechSynthesis Attributes
>> >> >> >
>> >> >> > pending attribute:
>> >> >> > This attribute is true if the queue contains any utterances which
>> >> >> > have not completed playback.
>> >> >>
>> >> >> I was imagining: This attribute is true if the queue contains any
>> >> >> utterances which have not *started* speaking.
>> >> >>
>> >> >> > speaking attribute:
>> >> >> > This attribute is true if playback is in progress.
>> >> >>
>> >> >> I don't like the word "playback", it doesn't fit when the speech is
>> >> >> generated dynamically. How about: This attribute is true if an
>> >> >> utterance is being spoken.
>> >> >>
>> >> >> > paused attribute:
>> >> >> >   **** How is this different than (pending && !speaking) ? ****
>> >> >>
>> >> >> This is true if the speech synthesis system is in a paused state,
>> >> >> independent of whether anything is speaking or queued.
>> >> >>
>> >> >> paused && speaking -> it was paused in the middle of an utterance
>> >> >> paused && !speaking -> no utterance is speaking, but if you call
>> >> >> speak(), nothing will happen because it's in a paused state.
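
The two flag combinations above can be written out as a small lookup. This is only an illustration: `describeState` is a hypothetical helper, and the wording for the two non-paused cases is my own guess, since the text above only spells out the paused ones.

```javascript
// Illustrative helper: describe the state implied by the paused and
// speaking flags of the SpeechSynthesis object.
function describeState(paused, speaking) {
  if (paused && speaking) {
    return "paused in the middle of an utterance";
  }
  if (paused && !speaking) {
    return "paused with nothing speaking; speak() only enqueues";
  }
  // The two cases below are assumptions, not taken from the discussion.
  if (speaking) {
    return "speaking";
  }
  return "idle";
}

console.log(describeState(true, true)); // paused in the middle of an utterance
console.log(describeState(true, false)); // paused with nothing speaking; speak() only enqueues
```
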
>> >> >>
>> >> >> >
>> >> >> > SpeechSynthesis Methods
>> >> >> >
>> >> >> > The speak method
>> >> >> > This method appends the utterance to the end of a playback queue.
>> >> >> > If playback is not in progress, it also begins playback of the next
>> >> >> > item in the queue.
>> >> >>
>> >> >> What do you think about rewriting to not use "playback"?
>> >> >>
>> >> >> Also, my idea was that it would not begin playback if the system is
>> >> >> in a paused state.
>> >> >>
>> >> >> > The cancel method
>> >> >> > This method removes the first matching utterance (if any) from the
>> >> >> > playback queue. If playback is in progress and the utterance removed
>> >> >> > is being played, playback ceases for the utterance and the next
>> >> >> > utterance in the queue (if any) begins playing.
>> >> >>
>> >> >> Do we need to say "first matching"? Each utterance should be a
>> >> >> specific object, it should be either in the queue or not.
>> >> >>
>> >> >> > The pause method
>> >> >> > This method pauses the playback mid-utterance. If playback is not in
>> >> >> > progress, it does nothing.
>> >> >>
>> >> >> I was assuming that calling it would set the system into a paused
>> >> >> state, so that even a subsequent call to speak() would not do anything
>> >> >> other than enqueue.
>> >> >>
>> >> >> > The continue method
>> >> >> > This method continues the playback at the point in the utterance and
>> >> >> > queue in which it was paused.  If playback is in progress, it does
>> >> >> > nothing.
>> >> >> >
>> >> >> > The stop method
>> >> >> > This method stops playback mid-utterance and flushes the queue.
>> >> >> >
>> >> >> >
>> >> >> > SpeechSynthesisUtterance attributes
>> >> >> >
>> >> >> > text attribute:
>> >> >> > The text to be synthesized for this utterance. This attribute must
>> >> >> > not be changed after onstart fires.
>> >> >>
>> >> >> I'd say: changes to this attribute after the utterance has been added
>> >> >> to the queue (by calling "speak") will be ignored. OR, we should make
>> >> >> it a DOM exception to modify it when it's in the speech queue.
>> >> >>
>> >> >> > paused attribute:
>> >> >> > This attribute is true if this specific utterance is in the queue
>> >> >> > and has not completed playback.
>> >> >>
>> >> >> I think this should only be true if it has begun speaking but not
>> >> >> completed.
>> >> >>
>> >> >> - Dominic
>> >> >>
>> >>
>> >> --
>> >> NOTICE TO RECIPIENT:
>> >> THIS E-MAIL IS  MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION,
>> >> AND MAY BE A COMMUNICATION PRIVILEGED BY LAW.  IF YOU RECEIVED THIS E-MAIL
>> >> IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS
>> >> E-MAIL IS STRICTLY PROHIBITED.  PLEASE NOTIFY US IMMEDIATELY OF THE ERROR
>> >> BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM. THANK YOU
>> >> IN ADVANCE FOR YOUR COOPERATION.
>> >> Reply to : legal@openstream.com
>> >>
>> >>
>> >>
>> >>
>>
>>
>>
>
Received on Tuesday, 18 September 2012 05:26:52 UTC
