Re: TTS proposal to split Utterance into its own interface from Dominic Mazzoni on 2012-09-18 (public-speech-api@w3.org from September 2012)

From: Dominic Mazzoni <dmazzoni@google.com>
Date: Tue, 18 Sep 2012 00:05:51 -0700
To: Glen Shires <gshires@google.com>
Cc: Nagesh Kharidi <nagesh@openstream.com>, Jim Barnett <Jim.Barnett@genesyslab.com>, Hans Wennborg <hwennborg@google.com>, olli@pettay.fi, public-speech-api@w3.org
Message-ID: <CAFz-FYziFcGX5DBpkHre4Wa+ojCHjRjpcEzggQsj+2Z9fM==Tg@mail.gmail.com>
Looking good. Just one suggestion: how about replacing "this
SpeechSynthesis object" with "the global SpeechSynthesis instance" or
something that indicates there's just a single global SpeechSynthesis.

- Dominic


On Mon, Sep 17, 2012 at 10:25 PM, Glen Shires <gshires@google.com> wrote:
> I've updated the spec with the above SpeechSynthesis and
> SpeechSynthesisUtterance IDL and definitions:
> https://dvcs.w3.org/hg/speech-api/rev/b036c78e9445
>
> As always, the current draft spec is at:
> http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>
> On Sat, Sep 15, 2012 at 1:31 PM, Glen Shires <gshires@google.com> wrote:
>>
>> Nagesh,
>> I agree that cancelAll() is useful and can make code simpler because it
>> doesn't affect the paused state.  In fact, I propose that we add cancelAll()
>> and remove stop() -- because the stop function is probably less common and
>> can easily be accomplished with two calls: cancelAll() and pause().
>>
>> Also, since canceling a specific utterance is not very useful, and
>> questionable as Jerry states, I propose eliminating cancel(utterance). If we
>> do that, then we could rename cancelAll() more simply as cancel().
>>
>> Thus, I propose this IDL:
>>
>>     interface SpeechSynthesis {
>>       static readonly attribute boolean pending;
>>       static readonly attribute boolean speaking;
>>       static readonly attribute boolean paused;
>>
>>       static void speak(SpeechSynthesisUtterance utterance);
>>       static void cancel();
>>       static void pause();
>>       static void continue();
>>     };
>>
>> and I propose this new definition of cancel:
>>
>> The cancel method
>> This method removes all utterances from the queue. If an utterance is
>> being spoken, speaking ceases immediately. This method does not change the
>> paused state of the SpeechSynthesis object.
>>
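The proposed semantics can be checked against a small toy model. In the model, cancel() empties the queue without touching the paused state, so the old stop() becomes cancel() followed by pause(). This is a hypothetical, synchronous JavaScript sketch, not a real synthesizer. `ToySpeechSynthesis` and its `finishCurrent()` hook, which stands in for the engine reaching the end of an utterance, are invented for illustration. The attributes are instance getters rather than the `static` members in the IDL above.

```javascript
// Hypothetical toy model of the SpeechSynthesis queue proposed in this thread.
// Nothing is synthesized; "speaking" just means an utterance has been started
// and not yet finished.
class ToySpeechSynthesis {
  constructor() {
    this.queue = [];      // utterances that have not started speaking
    this.current = null;  // utterance that has begun but not completed
    this.paused = false;  // a new object starts in the non-paused state
  }

  get pending() { return this.queue.length > 0; }
  get speaking() { return this.current !== null; }

  // Append to the end of the queue; never changes the paused state.
  speak(utterance) {
    this.queue.push(utterance);
    this._startNextIfIdle();
  }

  // Empty the queue and stop the current utterance; paused state unchanged.
  cancel() {
    this.queue = [];
    this.current = null;
  }

  pause() { this.paused = true; }

  // `continue` is a reserved word, but it is legal as a method name.
  continue() {
    this.paused = false;
    this._startNextIfIdle();
  }

  // Hypothetical hook: the engine reached the end of the current utterance.
  // A paused engine makes no progress, so this is a no-op while paused.
  finishCurrent() {
    if (this.paused) return;
    this.current = null;
    this._startNextIfIdle();
  }

  _startNextIfIdle() {
    if (!this.paused && this.current === null && this.queue.length > 0) {
      this.current = this.queue.shift();
    }
  }
}

// The former stop(): flush everything, then enter the paused state.
const synth = new ToySpeechSynthesis();
synth.speak("first");
synth.speak("second");
synth.cancel();
synth.pause();
console.log(synth.speaking, synth.pending, synth.paused); // false false true
```

Because cancel() leaves the paused flag alone, a caller that only wants to flush and keep talking never has to undo a pause. This is the simplification Nagesh's sequence below is aiming for.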
>> /Glen Shires
>>
>>
>> On Sat, Sep 15, 2012 at 3:25 AM, Nagesh Kharidi <nagesh@openstream.com> wrote:
>>>
>>> Please see inline.
>>>
>>> Regards,
>>> Nagesh
>>>
>>> On Fri, 14 Sep 2012 12:59:41 -0700
>>>  Glen Shires <gshires@google.com> wrote:
>>> >> Provide the ability to cancel all currently queued utterances.
>>> >
>>> >The stop() method cancels all queued utterances. (Dominic proposed that
>>> >this method be named stopAndFlushQueue(); would that name be clearer?)
>>>
>>> In addition to canceling all queued utterances, the stop() method also
>>> pauses the SpeechSynthesis object. A separate cancelAll() method would
>>> be useful, without which, if a new utterance is to be spoken
>>> immediately, we would have to do :
>>> speechSynthesis.stop();
>>> speechSynthesis.continue();
>>> speechSynthesis.speak(utterance);
>>>
>>> With a cancelAll() method, this would be:
>>> speechSynthesis.cancelAll();
>>> speechSynthesis.speak(utterance);
>>>
>>> Since this would be such a common usage, we could make it even easier
>>> for developers by either:
>>> - providing a speakImmediate(utterance) method that cancels all queued
>>> utterances and then starts speaking the new utterance
>>> or
>>> - adding a second parameter as follows to the speak() method:
>>> speechSynthesis.speak(utterance, speakImmediately);
>>> If speakImmediately is true, all currently queued utterances will be
>>> canceled and the new utterance will be spoken.
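Nagesh's first option can be built on top of the two methods discussed here. The sketch below is minimal and assumes only that `synth` exposes `cancel()` and `speak()`. `speakImmediately` is a hypothetical helper name and is not part of the draft spec. Under the paused-state-preserving cancel() that Glen proposes above, a paused synthesizer would still only enqueue the new utterance.

```javascript
// Hypothetical helper: flush the queue, then speak the new utterance.
// Relies only on the cancel() and speak() methods discussed in this thread.
function speakImmediately(synth, utterance) {
  synth.cancel();          // drop queued utterances and stop the current one
  synth.speak(utterance);  // enqueue the new one; paused state is untouched
}

// Usage with a stand-in object that just records the calls it receives.
const calls = [];
const recorder = {
  cancel() { calls.push("cancel"); },
  speak(u) { calls.push(`speak:${u}`); },
};
speakImmediately(recorder, "breaking news");
console.log(calls); // [ 'cancel', 'speak:breaking news' ]
```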
>>>
>>> >
>>> >Also, what is the use case for the current cancel(utterance) method? In
>>> >all the use cases I envision, you'd want to cancel all queued utterances.
>>> >Can we eliminate cancel()?
>>>
>>> I also agree that canceling a specific utterance is not very useful.
>>> Canceling all queued utterances would be more common than canceling a
>>> specific utterance.
>>>
>>> >
>>> >
>>> >> New speakNext SpeechSynthesis method - insert the utterance at the
>>> >> beginning of the queue
>>> >
>>> >I'd like more discussion on this. What are the use cases? What are the
>>> >edge cases? (For example, if there's a race condition, the current
>>> >utterance may finish and the second in the queue may begin speaking
>>> >before this new utterance is inserted.)
>>>
>>> Use case for speakNext() method: Consider a news application that plays
>>> the latest news items. It queues all news items to be played. Now if
>>> there is a new "breaking news" item that comes in, the speakNext()
>>> method can be used to play it as soon as possible without canceling the
>>> already queued items.
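The breaking-news case, and the race Glen raises above, can be illustrated with a hypothetical `speakNext()` that inserts at the front of the not-yet-started queue. This sketch assumes the queue holds only utterances that have not begun. The utterance currently speaking is never interrupted. `NewsQueue` and `finishCurrent()` are invented for illustration.

```javascript
// Hypothetical queue with a speakNext() operation, as proposed by Nagesh.
class NewsQueue {
  constructor() {
    this.current = null; // utterance being spoken
    this.queue = [];     // utterances not yet started
  }
  speak(u) { this.queue.push(u); this._advanceIfIdle(); }
  // Insert at the front of the waiting queue; does not interrupt `current`.
  speakNext(u) { this.queue.unshift(u); this._advanceIfIdle(); }
  // Stand-in for the engine finishing the current utterance.
  finishCurrent() { this.current = null; this._advanceIfIdle(); }
  _advanceIfIdle() {
    if (this.current === null && this.queue.length > 0) {
      this.current = this.queue.shift();
    }
  }
}

// Normal case: the breaking item plays right after story 1.
const q = new NewsQueue();
["story 1", "story 2", "story 3"].forEach((s) => q.speak(s));
q.speakNext("breaking");
q.finishCurrent();
console.log(q.current); // prints: breaking

// The race: story 1 ends before the insert arrives, so story 2 has already
// started and "breaking" must wait for it to finish.
const r = new NewsQueue();
["story 1", "story 2"].forEach((s) => r.speak(s));
r.finishCurrent();
r.speakNext("breaking");
console.log(r.current); // prints: story 2
```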
>>>
>>>
>>> >
>>> >
>>> >>  Question:  Can a cancelled utterance be re-queued?
>>> >
>>> >Good question, and also: what is the lifetime of a SpeechSynthesisUtterance
>>> >object, and who owns it? There are at least 3 possibilities:
>>> >
>>> >1. The speak() method takes ownership when it adds it to the queue; then
>>> >it would presumably be destroyed upon cancel or onend.
>>> >    (This raises the question: how useful is the SpeechSynthesisUtterance
>>> >object attribute "ended", since the object will be destroyed when it
>>> >turns true? It also makes it messy to use the other readonly attributes
>>> >because the object may be deleted suddenly. Also, what if the author
>>> >deletes the SpeechSynthesisUtterance object prior to it being spoken? One
>>> >easy way to accidentally create this bug is to define the
>>> >SpeechSynthesisUtterance object in a method that goes out of scope.)
>>> >
>>> >2. The speak() method does not take ownership when it adds it directly to
>>> >the queue.
>>> >    (This raises the question: what if the author deletes the
>>> >SpeechSynthesisUtterance object prior to it being spoken? One easy way to
>>> >accidentally create this bug is to define the SpeechSynthesisUtterance
>>> >object in a method that goes out of scope.)
>>> >
>>> >3. The speak() method does not take ownership; it makes a copy of it when
>>> >it adds it to the queue.
>>> >    (This raises the question: how can the readonly attributes (speaking,
>>> >paused, ended) of the author's original SpeechSynthesisUtterance object
>>> >reflect the state of the copy on the queue?)
>>> >
>>> >
>>> >To resolve these issues, I propose the following, because I think it's the
>>> >cleanest solution and easiest for authors, since they can create and
>>> >destroy objects, and let them go out of scope, without worrying about the
>>> >speaking queue timing:
>>> >
>>> >The speak() method does not take ownership of the SpeechSynthesisUtterance
>>> >object; it makes a copy of it when it adds it to the queue. We eliminate
>>> >the SpeechSynthesisUtterance readonly attributes, relying instead on events
>>> >that indicate a change in state, including new events for onpause and
>>> >onresume.
>>> >
>>> >Because it's a copy of the object, this clarifies that:
>>> >- changes to the original SpeechSynthesisUtterance object after calling
>>> >speak() do not affect the copy on the queue.
>>> >- the same SpeechSynthesisUtterance object can be used to call speak()
>>> >multiple times (even after a copy of it was spoken or cancelled).
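The copy semantics of option 3 fit in a few lines. `CopyingQueue` is a hypothetical illustration, not the spec's mechanism. It takes a shallow copy with object spread. Plain fields such as `text` and `lang` are snapshotted at speak() time. Any function-valued handlers, such as `onend`, would still be shared by reference with the author's object.

```javascript
// Hypothetical queue that stores a snapshot of each utterance (option 3).
class CopyingQueue {
  constructor() { this.items = []; }
  speak(utterance) {
    // Shallow copy: later edits to the author's object do not reach the queue.
    this.items.push({ ...utterance });
  }
}

const queue = new CopyingQueue();
const u = { text: "Hello", lang: "en-US" };
queue.speak(u);
u.text = "Goodbye";   // does not affect the copy already queued
queue.speak(u);       // the same object can be spoken again
console.log(queue.items.map((x) => x.text)); // [ 'Hello', 'Goodbye' ]
```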
>>> >
>>> >The new IDL would be:
>>> >
>>> >    interface SpeechSynthesisUtterance {
>>> >      attribute DOMString text;
>>> >      attribute DOMString lang;
>>> >      attribute DOMString serviceURI;
>>> >
>>> >      attribute Function onstart;
>>> >      attribute Function onend;
>>> >*      attribute Function onpause;*
>>> >*      attribute Function onresume;*
>>> >    };
>>> >
>>> >
>>> >And the new definition:
>>> >
>>> >The speak method
>>> >This method appends *a copy of* the utterance to the end of the queue for
>>> >this SpeechSynthesis object. It does not change the paused state of the
>>> >SpeechSynthesis object. If the SpeechSynthesis object is paused, it
>>> >remains paused. If it is not paused, then this utterance is spoken if no
>>> >other utterances are in the queue; otherwise it is queued to begin
>>> >speaking after the other utterances in the queue have been spoken.
>>> >
>>> >
>>> >/Glen Shires
>>> >
>>> >
>>> >On Fri, Sep 14, 2012 at 6:05 AM, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:
>>> >
>>> >> I would think that cancelling all utterances would be the more common
>>> >> use case (so we ought to make it easy). Question: Can a cancelled
>>> >> utterance be re-queued?
>>> >>
>>> >> - Jim
>>> >>
>>> >> -----Original Message-----
>>> >> From: Nagesh Kharidi [mailto:nagesh@openstream.com]
>>> >> Sent: Friday, September 14, 2012 8:58 AM
>>> >> To: Glen Shires; Dominic Mazzoni
>>> >> Cc: Hans Wennborg; olli@pettay.fi; public-speech-api@w3.org
>>> >> Subject: Re: TTS proposal to split Utterance into its own interface
>>> >>
>>> >> I would like to propose the following:
>>> >> 1. Provide the ability to cancel all currently queued utterances. A new
>>> >> cancelAll method could be added. Alternatively, invoking the cancel
>>> >> method without the utterance parameter could imply canceling all
>>> >> utterances.
>>> >>
>>> >> 2. New speakNext SpeechSynthesis method
>>> >> This method will insert the utterance at the beginning of the queue.
>>> >>
>>> >> 3. New oncancel SpeechSynthesisUtterance event: fired when the
>>> >> utterance is canceled.
>>> >>
>>> >> 4. New canceled SpeechSynthesisUtterance attribute: true if the
>>> >> utterance is canceled.
>>> >>
>>> >>
>>> >> I also had a question regarding the stop method: Is "flushes the queue"
>>> >> equivalent to calling cancel on all utterances in the queue? If so, I
>>> >> would like to suggest changing "flushes the queue" to "cancels all
>>> >> utterances in the queue".
>>> >>
>>> >> Regards,
>>> >> Nagesh
>>> >>
>>> >> On Thu, 13 Sep 2012 14:13:56 -0700
>>> >>  Glen Shires <gshires@google.com> wrote:
>>> >> >Yes, I like the way you've defined the "speak" method to not change the
>>> >> >play/pause state. Also, I didn't particularly like the word "playback",
>>> >> >so thanks for the alternative "spoken". Here are updated definitions
>>> >> >with your suggestions incorporated. If there's no disagreement, I'll
>>> >> >add them to the spec on Monday.
>>> >> >
>>> >> >
>>> >> >SpeechSynthesis Attributes
>>> >> >
>>> >> >pending attribute:
>>> >> >This attribute is true if the queue for this SpeechSynthesis object
>>> >> >contains any utterances which have not started speaking.
>>> >> >
>>> >> >speaking attribute:
>>> >> >This attribute is true if an utterance is being spoken. Specifically, it
>>> >> >is true if an utterance has begun being spoken and has not completed
>>> >> >being spoken, independent of whether this SpeechSynthesis object is in
>>> >> >the paused state.
>>> >> >
>>> >> >paused attribute:
>>> >> >This attribute is true when this SpeechSynthesis object is in the paused
>>> >> >state. This state is independent of whether anything is in the queue.
>>> >> >The default state of a new SpeechSynthesis object is the non-paused
>>> >> >state.
>>> >> >
>>> >> >
>>> >> >SpeechSynthesis Methods
>>> >> >
>>> >> >The speak method
>>> >> >This method appends the utterance to the end of the queue for this
>>> >> >SpeechSynthesis object. It does not change the paused state of the
>>> >> >SpeechSynthesis object. If the SpeechSynthesis object is paused, it
>>> >> >remains paused. If it is not paused, then this utterance is spoken if
>>> >> >no other utterances are in the queue; otherwise it is queued to begin
>>> >> >speaking after the other utterances in the queue have been spoken.
>>> >> >
>>> >> >The cancel method
>>> >> >This method removes the specified utterance from the queue. If it is
>>> >> >not in the queue, no changes are made. If the utterance removed is
>>> >> >being spoken, speaking ceases for that utterance and the next utterance
>>> >> >in the queue (if any) begins to be spoken. This method does not change
>>> >> >the paused state of the SpeechSynthesis object.
>>> >> >
>>> >> >The pause method
>>> >> >This method puts the SpeechSynthesis object into the paused state. If
>>> >> >an utterance was being spoken, it pauses mid-utterance. (If called when
>>> >> >the SpeechSynthesis object was already in the paused state, it does
>>> >> >nothing.)
>>> >> >
>>> >> >The continue method
>>> >> >This method puts the SpeechSynthesis object into the non-paused state.
>>> >> >If an utterance was speaking (that is, its speaking attribute is true),
>>> >> >it continues speaking the utterance at the point at which it was paused;
>>> >> >otherwise it begins speaking the next utterance in the queue (if any).
>>> >> >(If called when the SpeechSynthesis object was already in the non-paused
>>> >> >state, it does nothing.)
>>> >> >
>>> >> >The stop method
>>> >> >This method puts the SpeechSynthesis object into the paused state and
>>> >> >flushes the queue. It sets the speaking attribute to false and the
>>> >> >paused attribute to true.
>>> >> >
>>> >> >
>>> >> >SpeechSynthesisUtterance attributes
>>> >> >
>>> >> >
>>> >> >[[Note, I used SHOULD here because there may be some race-condition
>>> >> >edge-cases where it might not be ignored.]]
>>> >> >
>>> >> >text attribute:
>>> >> >The text to be synthesized for this utterance. Changes to this
>>> >> >attribute after the utterance has been added to the queue (by calling
>>> >> >the speak method) SHOULD be ignored.
>>> >> >
>>> >> >lang attribute:
>>> >> >[no change except to append the following] Changes to this attribute
>>> >> >after the utterance has been added to the queue (by calling the speak
>>> >> >method) SHOULD be ignored.
>>> >> >
>>> >> >serviceURI attribute:
>>> >> >[no change except to append the following] Changes to this attribute
>>> >> >after the utterance has been added to the queue (by calling the speak
>>> >> >method) SHOULD be ignored.
>>> >> >
>>> >> >speaking attribute:
>>> >> >This attribute is true if this specific utterance is currently being
>>> >> >spoken; specifically, if this utterance has begun being spoken and has
>>> >> >not completed being spoken. This is independent of whether the
>>> >> >SpeechSynthesis object is in a paused state.
>>> >> >
>>> >> >paused attribute:
>>> >> >This attribute is true if this specific utterance has begun to be
>>> >> >spoken but has not completed, and the SpeechSynthesis object is in the
>>> >> >paused state.
>>> >> >
>>> >> >ended attribute:
>>> >> >This attribute is true if this specific utterance has completed being
>>> >> >spoken.
>>> >> >
>>> >> >SpeechSynthesisUtterance events
>>> >> >
>>> >> >onstart event:
>>> >> >Fired when this utterance has begun to be spoken.
>>> >> >
>>> >> >onend event:
>>> >> >Fired when this utterance has completed being spoken.
>>> >> >
>>> >> >
>>> >> >
>>> >> >On Thu, Sep 13, 2012 at 10:25 AM, Dominic Mazzoni <dmazzoni@google.com> wrote:
>>> >> >
>>> >> >> Thanks for proposing definitions.
>>> >> >>
>>> >> >> On Tue, Sep 11, 2012 at 3:02 AM, Glen Shires <gshires@google.com> wrote:
>>> >> >> > I propose the following definitions for the SpeechSynthesis IDL:
>>> >> >> >
>>> >> >> > SpeechSynthesis Attributes
>>> >> >> >
>>> >> >> > pending attribute:
>>> >> >> > This attribute is true if the queue contains any utterances which
>>> >> >> > have not completed playback.
>>> >> >>
>>> >> >> I was imagining: This attribute is true if the queue contains any
>>> >> >> utterances which have not *started* speaking.
>>> >> >>
>>> >> >> > speaking attribute:
>>> >> >> > This attribute is true if playback is in progress.
>>> >> >>
>>> >> >> I don't like the word "playback"; it doesn't fit when the speech is
>>> >> >> generated dynamically. How about: This attribute is true if an
>>> >> >> utterance is being spoken.
>>> >> >>
>>> >> >> > paused attribute:
>>> >> >> >   **** How is this different than (pending && !speaking) ? ****
>>> >> >>
>>> >> >> This is true if the speech synthesis system is in a paused state,
>>> >> >> independent of whether anything is speaking or queued.
>>> >> >>
>>> >> >> paused && speaking -> it was paused in the middle of an utterance
>>> >> >> paused && !speaking -> no utterance is speaking, but if you call
>>> >> >> speak(), nothing will happen because it's in a paused state.
>>> >> >>
>>> >> >> >
>>> >> >> > SpeechSynthesis Methods
>>> >> >> >
>>> >> >> > The speak method
>>> >> >> > This method appends the utterance to the end of a playback queue. If
>>> >> >> > playback is not in progress, it also begins playback of the next
>>> >> >> > item in the queue.
>>> >> >>
>>> >> >> What do you think about rewriting to not use "playback"?
>>> >> >>
>>> >> >> Also, my idea was that it would not begin playback if the system is
>>> >> >> in a paused state.
>>> >> >>
>>> >> >> > The cancel method
>>> >> >> > This method removes the first matching utterance (if any) from the
>>> >> >> > playback queue. If playback is in progress and the utterance removed
>>> >> >> > is being played, playback ceases for the utterance and the next
>>> >> >> > utterance in the queue (if any) begins playing.
>>> >> >>
>>> >> >> Do we need to say "first matching"? Each utterance should be a
>>> >> >> specific object, it should be either in the queue or not.
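Dominic's point is that removal can go by object identity, which makes "first matching" unnecessary. This minimal sketch assumes the queue is a plain array of utterance objects. `cancelUtterance` is a hypothetical name. It covers only removal from the queue, not stopping an utterance that is already speaking. `Array.prototype.indexOf` compares with strict equality, so two utterances with identical text are still distinct.

```javascript
// Hypothetical cancel(utterance): remove exactly this object, if queued.
function cancelUtterance(queue, utterance) {
  const i = queue.indexOf(utterance); // strict equality: object identity
  if (i === -1) return false;         // not in the queue: no change
  queue.splice(i, 1);
  return true;
}

const a = { text: "Hi" };
const b = { text: "Hi" };             // same text, different object
const pendingQueue = [a, b];
cancelUtterance(pendingQueue, b);
console.log(pendingQueue.length, pendingQueue[0] === a); // 1 true
```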
>>> >> >>
>>> >> >> > The pause method
>>> >> >> > This method pauses the playback mid-utterance. If playback is not in
>>> >> >> > progress, it does nothing.
>>> >> >>
>>> >> >> I was assuming that calling it would put the system into a paused
>>> >> >> state, so that even a subsequent call to speak() would not do anything
>>> >> >> other than enqueue.
>>> >> >>
>>> >> >> > The continue method
>>> >> >> > This method continues the playback at the point in the utterance and
>>> >> >> > queue at which it was paused. If playback is in progress, it does
>>> >> >> > nothing.
>>> >> >> >
>>> >> >> > The stop method
>>> >> >> > This method stops playback mid-utterance and flushes the queue.
>>> >> >> >
>>> >> >> >
>>> >> >> > SpeechSynthesisUtterance attributes
>>> >> >> >
>>> >> >> > text attribute:
>>> >> >> > The text to be synthesized for this utterance. This attribute must
>>> >> >> > not be changed after onstart fires.
>>> >> >>
>>> >> >> I'd say: changes to this attribute after the utterance has been added
>>> >> >> to the queue (by calling "speak") will be ignored. OR, we should make
>>> >> >> it a DOM exception to modify it when it's in the speech queue.
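Dominic's second alternative, rejecting modification while an utterance is queued, might look roughly like this. It is a hypothetical sketch. The `ToyUtterance` class and its `queued` flag are invented here, and a plain `Error` stands in for whatever DOM exception a spec would actually name.

```javascript
// Hypothetical utterance whose text is frozen once it has been queued.
class ToyUtterance {
  constructor(text) {
    this._text = text;
    this.queued = false; // in this sketch, speak() would set this to true
  }
  get text() { return this._text; }
  set text(value) {
    if (this.queued) {
      throw new Error("cannot modify text while the utterance is queued");
    }
    this._text = value;
  }
}

const utt = new ToyUtterance("Hello");
utt.text = "Hello, world";   // allowed: not yet queued
utt.queued = true;           // stand-in for speak(utt)
try {
  utt.text = "Too late";
} catch (e) {
  console.log(e.message);    // the error message; text is left unchanged
}
console.log(utt.text);       // Hello, world
```

The "SHOULD be ignored" variant that Glen's later definitions adopt would simply return from the setter instead of throwing.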
>>> >> >>
>>> >> >> > paused attribute:
>>> >> >> > This attribute is true if this specific utterance is in the queue and
>>> >> >> > has not completed playback.
>>> >> >>
>>> >> >> I think this should only be true if it has begun speaking but not
>>> >> >> completed.
>>> >> >>
>>> >> >> - Dominic
>>> >> >>
>>> >>
>>> >> --
>>> >> NOTICE TO RECIPIENT:
>>> >> THIS E-MAIL IS  MEANT FOR ONLY THE INTENDED RECIPIENT OF THE
>>> >TRANSMISSION,
>>> >> AND MAY BE A COMMUNICATION PRIVILEGED BY LAW.  IF YOU RECEIVED THIS
>>> >E-MAIL
>>> >> IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING
>>> >OF THIS
>>> >> E-MAIL IS STRICTLY PROHIBITED.  PLEASE NOTIFY US IMMEDIATELY OF THE
>>> >ERROR
>>> >> BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
>>> >THANK YOU
>>> >> IN ADVANCE FOR YOUR COOPERATION.
>>> >> Reply to : legal@openstream.com
>>> >>
>>> >>
>>> >>
>>> >>
>>>
>>>
>>
>
Received on Tuesday, 18 September 2012 07:06:20 UTC