
Re: TTS proposal to split Utterance into its own interface

From: Glen Shires <gshires@google.com>
Date: Tue, 18 Sep 2012 12:47:02 -0700
Message-ID: <CAEE5bcguuY3-oRDtWC=WW9sH8DsW=np2fmn=fV3=OnCwKWugNQ@mail.gmail.com>
To: Dominic Mazzoni <dmazzoni@google.com>
Cc: Nagesh Kharidi <nagesh@openstream.com>, Jim Barnett <Jim.Barnett@genesyslab.com>, Hans Wennborg <hwennborg@google.com>, olli@pettay.fi, public-speech-api@w3.org
I've updated the spec with the above clarification: references to
"SpeechSynthesis object" are now "global SpeechSynthesis instance".
https://dvcs.w3.org/hg/speech-api/rev/bf779b363c93

As always, the current draft spec is at:
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

/Glen Shires

On Tue, Sep 18, 2012 at 12:05 AM, Dominic Mazzoni <dmazzoni@google.com> wrote:

> Looking good. Just one suggestion: how about replacing "this
> SpeechSynthesis object" with "the global SpeechSynthesis instance" or
> something that indicates there's just a single global SpeechSynthesis.
>
> - Dominic
>
>
> On Mon, Sep 17, 2012 at 10:25 PM, Glen Shires <gshires@google.com> wrote:
> > I've updated the spec with the above SpeechSynthesis and
> > SpeechSynthesisUtterance IDL and definitions:
> > https://dvcs.w3.org/hg/speech-api/rev/b036c78e9445
> >
> > As always, the current draft spec is at:
> > http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
> >
> > On Sat, Sep 15, 2012 at 1:31 PM, Glen Shires <gshires@google.com> wrote:
> >>
> >> Nagesh,
> >> I agree that cancelAll() is useful and can make code simpler because it
> >> doesn't affect the paused state.  In fact, I propose that we add
> >> cancelAll() and remove stop() -- because the stop function is probably
> >> less common and can easily be accomplished with two calls: cancelAll()
> >> and pause().
> >>
> >> Also, since canceling a specific utterance is not very useful, and
> >> questionable as Jerry states, I propose eliminating cancel(utterance).
> >> If we do that, then we could rename cancelAll() more simply as cancel().
> >>
> >> Thus, I propose this IDL:
> >>
> >>     interface SpeechSynthesis {
> >>       static readonly attribute boolean pending;
> >>       static readonly attribute boolean speaking;
> >>       static readonly attribute boolean paused;
> >>
> >>       static void speak(SpeechSynthesisUtterance utterance);
> >>       static void cancel();
> >>       static void pause();
> >>       static void continue();
> >>     }
> >>
> >> and I propose this new definition of cancel:
> >>
> >> The cancel method
> >> This method removes all utterances from the queue. If an utterance is
> >> being spoken, speaking ceases immediately. This method does not change
> >> the paused state of the SpeechSynthesis object.
> >>
> >> /Glen Shires
> >>
> >>
> >> On Sat, Sep 15, 2012 at 3:25 AM, Nagesh Kharidi <nagesh@openstream.com>
> >> wrote:
> >>>
> >>> Please see inline.
> >>>
> >>> Regards,
> >>> Nagesh
> >>>
> >>> On Fri, 14 Sep 2012 12:59:41 -0700
> >>>  Glen Shires <gshires@google.com> wrote:
> >>> >> Provide the ability to cancel all currently queued utterances.
> >>> >
> >>> >The stop() method cancels all queued utterances. (Dominic proposed
> >>> >that this method be named stopAndFlushQueue(), would that name be
> >>> >more clear?)
> >>>
> >>> In addition to canceling all queued utterances, the stop() method also
> >>> pauses the SpeechSynthesis object. A separate cancelAll() method would
> >>> be useful, without which, if a new utterance is to be spoken
> >>> immediately, we would have to do :
> >>> speechSynthesis.stop();
> >>> speechSynthesis.continue();
> >>> speechSynthesis.speak(utterance);
> >>>
> >>> With a cancelAll() method, this would be:
> >>> speechSynthesis.cancelAll();
> >>> speechSynthesis.speak(utterance);
> >>>
> >>> Since this would be such a common usage, we could make it even easier
> >>> for developers by either:
> >>> - providing a speakImmediate(utterance) method that cancels all queued
> >>> utterances and then starts speaking the new utterance
> >>> or
> >>> - adding a second parameter as follows to the speak() method:
> >>> speechSynthesis.speak(utterance, speakImmediately);
> >>> If speakImmediately is true, all currently queued utterances will be
> >>> canceled and the new utterance will be spoken.
> >>>
> >>> >
> >>> >Also, what is the use case for the current cancel(utterance) method?
> >>> >In all the use cases I envision, you'd want to cancel all queued
> >>> >utterances. Can we eliminate cancel() ?
> >>>
> >>> I also agree that canceling a specific utterance is not very useful.
> >>> Canceling all queued utterances would be more common than canceling a
> >>> specific utterance.
> >>>
> >>> >
> >>> >
> >>> >> New speakNext SpeechSynthesis method - append the utterance to the
> >>> >> beginning of the queue
> >>> >
> >>> >I'd like more discussion on this. What are the use cases? What are the
> >>> >edge cases (e.g. If there's a race-condition, the current utterance may
> >>> >finish and the second in the queue may begin speaking before this new
> >>> >utterance is inserted).
> >>>
> >>> Use case for speakNext() method: Consider a news application that plays
> >>> the latest news items. It queues all news items to be played. Now if
> >>> there is a new "breaking news" item that comes in, the speakNext()
> >>> method can be used to play it as soon as possible without canceling the
> >>> already queued items.
> >>>
> >>>
> >>> >
> >>> >
> >>> >>  Question:  Can a cancelled utterance be re-queued?
> >>> >
> >>> >Good question, and also, what is the lifetime of a
> >>> >SpeechSynthesisUtterance object and who owns it. There are at least
> >>> >three possibilities:
> >>> >
> >>> >1. The speak() method takes ownership when it adds it to the queue,
> >>> >then it would presumably be destroyed upon cancel or onend.
> >>> >    (This raises the questions: what usefulness is the
> >>> >SpeechSynthesisUtterance object attribute "ended", since the object
> >>> >will be destroyed when it turns true. It also makes it messy to use
> >>> >the other readonly attributes because the object may be deleted
> >>> >suddenly. Also, what if the author deletes the
> >>> >SpeechSynthesisUtterance object prior to it being spoken.  One easy
> >>> >way to accidentally create this bug is to define the
> >>> >SpeechSynthesisUtterance object in a method that goes out of scope.)
> >>> >
> >>> >2. The speak() method does not take ownership when it adds it directly
> >>> >to the queue.
> >>> >    (This raises the question: what if the author deletes the
> >>> >SpeechSynthesisUtterance object prior to it being spoken.  One easy
> >>> >way to accidentally create this bug is to define the
> >>> >SpeechSynthesisUtterance object in a method that goes out of scope.)
> >>> >
> >>> >3. The speak() method does not take ownership, it makes a copy of it
> >>> >when it adds it to the queue.
> >>> >    (This raises the question: how can the author's original
> >>> >SpeechSynthesisUtterance object readonly attributes (speaking,
> >>> >paused, ended) reflect the state of the copy on the queue.)
> >>> >
> >>> >
> >>> >To resolve these issues, I propose the following, because I think it's
> >>> >the cleanest solution and easiest for authors, since they can create
> >>> >and destroy objects, and go out of scope, without worrying about the
> >>> >speaking queue timing:
> >>> >
> >>> >The speak() method does not take ownership of the
> >>> >SpeechSynthesisUtterance object, it makes a copy of it when it adds it
> >>> >to the queue.  We eliminate the SpeechSynthesisUtterance readonly
> >>> >attributes, relying instead on events that indicate change in state,
> >>> >including new events for: onpause, onresume.
> >>> >
> >>> >Because it's a copy of the object, this clarifies that:
> >>> >- changes to the original SpeechSynthesisUtterance object after
> >>> >calling speak() do not affect the copy on the queue.
> >>> >- the same SpeechSynthesisUtterance object can be used to call speak()
> >>> >multiple times (even after an earlier copy has been spoken or
> >>> >cancelled).
> >>> >
> >>> >The new IDL would be:
> >>> >
> >>> >    interface SpeechSynthesisUtterance {
> >>> >      attribute DOMString text;
> >>> >      attribute DOMString lang;
> >>> >      attribute DOMString serviceURI;
> >>> >
> >>> >      attribute Function onstart;
> >>> >      attribute Function onend;
> >>> >*      attribute Function onpause;*
> >>> >*      attribute Function onresume;*
> >>> >    }
> >>> >
> >>> >
> >>> >And the new definition:
> >>> >
> >>> >The speak method
> >>> >This method appends *a copy of* the utterance to the end of the queue
> >>> >for this SpeechSynthesis object. It does not change the paused state
> >>> >of the SpeechSynthesis object.  If the SpeechSynthesis object is
> >>> >paused, it remains paused. If it is not paused, then this utterance is
> >>> >spoken if no other utterances are in the queue, else this utterance is
> >>> >queued to begin speaking after the other utterances in the queue have
> >>> >been spoken.
> >>> >
> >>> >
> >>> >/Glen Shires
> >>> >
> >>> >
> >>> >On Fri, Sep 14, 2012 at 6:05 AM, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:
> >>> >
> >>> >> I would think that cancelling all utterances would be the more common
> >>> >> use case (so we ought to make it easy).  Question:  Can a cancelled
> >>> >> utterance be re-queued?
> >>> >>
> >>> >> - Jim
> >>> >>
> >>> >> -----Original Message-----
> >>> >> From: Nagesh Kharidi [mailto:nagesh@openstream.com]
> >>> >> Sent: Friday, September 14, 2012 8:58 AM
> >>> >> To: Glen Shires; Dominic Mazzoni
> >>> >> Cc: Hans Wennborg; olli@pettay.fi; public-speech-api@w3.org
> >>> >> Subject: Re: TTS proposal to split Utterance into its own interface
> >>> >>
> >>> >> I would like to propose the following:
> >>> >> 1. Provide the ability to cancel all currently queued utterances. A
> >>> >> new cancelAll method could be added. Alternately, invoking the cancel
> >>> >> method without the utterance parameter could imply cancel all
> >>> >> utterances.
> >>> >>
> >>> >> 2. New speakNext SpeechSynthesis method
> >>> >> This method will append the utterance to the beginning of the queue.
> >>> >>
> >>> >> 3. New oncancel SpeechSynthesisUtterance event
> >>> >> Fired when the utterance is canceled.
> >>> >>
> >>> >> 4. New canceled SpeechSynthesisUtterance attribute
> >>> >> true if the utterance is canceled.
> >>> >>
> >>> >>
> >>> >> I also had a question regarding the stop method: Is "flushes the
> >>> >> queue" equivalent to calling cancel on all utterances in the queue?
> >>> >> If so, I would like to suggest changing "flushes the queue" to
> >>> >> "cancels all utterances in the queue".
> >>> >>
> >>> >> Regards,
> >>> >> Nagesh
> >>> >>
> >>> >> On Thu, 13 Sep 2012 14:13:56 -0700
> >>> >>  Glen Shires <gshires@google.com> wrote:
> >>> >> >Yes, I like the way you've defined the "speak" method to not change
> >>> >> >the play/pause state. Also, I didn't particularly like the word
> >>> >> >"playback", so thanks for the alternative "spoken".  Here are updated
> >>> >> >definitions with your suggestions incorporated. If there's no
> >>> >> >disagreement, I'll add them to the spec on Monday.
> >>> >> >
> >>> >> >
> >>> >> >SpeechSynthesis Attributes
> >>> >> >
> >>> >> >pending attribute:
> >>> >> >This attribute is true if the queue for this SpeechSynthesis object
> >>> >> >contains any utterances which have not started speaking.
> >>> >> >
> >>> >> >speaking attribute:
> >>> >> >This attribute is true if an utterance is being spoken. Specifically
> >>> >> >if an utterance has begun being spoken and has not completed being
> >>> >> >spoken, and is independent of whether this SpeechSynthesis object is
> >>> >> >in the paused state.
> >>> >> >
> >>> >> >paused attribute:
> >>> >> >The attribute is true when this SpeechSynthesis object is in the
> >>> >> >paused state. This state is independent of whether anything is in
> >>> >> >the queue. The default state of a new SpeechSynthesis object is the
> >>> >> >non-paused state.
> >>> >> >
> >>> >> >
> >>> >> >SpeechSynthesis Methods
> >>> >> >
> >>> >> >The speak method
> >>> >> >This method appends the utterance to the end of the queue for this
> >>> >> >SpeechSynthesis object. It does not change the paused state of the
> >>> >> >SpeechSynthesis object.  If the SpeechSynthesis object is paused, it
> >>> >> >remains paused. If it is not paused, then this utterance is spoken if
> >>> >> >no other utterances are in the queue, else this utterance is queued
> >>> >> >to begin speaking after the other utterances in the queue have been
> >>> >> >spoken.
> >>> >> >
> >>> >> >The cancel method
> >>> >> >This method removes the specified utterance from the queue. If it is
> >>> >> >not in the queue, no changes are made. If the utterance removed is
> >>> >> >being spoken, speaking ceases for that utterance and the next
> >>> >> >utterance in the queue (if any) begins to be spoken. This method does
> >>> >> >not change the paused state of the SpeechSynthesis object.
> >>> >> >
> >>> >> >The pause method
> >>> >> >This method puts the SpeechSynthesis object into the paused state. If
> >>> >> >an utterance was being spoken, it pauses mid-utterance. (If called
> >>> >> >when the SpeechSynthesis object was already in the paused state, it
> >>> >> >does nothing.)
> >>> >> >
> >>> >> >The continue method
> >>> >> >This method puts the SpeechSynthesis object into the non-paused
> >>> >> >state. If an utterance was speaking (that is, its speaking attribute
> >>> >> >is true), it continues speaking the utterance at the point at which
> >>> >> >it was paused, else it begins speaking the next utterance in the
> >>> >> >queue (if any). (If called when the SpeechSynthesis object was
> >>> >> >already in the non-paused state, it does nothing.)
> >>> >> >
> >>> >> >The stop method.
> >>> >> >This method puts the SpeechSynthesis object into the paused state and
> >>> >> >flushes the queue. It sets the speaking attribute to false and the
> >>> >> >paused attribute to true.
> >>> >> >
> >>> >> >
> >>> >> >SpeechSynthesisUtterance attributes
> >>> >> >
> >>> >> >
> >>> >> >[[Note, I used SHOULD here because there may be some race-condition
> >>> >> >edge-cases where it might not be ignored.]]
> >>> >> >
> >>> >> >text attribute:
> >>> >> >The text to be synthesized for this utterance. Changes to this
> >>> >> >attribute after the utterance has been added to the queue (by calling
> >>> >> >the speak method) SHOULD be ignored.
> >>> >> >
> >>> >> >lang attribute:
> >>> >> >[no change except to append the following] Changes to this attribute
> >>> >> >after the utterance has been added to the queue (by calling the speak
> >>> >> >method) SHOULD be ignored.
> >>> >> >
> >>> >> >serviceURI attribute:
> >>> >> >[no change except to append the following] Changes to this attribute
> >>> >> >after the utterance has been added to the queue (by calling the speak
> >>> >> >method) SHOULD be ignored.
> >>> >> >
> >>> >> >speaking attribute:
> >>> >> >This attribute is true if this specific utterance is currently being
> >>> >> >spoken. Specifically if this utterance has begun being spoken and has
> >>> >> >not completed being spoken. This is independent of whether the
> >>> >> >SpeechSynthesis object is in a paused state.
> >>> >> >
> >>> >> >paused attribute:
> >>> >> >This attribute is true if this specific utterance has begun to be
> >>> >> >spoken, but has not completed and the SpeechSynthesis object is in
> >>> >> >the paused state.
> >>> >> >
> >>> >> >ended attribute:
> >>> >> >This attribute is true if this specific utterance has completed being
> >>> >> >spoken.
> >>> >> >
> >>> >> >SpeechSynthesisUtterance events
> >>> >> >
> >>> >> >onstart event:
> >>> >> >Fired when this utterance has begun to be spoken.
> >>> >> >
> >>> >> >onend event:
> >>> >> >Fired when this utterance has completed being spoken.
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> >On Thu, Sep 13, 2012 at 10:25 AM, Dominic Mazzoni <dmazzoni@google.com> wrote:
> >>> >> >
> >>> >> >> Thanks for proposing definitions.
> >>> >> >>
> >>> >> >> On Tue, Sep 11, 2012 at 3:02 AM, Glen Shires <gshires@google.com> wrote:
> >>> >> >> > I propose the following definitions for the SpeechSynthesis IDL:
> >>> >> >> >
> >>> >> >> > SpeechSynthesis Attributes
> >>> >> >> >
> >>> >> >> > pending attribute:
> >>> >> >> > This attribute is true if the queue contains any utterances which
> >>> >> >> > have not completed playback.
> >>> >> >>
> >>> >> >> I was imagining: This attribute is true if the queue contains any
> >>> >> >> utterances which have not *started* speaking.
> >>> >> >>
> >>> >> >> > speaking attribute:
> >>> >> >> > This attribute is true if playback is in progress.
> >>> >> >>
> >>> >> >> I don't like the word "playback", it doesn't fit when the speech is
> >>> >> >> generated dynamically. How about: This attribute is true if an
> >>> >> >> utterance is being spoken.
> >>> >> >>
> >>> >> >> > paused attribute:
> >>> >> >> >   **** How is this different than (pending && !speaking) ? ****
> >>> >> >>
> >>> >> >> This is true if the speech synthesis system is in a paused state,
> >>> >> >> independent of whether anything is speaking or queued.
> >>> >> >>
> >>> >> >> paused && speaking -> it was paused in the middle of an utterance
> >>> >> >> paused && !speaking -> no utterance is speaking, but if you call
> >>> >> >> speak(), nothing will happen because it's in a paused state.
> >>> >> >>
> >>> >> >> >
> >>> >> >> > SpeechSynthesis Methods
> >>> >> >> >
> >>> >> >> > The speak method
> >>> >> >> > This method appends the utterance to the end of a playback queue.
> >>> >> >> > If playback is not in progress, it also begins playback of the
> >>> >> >> > next item in the queue.
> >>> >> >>
> >>> >> >> What do you think about rewriting to not use "playback"?
> >>> >> >>
> >>> >> >> Also, my idea was that it would not begin playback if the system is
> >>> >> >> in a paused state.
> >>> >> >>
> >>> >> >> > The cancel method
> >>> >> >> > This method removes the first matching utterance (if any) from the
> >>> >> >> > playback queue. If playback is in progress and the utterance
> >>> >> >> > removed is being played, playback ceases for the utterance and the
> >>> >> >> > next utterance in the queue (if any) begins playing.
> >>> >> >>
> >>> >> >> Do we need to say "first matching"? Each utterance should be a
> >>> >> >> specific object, it should be either in the queue or not.
> >>> >> >>
> >>> >> >> > The pause method
> >>> >> >> > This method pauses the playback mid-utterance. If playback is not
> >>> >> >> > in progress, it does nothing.
> >>> >> >>
> >>> >> >> I was assuming that calling it would set the system into a paused
> >>> >> >> state, so that even a subsequent call to speak() would not do
> >>> >> >> anything other than enqueue.
> >>> >> >>
> >>> >> >> > The continue method
> >>> >> >> > This method continues the playback at the point in the utterance
> >>> >> >> > and queue in which it was paused.  If playback is in progress, it
> >>> >> >> > does nothing.
> >>> >> >> >
> >>> >> >> > The stop method.
> >>> >> >> > This method stops playback mid-utterance and flushes the queue.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > SpeechSynthesisUtterance attributes
> >>> >> >> >
> >>> >> >> > text attribute:
> >>> >> >> > The text to be synthesized for this utterance. This attribute must
> >>> >> >> > not be changed after onstart fires.
> >>> >> >>
> >>> >> >> I'd say: changes to this attribute after the utterance has been
> >>> >> >> added to the queue (by calling "speak") will be ignored. OR, we
> >>> >> >> should make it a DOM exception to modify it when it's in the speech
> >>> >> >> queue.
> >>> >> >>
> >>> >> >> > paused attribute:
> >>> >> >> > This attribute is true if this specific utterance is in the queue
> >>> >> >> > and has not completed playback.
> >>> >> >>
> >>> >> >> I think this should only be true if it has begun speaking but not
> >>> >> >> completed.
> >>> >> >>
> >>> >> >> - Dominic
> >>> >> >>
> >>> >>
> >>> >> --
> >>> >> NOTICE TO RECIPIENT:
> >>> >> THIS E-MAIL IS  MEANT FOR ONLY THE INTENDED RECIPIENT OF THE
> >>> >TRANSMISSION,
> >>> >> AND MAY BE A COMMUNICATION PRIVILEGED BY LAW.  IF YOU RECEIVED THIS
> >>> >E-MAIL
> >>> >> IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING
> >>> >OF THIS
> >>> >> E-MAIL IS STRICTLY PROHIBITED.  PLEASE NOTIFY US IMMEDIATELY OF THE
> >>> >ERROR
> >>> >> BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
> >>> >THANK YOU
> >>> >> IN ADVANCE FOR YOUR COOPERATION.
> >>> >> Reply to : legal@openstream.com
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>>
> >>>
> >>
> >
>
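
For readers following the thread, the queue semantics proposed above can be
sketched as a small in-memory model: speak() appends without changing the
paused state, cancel() empties the queue and stops the current utterance
without changing the paused state, and pause()/continue() switch the paused
state. This is only an illustration under stated assumptions, not the spec
text or any browser's implementation. The SpeechQueueModel class and its
finishCurrent() helper are hypothetical names, utterances are plain strings,
no speech is produced, and the model is an ordinary instance rather than the
single global SpeechSynthesis instance the spec describes.

```javascript
// Hypothetical model of the queue semantics discussed in this thread.
// Nothing is synthesized; finishCurrent() is an invented stand-in for
// "the current utterance has completed being spoken".
class SpeechQueueModel {
  constructor() {
    this.queue = [];      // utterances that have not started speaking
    this.current = null;  // utterance that has begun but not completed
    this.paused = false;  // the default state is non-paused
  }

  // pending: the queue contains utterances which have not started speaking.
  get pending() { return this.queue.length > 0; }

  // speaking: an utterance has begun and not completed, paused or not.
  get speaking() { return this.current !== null; }

  // Start the next queued utterance unless paused or already speaking.
  _advance() {
    if (!this.paused && this.current === null && this.queue.length > 0) {
      this.current = this.queue.shift();
    }
  }

  // Append to the end of the queue; never changes the paused state.
  speak(utterance) {
    this.queue.push(utterance);
    this._advance();
  }

  // Remove all utterances and cease speaking; paused state is untouched.
  cancel() {
    this.queue.length = 0;
    this.current = null;
  }

  pause() { this.paused = true; }

  // Leave the paused state; begin the next utterance if none is in progress.
  continue() {
    this.paused = false;
    this._advance();
  }

  // Assumption for the model only: marks the current utterance as finished.
  finishCurrent() {
    this.current = null;
    this._advance();
  }
}

const synth = new SpeechQueueModel();
synth.pause();
synth.speak("first");
// paused && !speaking: the utterance is queued but does not start.
console.log(synth.paused, synth.speaking, synth.pending); // true false true
synth.cancel();
// cancel() empties the queue but leaves the paused state alone.
console.log(synth.paused, synth.pending); // true false
synth.continue();
synth.speak("second");
console.log(synth.speaking, synth.pending); // true false
```

In this model the old stop() behaviour is just cancel() followed by pause(),
which is the two-call equivalence argued for earlier in the thread.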
Received on Tuesday, 18 September 2012 19:48:13 UTC
