Re: TTS proposal to split Utterance into its own interface from Gerardo Capiel on 2012-09-19 (public-speech-api@w3.org from September 2012)

From: Gerardo Capiel <gerardoc@benetech.org>
Date: Wed, 19 Sep 2012 19:34:08 +0000
To: Glen Shires <gshires@google.com>
CC: "public-speech-api@w3.org" <public-speech-api@w3.org>, Nagesh Kharidi <nagesh@openstream.com>, Jim Barnett <Jim.Barnett@genesyslab.com>, Dominic Mazzoni <dmazzoni@google.com>, Hans Wennborg <hwennborg@google.com>, "olli@pettay.fi" <olli@pettay.fi>
Message-ID: <9A186CF4-062D-4449-9451-B29FAE777BD5@benetech.org>
Glen,

This looks good. Thanks!

Can any of the browser developers comment on their implementation plans, so that we can plan our own adoption? I imagine this will also be great for enhancing the accessibility of operating systems, such as Firefox OS.

Gerardo

Gerardo Capiel
VP of Engineering
Benetech

On Sep 19, 2012, at 8:58 AM, "Glen Shires" <gshires@google.com> wrote:

Gerardo,
Please take a look at, and comment on, the 'onupdate' event proposed in the thread "Proposal to add start, stop, and update events to TTS". [1]  I believe this offers the functionality you describe.

/Glen Shires

[1] http://lists.w3.org/Archives/Public/public-speech-api/2012Sep/0102.html


On Wed, Sep 19, 2012 at 8:43 AM, Gerardo Capiel <gerardoc@benetech.org> wrote:
One important requirement for addressing the needs of users with learning disabilities, such as dyslexia, is the ability to trigger events that enable the developer to "highlight" the words being spoken, providing multi-modal content delivery. We are implementing this in an EPUB web reader for Bookshare using Google Chrome's TTS extension API. You can see this functionality in an iOS app in the YouTube video found at:

http://read2go.org/

We also developed a Chrome-specific open source library to implement multi-modal speech/highlighting in any app:

https://github.com/benetech/BeneSpeak

Benetech and others want to implement this functionality across all browsers. So I'd like to see functionality that enables developers to trigger and handle events based on word boundaries or SSML mark elements:

http://msdn.microsoft.com/en-us/library/lync/bb812497(v=office.12).aspx
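Below is a minimal sketch of the client-side highlighting logic. It assumes a hypothetical word-boundary event that reports the character offset (`charIndex`) at which the spoken word begins. The names `wordRangeAt` and `highlight` are illustrative only. The bracket markers stand in for whatever visual highlight a reader applies. SSML mark events are not covered here.

```javascript
// HYPOTHETICAL: assumes a word-boundary event that reports `charIndex`, the
// offset in the utterance text where the word being spoken begins.
function wordRangeAt(text, charIndex) {
  // Clamp so an out-of-range index never throws.
  let start = Math.max(0, Math.min(charIndex, text.length));
  // Walk back to the start of the word if the index lands mid-word.
  while (start > 0 && !/\s/.test(text[start - 1])) start--;
  // Walk forward to the end of the word.
  let end = start;
  while (end < text.length && !/\s/.test(text[end])) end++;
  return { start, end };
}

// Wrap the current word in [brackets], standing in for a visual highlight.
function highlight(text, charIndex) {
  const { start, end } = wordRangeAt(text, charIndex);
  return text.slice(0, start) + "[" + text.slice(start, end) + "]" + text.slice(end);
}

const sentence = "Reading is easier with highlighting";
console.log(highlight(sentence, 11)); // Reading is [easier] with highlighting
```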

Thank You,

Gerardo


On Sep 19, 2012, at 8:17 AM, "Glen Shires" <gshires@google.com> wrote:

I've updated the spec by adding an 'onresume' event and renaming the 'continue' method as 'resume' (no change to definition).
https://dvcs.w3.org/hg/speech-api/rev/253bab5be673

As always, the current draft spec is at:
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

/Glen Shires

On Tue, Sep 18, 2012 at 1:59 PM, Nagesh Kharidi <nagesh@openstream.com> wrote:
Glen,

Wouldn't placing the onresume code after the call to the continue/resume method result in a lot of if/then/else checks to implement logic based on the utterance that was resumed? In addition, we would have to store a reference to the paused utterance in the onpause handler. These drawbacks can be avoided by adding the onresume event.
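The sketch below illustrates the per-utterance pattern described above, under stated assumptions. `TinySynth` is a hypothetical stand-in that holds one utterance at a time, with no queue and no audio, and fires onpause/onresume on it. Each utterance carries its own resume logic. As a result, the application neither stores the paused utterance nor branches on it after calling resume.

```javascript
// HYPOTHETICAL stand-in for a synthesizer: it holds one utterance (no queue,
// no audio) and fires onpause/onresume on that utterance.
class TinySynth {
  constructor() {
    this.current = null;
  }
  speak(utterance) {
    this.current = utterance;
  }
  pause() {
    if (this.current && this.current.onpause) this.current.onpause();
  }
  resume() {
    if (this.current && this.current.onresume) this.current.onresume();
  }
}

// Each utterance carries its own pause/resume logic, so the caller does not
// keep a "paused utterance" variable or branch on it after resume().
function makeUtterance(text, log) {
  return {
    text,
    onpause: () => log.push("paused: " + text),
    onresume: () => log.push("resumed: " + text),
  };
}

const log = [];
const synth = new TinySynth();
synth.speak(makeUtterance("Chapter 1", log));
synth.pause();
synth.resume();
console.log(log.join(" | ")); // paused: Chapter 1 | resumed: Chapter 1
```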

Regards,
Nagesh

On Sep 18, 2012, at 3:04 PM, Glen Shires wrote:

Nagesh,
Yes, I had originally proposed an onresume event (same functionality as a continue event), but omitted it because I couldn't think of a good use case.  onpause is useful because it indicates which of the utterances was paused when the pause method was called. However, onresume would always be called on that same object in response to calling the continue method. (Thus it seems that whatever code might be placed in an onresume function could just as easily be placed after the call to the continue method.)  I'm not opposed to adding onresume, but I'd like to see some good use cases for it.
/Glen Shires

On Tue, Sep 18, 2012 at 11:26 AM, Nagesh Kharidi <nagesh@openstream.com> wrote:
Glen,

Looks good. I propose that we enhance SpeechSynthesisUtterance by adding a continue event (fired when a paused utterance is resumed) and a corresponding oncontinue event handler.

Regards,
Nagesh

On Sep 18, 2012, at 1:25 AM, Glen Shires wrote:

I've updated the spec with the above SpeechSynthesis and SpeechSynthesisUtterance IDL and definitions:
https://dvcs.w3.org/hg/speech-api/rev/b036c78e9445

As always, the current draft spec is at:
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

On Sat, Sep 15, 2012 at 1:31 PM, Glen Shires <gshires@google.com> wrote:
Nagesh,
I agree that cancelAll() is useful and can make code simpler because it doesn't affect the paused state.  In fact, I propose that we add cancelAll() and remove stop() -- because the stop function is probably less common and can easily be accomplished with two calls: cancelAll() and pause().

Also, since canceling a specific utterance is not very useful, and questionable as Jerry states, I propose eliminating cancel(utterance). If we do that, then we could rename cancelAll() more simply as cancel().

Thus, I propose this IDL:

    interface SpeechSynthesis {
      static readonly attribute boolean pending;
      static readonly attribute boolean speaking;
      static readonly attribute boolean paused;

      static void speak(SpeechSynthesisUtterance utterance);
      static void cancel();
      static void pause();
      static void continue();
    };

and I propose this new definition of cancel:

The cancel method
This method removes all utterances from the queue. If an utterance is being spoken, speaking ceases immediately. This method does not change the paused state of the SpeechSynthesis object.
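The proposed queue semantics can be checked with a small in-memory model. This is a sketch, not a browser implementation. `SpeechSynthesisModel` is hypothetical, and so is its `finishCurrent()` hook, which stands in for the synthesizer reaching the end of an utterance. No audio is produced. The method names mirror the IDL above, including `continue()`, which is a legal JavaScript method name.

```javascript
// HYPOTHETICAL in-memory model of the proposed SpeechSynthesis queue.
// No audio is produced; finishCurrent() stands in for the synthesizer
// reaching the end of the utterance that is being spoken.
class SpeechSynthesisModel {
  constructor() {
    this.queue = [];       // queue[0] is the utterance being (or next to be) spoken
    this.speaking = false; // true from start to end of queue[0], even while paused
    this.paused = false;   // a new object starts in the non-paused state
  }

  // True if the queue holds utterances that have not started speaking.
  get pending() {
    return this.queue.length > (this.speaking ? 1 : 0);
  }

  speak(utterance) {
    this.queue.push(utterance); // append; the paused state is unchanged
    this._startNextIfIdle();
  }

  cancel() {
    // Remove every utterance; speaking stops, the paused state is unchanged.
    this.queue = [];
    this.speaking = false;
  }

  pause() {
    this.paused = true; // a speaking utterance stays "speaking", just paused
  }

  continue() {
    // Resume the paused utterance, or start the next one if none was speaking.
    this.paused = false;
    this._startNextIfIdle();
  }

  finishCurrent() {
    if (!this.speaking || this.paused) return; // nothing audible to finish
    const done = this.queue.shift();
    this.speaking = false;
    if (done.onend) done.onend();
    this._startNextIfIdle();
  }

  _startNextIfIdle() {
    if (this.paused || this.speaking || this.queue.length === 0) return;
    this.speaking = true;
    if (this.queue[0].onstart) this.queue[0].onstart();
  }
}

// Usage: the old stop() behaviour is cancel() followed by pause().
const synth = new SpeechSynthesisModel();
synth.speak({ text: "first", onstart: () => console.log("start: first") });
synth.speak({ text: "second" });
synth.cancel();
synth.pause();
console.log(synth.paused, synth.speaking, synth.pending); // true false false
```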

/Glen Shires


On Sat, Sep 15, 2012 at 3:25 AM, Nagesh Kharidi <nagesh@openstream.com> wrote:
Please see inline.

Regards,
Nagesh

On Fri, 14 Sep 2012 12:59:41 -0700
 Glen Shires <gshires@google.com> wrote:
>> Provide the ability to cancel all currently queued utterances.
>
>The stop() method cancels all queued utterances. (Dominic proposed that
>this method be named stopAndFlushQueue(); would that name be more clear?)

In addition to canceling all queued utterances, the stop() method also
pauses the SpeechSynthesis object. A separate cancelAll() method would
be useful, without which, if a new utterance is to be spoken
immediately, we would have to do :
speechSynthesis.stop();
speechSynthesis.continue();
speechSynthesis.speak(utterance);

With a cancelAll() method, this would be:
speechSynthesis.cancelAll();
speechSynthesis.speak(utterance);

Since this would be such a common usage, we could make it even easier
for developers by either:
- providing a speakImmediate(utterance) method that cancels all queued
utterances and then starts speaking the new utterance
or
- adding a second parameter as follows to the speak() method:
speechSynthesis.speak(utterance, speakImmediately);
If speakImmediately is true, all currently queued utterances will be
canceled and the new utterance will be spoken.
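Both variants can be layered on top of a cancelAll() method. The sketch below assumes a synthesizer object that exposes cancelAll() and speak(), as proposed in this message. `speakWithOption` is a hypothetical helper name. The recording stub exists only to show the call order.

```javascript
// HYPOTHETICAL helper covering both proposals: speakWithOption(synth, u)
// behaves like speak(u); speakWithOption(synth, u, true) behaves like
// speakImmediate(u), dropping whatever is queued before speaking.
function speakWithOption(synth, utterance, speakImmediately = false) {
  if (speakImmediately) {
    synth.cancelAll(); // per the proposal: empties the queue, paused state untouched
  }
  synth.speak(utterance);
}

// Stub synthesizer that only records which methods were called.
const calls = [];
const recordingSynth = {
  cancelAll: () => calls.push("cancelAll"),
  speak: (u) => calls.push("speak " + u.text),
};

speakWithOption(recordingSynth, { text: "Breaking news" }, true);
console.log(calls.join(", ")); // cancelAll, speak Breaking news
```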

>
>Also, what is the use case for the current cancel(utterance) method? In
>all the use cases I envision, you'd want to cancel all queued utterances.
>Can we eliminate cancel()?

I also agree that canceling a specific utterance is not very useful.
Canceling all queued utterances would be more common than canceling a
specific utterance.

>
>
>> New speakNext SpeechSynthesis method - append the utterance to the
>> beginning of the queue
>
>I'd like more discussion on this. What are the use cases? What are the
>edge cases? (E.g. if there's a race condition, the current utterance may
>finish and the second in the queue may begin speaking before this new
>utterance is inserted.)

Use case for speakNext() method: Consider a news application that plays
the latest news items. It queues all news items to be played. Now if
there is a new "breaking news" item that comes in, the speakNext()
method can be used to play it as soon as possible without canceling the
already queued items.
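One way to pin down where speakNext() should insert is sketched below with a plain array. It assumes that queue[0] is the utterance currently being spoken. It places the new utterance immediately after that one, so nothing already queued is cancelled and the current item is not interrupted. That placement is an interpretation, not something settled in this thread, and the function is hypothetical.

```javascript
// HYPOTHETICAL sketch: queue[0] is the utterance being spoken when `speaking`
// is true. speakNext() makes the new utterance the next one to be spoken,
// without cancelling or interrupting anything already queued.
function speakNext(queue, utterance, speaking) {
  const index = speaking && queue.length > 0 ? 1 : 0;
  queue.splice(index, 0, utterance);
  return queue;
}

const news = ["story A (speaking)", "story B", "story C"];
speakNext(news, "BREAKING NEWS", true);
console.log(news.join(" | "));
// story A (speaking) | BREAKING NEWS | story B | story C
```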


>
>
>>  Question:  Can a cancelled utterance be re-queued?
>
>Good question, and also, what is the lifetime of a SpeechSynthesisUtterance
>object and who owns it? There are at least 3 possibilities:
>
>1. The speak() method takes ownership when it adds it to the queue; then
>it would presumably be destroyed upon cancel or onend.
>    (This raises the questions: what use is the SpeechSynthesisUtterance
>object attribute "ended", since the object will be destroyed when it
>turns true? It also makes it messy to use the other readonly attributes
>because the object may be deleted suddenly. Also, what if the author
>deletes the SpeechSynthesisUtterance object prior to it being spoken?
>One easy way to accidentally create this bug is to define the
>SpeechSynthesisUtterance object in a method that goes out of scope.)
>
>2. The speak() method does not take ownership when it adds it directly
>to the queue.
>    (This raises the question: what if the author deletes the
>SpeechSynthesisUtterance object prior to it being spoken? One easy way
>to accidentally create this bug is to define the SpeechSynthesisUtterance
>object in a method that goes out of scope.)
>
>3. The speak() method does not take ownership; it makes a copy of it
>when it adds it to the queue.
>    (This raises the question: how can the author's original
>SpeechSynthesisUtterance object readonly attributes (speaking, paused,
>ended) reflect the state of the copy on the queue?)
>
>
>To resolve these issues, I propose the following, because I think it's
>the cleanest solution and easiest for authors, since they can create and
>destroy objects, and go out of scope, without worrying about the
>speaking queue timing:
>
>The speak() method does not take ownership of the SpeechSynthesisUtterance
>object; it makes a copy of it when it adds it to the queue. We eliminate
>the SpeechSynthesisUtterance readonly attributes, relying instead on
>events that indicate change in state, including new events for: onpause,
>onresume.
>
>Because it's a copy of the object, this clarifies that:
>- changes to the original SpeechSynthesisUtterance object after calling
>speak() do not affect the copy on the queue.
>- the same SpeechSynthesisUtterance object can be used to call speak()
>multiple times (even after a copy of it was spoken or cancelled).
>
>The new IDL would be:
>
>    interface SpeechSynthesisUtterance {
>      attribute DOMString text;
>      attribute DOMString lang;
>      attribute DOMString serviceURI;
>
>      attribute Function onstart;
>      attribute Function onend;
>      attribute Function onpause;   // new
>      attribute Function onresume;  // new
>    };
>
>
>And the new definition:
>
>The speak method
>This method appends *a copy of* the utterance to the end of the queue for
>this SpeechSynthesis object. It does not change the paused state of the
>SpeechSynthesis object. If the SpeechSynthesis object is paused, it
>remains paused. If it is not paused, then this utterance is spoken if no
>other utterances are in the queue, else this utterance is queued to begin
>speaking after the other utterances in the queue have been spoken.
>
>
>/Glen Shires
>
>
>On Fri, Sep 14, 2012 at 6:05 AM, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:
>
>> I would think that cancelling all utterances would be the more common
>> use case (so we ought to make it easy). Question: Can a cancelled
>> utterance be re-queued?
>>
>> - Jim
>>
>> -----Original Message-----
>> From: Nagesh Kharidi [mailto:nagesh@openstream.com]
>> Sent: Friday, September 14, 2012 8:58 AM
>> To: Glen Shires; Dominic Mazzoni
>> Cc: Hans Wennborg; olli@pettay.fi; public-speech-api@w3.org
>> Subject: Re: TTS proposal to split Utterance into its own interface
>>
>> I would like to propose the following:
>> 1. Provide the ability to cancel all currently queued utterances. A new
>> cancelAll method could be added. Alternately, invoking the cancel method
>> without the utterance parameter could cancel all utterances.
>>
>> 2. New speakNext SpeechSynthesis method
>> This method will append the utterance to the beginning of the queue.
>>
>> 3. New oncancel SpeechSynthesisUtterance event
>> Fired when the utterance is canceled.
>>
>> 4. New canceled SpeechSynthesisUtterance attribute
>> True if the utterance is canceled.
>>
>>
>> I also had a question regarding the stop method: Is "flushes the queue"
>> equivalent to calling cancel on all utterances in the queue? If so, I
>> would like to suggest changing "flushes the queue" to "cancels all
>> utterances in the queue".
>>
>> Regards,
>> Nagesh
>>
>> On Thu, 13 Sep 2012 14:13:56 -0700
>>  Glen Shires <gshires@google.com> wrote:
>> >Yes, I like the way you've defined the "speak" method to not change the
>> >play/pause state. Also, I didn't particularly like the word "playback",
>> >so thanks for the alternative "spoken". Here are updated definitions
>> >with your suggestions incorporated. If there's no disagreement, I'll
>> >add them to the spec on Monday.
>> >
>> >
>> >SpeechSynthesis Attributes
>> >
>> >pending attribute:
>> >This attribute is true if the queue for this SpeechSynthesis object
>> >contains any utterances which have not started speaking.
>> >
>> >speaking attribute:
>> >This attribute is true if an utterance is being spoken. Specifically,
>> >it is true if an utterance has begun being spoken and has not completed
>> >being spoken, independent of whether this SpeechSynthesis object is in
>> >the paused state.
>> >
>> >paused attribute:
>> >This attribute is true when this SpeechSynthesis object is in the
>> >paused state. This state is independent of whether anything is in the
>> >queue. The default state of a new SpeechSynthesis object is the
>> >non-paused state.
>> >
>> >
>> >SpeechSynthesis Methods
>> >
>> >The speak method
>> >This method appends the utterance to the end of the queue for this
>> >SpeechSynthesis object. It does not change the paused state of the
>> >SpeechSynthesis object. If the SpeechSynthesis object is paused, it
>> >remains paused. If it is not paused, then this utterance is spoken if
>> >no other utterances are in the queue, else this utterance is queued to
>> >begin speaking after the other utterances in the queue have been
>> >spoken.
>> >
>> >The cancel method
>> >This method removes the specified utterance from the queue. If it is
>> >not in the queue, no changes are made. If the utterance removed is
>> >being spoken, speaking ceases for that utterance and the next utterance
>> >in the queue (if any) begins to be spoken. This method does not change
>> >the paused state of the SpeechSynthesis object.
>> >
>> >The pause method
>> >This method puts the SpeechSynthesis object into the paused state. If
>> >an utterance was being spoken, it pauses mid-utterance. (If called when
>> >the SpeechSynthesis object was already in the paused state, it does
>> >nothing.)
>> >
>> >The continue method
>> >This method puts the SpeechSynthesis object into the non-paused state.
>> >If an utterance was speaking (that is, its speaking attribute is true),
>> >it continues speaking the utterance at the point at which it was
>> >paused, else it begins speaking the next utterance in the queue (if
>> >any). (If called when the SpeechSynthesis object was already in the
>> >non-paused state, it does nothing.)
>> >
>> >The stop method
>> >This method puts the SpeechSynthesis object into the paused state and
>> >flushes the queue. It sets the speaking attribute to false and the
>> >paused attribute to true.
>> >
>> >
>> >SpeechSynthesisUtterance attributes
>> >
>> >
>> >[[Note, I used SHOULD here because there may be some race-condition
>> >edge-cases where it might not be ignored.]]
>> >
>> >text attribute:
>> >The text to be synthesized for this utterance. Changes to this
>> >attribute after the utterance has been added to the queue (by calling
>> >the speak method) SHOULD be ignored.
>> >
>> >lang attribute:
>> >[no change except to append the following] Changes to this attribute
>> >after the utterance has been added to the queue (by calling the speak
>> >method) SHOULD be ignored.
>> >
>> >serviceURI attribute:
>> >[no change except to append the following] Changes to this attribute
>> >after the utterance has been added to the queue (by calling the speak
>> >method) SHOULD be ignored.
>> >
>> >speaking attribute:
>> >This attribute is true if this specific utterance is currently being
>> >spoken. Specifically, it is true if this utterance has begun being
>> >spoken and has not completed being spoken. This is independent of
>> >whether the SpeechSynthesis object is in a paused state.
>> >
>> >paused attribute:
>> >This attribute is true if this specific utterance has begun to be
>> >spoken, but has not completed, and the SpeechSynthesis object is in the
>> >paused state.
>> >
>> >ended attribute:
>> >This attribute is true if this specific utterance has completed being
>> >spoken.
>> >
>> >SpeechSynthesisUtterance events
>> >
>> >onstart event:
>> >Fired when this utterance has begun to be spoken.
>> >
>> >onend event:
>> >Fired when this utterance has completed being spoken.
>> >
>> >
>> >
>> >On Thu, Sep 13, 2012 at 10:25 AM, Dominic Mazzoni <dmazzoni@google.com>
>> >wrote:
>> >
>> >> Thanks for proposing definitions.
>> >>
>> >> On Tue, Sep 11, 2012 at 3:02 AM, Glen Shires <gshires@google.com> wrote:
>> >> > I propose the following definitions for the SpeechSynthesis IDL:
>> >> >
>> >> > SpeechSynthesis Attributes
>> >> >
>> >> > pending attribute:
>> >> > This attribute is true if the queue contains any utterances which
>> >> > have not completed playback.
>> >>
>> >> I was imagining: This attribute is true if the queue contains any
>> >> utterances which have not *started* speaking.
>> >>
>> >> > speaking attribute:
>> >> > This attribute is true if playback is in progress.
>> >>
>> >> I don't like the word "playback"; it doesn't fit when the speech is
>> >> generated dynamically. How about: This attribute is true if an
>> >> utterance is being spoken.
>> >>
>> >> > paused attribute:
>> >> >   **** How is this different than (pending && !speaking) ? ****
>> >>
>> >> This is true if the speech synthesis system is in a paused state,
>> >> independent of whether anything is speaking or queued.
>> >>
>> >> paused && speaking -> it was paused in the middle of an utterance
>> >> paused && !speaking -> no utterance is speaking, but if you call
>> >> speak(), nothing will happen because it's in a paused state.
>> >>
>> >> >
>> >> > SpeechSynthesis Methods
>> >> >
>> >> > The speak method
>> >> > This method appends the utterance to the end of a playback queue. If
>> >> > playback is not in progress, it also begins playback of the next
>> >> > item in the queue.
>> >>
>> >> What do you think about rewriting to not use "playback"?
>> >>
>> >> Also, my idea was that it would not begin playback if the system is
>> >> in a paused state.
>> >>
>> >> > The cancel method
>> >> > This method removes the first matching utterance (if any) from the
>> >> > playback queue. If playback is in progress and the utterance removed
>> >> > is being played, playback ceases for the utterance and the next
>> >> > utterance in the queue (if any) begins playing.
>> >>
>> >> Do we need to say "first matching"? Each utterance should be a
>> >> specific object, it should be either in the queue or not.
>> >>
>> >> > The pause method
>> >> > This method pauses the playback mid-utterance. If playback is not in
>> >> > progress, it does nothing.
>> >>
>> >> I was assuming that calling it would set the system into a paused
>> >> state, so that even a subsequent call to speak() would not do
>> >> anything other than enqueue.
>> >>
>> >> > The continue method
>> >> > This method continues the playback at the point in the utterance and
>> >> > queue in which it was paused. If playback is in progress, it does
>> >> > nothing.
>> >> >
>> >> > The stop method.
>> >> > This method stops playback mid-utterance and flushes the queue.
>> >> >
>> >> >
>> >> > SpeechSynthesisUtterance attributes
>> >> >
>> >> > text attribute:
>> >> > The text to be synthesized for this utterance. This attribute must
>> >> > not be changed after onstart fires.
>> >>
>> >> I'd say: changes to this attribute after the utterance has been added
>> >> to the queue (by calling "speak") will be ignored. OR, we should make
>> >> it a DOM exception to modify it when it's in the speech queue.
>> >>
>> >> > paused attribute:
>> >> > This attribute is true if this specific utterance is in the queue
>> >> > and has not completed playback.
>> >>
>> >> I think this should only be true if it has begun speaking but not
>> >> completed.
>> >>
>> >> - Dominic
>> >>
>> >>
>>
>> --
>> NOTICE TO RECIPIENT:
>> THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION,
>> AND MAY BE A COMMUNICATION PRIVILEGED BY LAW. IF YOU RECEIVED THIS E-MAIL
>> IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS
>> E-MAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR
>> BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM. THANK YOU
>> IN ADVANCE FOR YOUR COOPERATION.
>> Reply to : legal@openstream.com
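The copy-on-speak behaviour Glen proposes in the quoted message can be sketched with an object spread. `CopyingQueue` is a hypothetical illustration, not part of the draft spec. The copy is shallow, so event handler functions are shared with the original object.

```javascript
// HYPOTHETICAL illustration of copy-on-speak: speak() enqueues a shallow
// snapshot of the utterance, so the caller keeps its own object.
class CopyingQueue {
  constructor() {
    this.queue = [];
  }
  speak(utterance) {
    // text, lang, serviceURI and handler references are captured as they
    // are at the moment speak() is called.
    this.queue.push({ ...utterance });
  }
}

const q = new CopyingQueue();
const u = { text: "Hello", lang: "en-US" };
q.speak(u);
u.text = "Changed"; // does not affect the copy already queued
q.speak(u);         // the same object can be spoken again
console.log(q.queue.map((item) => item.text).join(", ")); // Hello, Changed
```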

Received on Wednesday, 19 September 2012 19:34:48 UTC