Re: Text to Speech proposal from T.V Raman on 2010-11-08 (public-xg-htmlspeech@w3.org from November 2010)

From: T.V Raman <raman@google.com>
Date: Mon, 8 Nov 2010 08:10:18 -0800
To: bringert@google.com
Cc: Milan.Young@nuance.com, public-xg-htmlspeech@w3.org
Message-ID: <19672.8426.561694.99677@retriever.mtv.corp.google.com>

Bjorn,

It might be valuable to file a bug against the HTML5 spec requesting
clarification, asking that it make explicit the behavior when
multiple audio and video elements are active. It would be good for
the spec to state clearly what implementations already do, to
future-proof things.

Bjorn Bringert writes:
 > Excellent questions Milan, thanks.
 > 
 > * It is possible to start playback at a given timestamp by setting the currentTime attribute of the HTMLTtsElement (as with <audio>). There is currently no way to
 > seek to an SSML <mark>, but there definitely should be one. I think it would be as simple as making the lastMark attribute settable and specifying that
 > setting it seeks to the named mark.
 > 
 > * It could also be useful to support starting from a given point without scripting, by allowing a URI fragment in the src attribute value. This would let the
 > user agent buffer from the correct point when autobuffer is set.
 > 
 > * The existing events and methods should be enough to handle synchronization with speech recognition, if the speech recognition API exposes enough events.
 > Examples:
 >   - For barge-in, the web app would call the HTMLTtsElement pause() method when it receives the "speech started" event from the speech recognition API.
 >   - For prompts without barge-in, the web app could start speech recognition when it receives the "ended" event from the HTMLTtsElement.
 > 
 > * Mixing of audio from simultaneous <tts>, <audio> and <video> playback appears to be supported by current HTML5 implementations. I can't find anywhere that the HTML5 spec
 > explicitly requires it, but it seems to be implied by how each instance of these elements is specified to work.
 > 
 > /Bjorn
 > 
 > On Fri, Nov 5, 2010 at 9:14 AM, Young, Milan <Milan.Young@nuance.com> wrote:
 > 
 >     Hello Bjorn,
 >    
 >     I have a couple questions about your proposal.  As HTML is not my native
 >     tongue, please pardon my ignorance if these issues are already addressed
 >     by the DOM framework.
 >    
 >      * Is there a way for clients to start playback at a given timestamp
 >     or <mark> within the request?
 >    
 >      * How do you envision clients synchronizing playback requests with
 >     recognition (e.g. to handle cases like barge-in)?
 >    
 >      * Would <tts> requests overlay other audio sources (e.g. video,
 >     <audio>, or <tts>)?
 > 
 >     Thank you
 > 
 >     -----Original Message-----
 >     From: public-xg-htmlspeech-request@w3.org
 >     [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn Bringert
 >     Sent: Thursday, November 04, 2010 4:09 PM
 >     To: public-xg-htmlspeech@w3.org
 >     Subject: Text to Speech proposal
 >    
 >     I have attached a proposal for how we could add TTS support to HTML by
 >     introducing a <tts> element that shares a lot of functionality with
 >     <audio>.
 >    
 >     This is based on an earlier version that I linked to in a thread in
 >     September
 >     (http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0018.html).
 >     The main differences are in examples, clarifications, formatting, and
 >     handling of SSML <mark> elements.
 >    
 >     --
 >     Bjorn Bringert
 >     Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
 >     Palace Road, London, SW1W 9TQ
 >     Registered in England Number: 3977902
 > 
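
The two synchronization patterns Bjorn describes (barge-in, and prompts without barge-in) can be sketched as plain event wiring. This is a minimal TypeScript sketch under stated assumptions: the proposed <tts> element was never standardized, so SimpleEmitter, FakeTtsElement, FakeRecognizer, finish() and the "speechstart" event name are hypothetical stand-ins. Only play(), pause() and the "ended" event mirror the <audio> API that the proposal reuses.

```typescript
// Hypothetical stand-ins: <tts> was only proposed, and the recognizer API
// (including the "speechstart" event name) is an assumption for illustration.
type Listener = () => void;

class SimpleEmitter {
  private listeners = new Map<string, Listener[]>();

  addEventListener(type: string, fn: Listener): void {
    const list = this.listeners.get(type) ?? [];
    list.push(fn);
    this.listeners.set(type, list);
  }

  // Calls every listener registered for this event type, in order.
  dispatch(type: string): void {
    for (const fn of this.listeners.get(type) ?? []) fn();
  }
}

// Models only the parts of the proposed HTMLTtsElement used here.
class FakeTtsElement extends SimpleEmitter {
  paused = true;
  play(): void { this.paused = false; }
  pause(): void { this.paused = true; }
  // Hypothetical test helper: simulates the prompt finishing and firing "ended".
  finish(): void {
    this.paused = true;
    this.dispatch("ended");
  }
}

// Hypothetical recognizer; the <tts> proposal does not define this API.
class FakeRecognizer extends SimpleEmitter {
  listening = false;
  start(): void { this.listening = true; }
}

// Barge-in: listen while the prompt plays, and pause it when speech starts.
function promptWithBargeIn(tts: FakeTtsElement, rec: FakeRecognizer): void {
  rec.addEventListener("speechstart", () => tts.pause());
  rec.start();
  tts.play();
}

// No barge-in: start listening only after the prompt has ended.
function promptThenListen(tts: FakeTtsElement, rec: FakeRecognizer): void {
  tts.addEventListener("ended", () => rec.start());
  tts.play();
}
```

Neither pattern needs a new method on the element: each one just connects an event on one object to a method on the other, which is the point made in the thread about the existing events and methods being sufficient.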

--
Received on Monday, 8 November 2010 16:10:50 UTC