- From: Bjorn Bringert <bringert@google.com>
- Date: Fri, 5 Nov 2010 11:52:15 +0100
- To: "Young, Milan" <Milan.Young@nuance.com>
- Cc: public-xg-htmlspeech@w3.org
- Message-ID: <AANLkTik3EeDHWGHto1Jy_H953XAdf-Z7EpnLAqccBgPX@mail.gmail.com>
Excellent questions Milan, thanks.

* It is possible to start playback at a given timestamp by setting the
  currentTime attribute of the HTMLTtsElement (like with <audio>).
  There is no way to seek to an SSML <mark>, but there definitely should
  be one. I think that it would be as simple as making the lastMark
  attribute settable, and specifying that that will seek to the
  specified mark.

* It could also be useful to allow starting from a given point without
  scripting by allowing a URI fragment in the src attribute value. This
  would allow the user agent to buffer from the correct point when
  autobuffer is set.

* The existing events and methods should be enough to handle
  synchronization with speech recognition, if the speech recognition API
  exposes enough events. Examples:

  - For barge-in, the web app would call the HTMLTtsElement pause()
    method when it receives the "speech started" event from the speech
    recognition API.

  - For prompts without barge-in, the web app could start speech
    recognition when it receives the "ended" event from the
    HTMLTtsElement.

* Mixing of audio from simultaneous <tts>, <audio> and <video> playback
  appears to be allowed by HTML5 implementations. I can't find that the
  HTML5 spec explicitly requires it, but it seems to be implied by the
  specification of how each instance of the elements should work.

/Bjorn

On Fri, Nov 5, 2010 at 9:14 AM, Young, Milan <Milan.Young@nuance.com> wrote:
> Hello Bjorn,
>
> I have a couple questions about your proposal. As HTML is not my native
> tongue, please pardon my ignorance if these issues are already addressed
> by the DOM framework.
>
> * Is there a means for clients to request playback at some timestamp
> or <mark> into the request?
>
> * How do you envision clients synchronizing playback requests with
> recognition (e.g. to handle cases like barge-in)?
>
> * Would <tts> requests overlay other audio sources (e.g. video,
> <audio>, or <tts>)?
>
>
> Thank you
>
>
> -----Original Message-----
> From: public-xg-htmlspeech-request@w3.org
> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn Bringert
> Sent: Thursday, November 04, 2010 4:09 PM
> To: public-xg-htmlspeech@w3.org
> Subject: Text to Speech proposal
>
> I have attached a proposal for how we could add TTS support to HTML by
> introducing a <tts> element that shares a lot of functionality with
> <audio>.
>
> This is based on an earlier version that I linked to in a thread in
> September
> (http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0018.html).
> The main differences are in examples, clarifications, formatting, and
> handling of SSML <mark> elements.
>
> --
> Bjorn Bringert
> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> Palace Road, London, SW1W 9TQ
> Registered in England Number: 3977902

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Friday, 5 November 2010 10:52:44 UTC
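
The synchronization patterns described in the reply (seeking via currentTime, pausing on barge-in, and starting recognition once a prompt has ended) can be sketched roughly as follows. This is only an illustration under stated assumptions: TtsLike, RecognizerLike, the Emitter helper, the "speechstart" event name, and the fake classes are all hypothetical stand-ins, since neither the proposed HTMLTtsElement nor a speech recognition API is defined in this thread beyond the attributes and events mentioned above.

```typescript
// Hypothetical shapes for illustration only; not taken from any published spec.
type Listener = () => void;

// Tiny event emitter so the sketch runs without a browser DOM.
class Emitter {
  private listeners = new Map<string, Listener[]>();
  on(type: string, fn: Listener): void {
    const list = this.listeners.get(type) || [];
    list.push(fn);
    this.listeners.set(type, list);
  }
  emit(type: string): void {
    for (const fn of this.listeners.get(type) || []) fn();
  }
}

// Assumed subset of the proposed HTMLTtsElement (mirrors <audio>).
interface TtsLike {
  currentTime: number;
  play(): void;
  pause(): void;
  on(type: "ended", fn: Listener): void;
}

// Assumed recognizer surface; "speechstart" stands in for "speech started".
interface RecognizerLike {
  start(): void;
  on(type: "speechstart", fn: Listener): void;
}

// Start playback at a given timestamp (in seconds) by setting currentTime.
function playFrom(tts: TtsLike, seconds: number): void {
  tts.currentTime = seconds;
  tts.play();
}

// Barge-in: pause the prompt as soon as the user starts speaking.
function enableBargeIn(tts: TtsLike, rec: RecognizerLike): void {
  rec.on("speechstart", () => tts.pause());
}

// No barge-in: begin recognition only after the prompt has finished.
function listenAfterPrompt(tts: TtsLike, rec: RecognizerLike): void {
  tts.on("ended", () => rec.start());
}

// Fakes used only to exercise the wiring above.
class FakeTts extends Emitter implements TtsLike {
  currentTime = 0;
  playing = false;
  play(): void { this.playing = true; }
  pause(): void { this.playing = false; }
}

class FakeRecognizer extends Emitter implements RecognizerLike {
  started = false;
  start(): void { this.started = true; }
}

const tts = new FakeTts();
const rec = new FakeRecognizer();
enableBargeIn(tts, rec);
playFrom(tts, 2.5);
rec.emit("speechstart"); // user begins talking over the prompt
console.log(tts.playing); // false
```

In a real page the fakes would be replaced by the actual element and recognizer objects, with addEventListener taking the place of the hypothetical on() method.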