- From: Olli Pettay <Olli.Pettay@helsinki.fi>
- Date: Tue, 09 Nov 2010 14:02:37 +0200
- To: Bjorn Bringert <bringert@google.com>, public-xg-htmlspeech@w3.org
On 11/09/2010 01:00 PM, Bjorn Bringert wrote:
> + crogers for the Web Audio API
>
> It seems like HTML5 does not require simultaneous playback of media
> elements. This is one of the features that the W3C Audio Incubator
> group is trying to address, see e.g.
> http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html

Note that this is just Google's proposal. I'd expect the final
specification to be some mix of Mozilla's lower-level audio API and
the AudioNode API.

>
> One option is that we add a TTS media element as proposed (in
> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Nov/att-0036/htmltts-draft.html),
> and rely on the Web Audio API for mixing audio from multiple sources.
> Another option would be to instead add TTS directly in the Web Audio
> API, but I think that would complicate simple TTS use cases.

Yeah, and it might make the Audio API too complicated.
I think re-using <audio> or adding some simple new API for TTS
would make more sense.

-Olli

>
> /Bjorn
>
> On Mon, Nov 8, 2010 at 4:10 PM, T.V Raman <raman@google.com> wrote:
>>
>> Bjorn,
>>
>> It might be valuable to send in a clarification request bug fix
>> to the html5 spec asking them to make it explicit the behavior
>> when multiple audio and video elements are active. It would be
>> good for the spec to clearly state what implementations are
>> already doing for future-proofing things,
>>
>> Bjorn Bringert writes:
>> > Excellent questions Milan, thanks.
>> >
>> > * It is possible to start playback at a given timestamp by setting the currentTime attribute of the HTMLTtsElement (like with <audio>). There is no way to
>> > seek to an SSML <mark>, but there definitely should be one. I think that it would be as simple as making the lastMark attribute settable, and specifying that
>> > that will seek to the specified mark.
>> >
>> > * It could also be useful to allow starting from a given point without scripting by allowing a URI fragment in the src attribute value. This would allow the
>> > user agent to buffer from the correct point when autobuffer is set.
>> >
>> > * The existing events and methods should be enough to handle synchronization with speech recognition, if the speech recognition API exposes enough events.
>> > Examples:
>> > - For barge-in, the web app would call the HTMLTtsElement pause() method when it receives the "speech started" event from the speech recognition API.
>> > - For prompts without barge-in, the web app could start speech recognition when it receives the "ended" event from the HTMLTtsElement.
>> >
>> > * Mixing of audio from simultaneous <tts>, <audio> and <video> playback appears to be allowed by HTML5 implementations. I can't find that the HTML5 spec
>> > explicitly requires it, but it seems to be implied by the specification of how each instance of the elements should work.
>> >
>> > /Bjorn
>> >
>> > On Fri, Nov 5, 2010 at 9:14 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>> >
>> > Hello Bjorn,
>> >
>> > I have a couple questions about your proposal. As HTML is not my native
>> > tongue, please pardon my ignorance if these issues are already addressed
>> > by the DOM framework.
>> >
>> > * Is there a means for clients to request playback at some timestamp
>> > or <mark> into the request?
>> >
>> > * How do you envision clients synchronizing playback requests with
>> > recognition (e.g. to handle cases like barge-in)?
>> >
>> > * Would <tts> requests overlay other audio sources (e.g. video,
>> > <audio>, or <tts>)?
>> >
>> > Thank you
>> >
>> > -----Original Message-----
>> > From: public-xg-htmlspeech-request@w3.org
>> > [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn Bringert
>> > Sent: Thursday, November 04, 2010 4:09 PM
>> > To: public-xg-htmlspeech@w3.org
>> > Subject: Text to Speech proposal
>> >
>> > I have attached a proposal for how we could add TTS support to HTML by
>> > introducing a <tts> element that shares a lot of functionality with
>> > <audio>.
>> >
>> > This is based on an earlier version that I linked to in a thread in
>> > September
>> > (http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0018.html).
>> > The main differences are in examples, clarifications, formatting, and
>> > handling of SSML <mark> elements.
>> >
>> > --
>> > Bjorn Bringert
>> > Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>> > Palace Road, London, SW1W 9TQ
>> > Registered in England Number: 3977902
>> >
>> > --
>> > Bjorn Bringert
>> > Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace Road, London, SW1W 9TQ
>> > Registered in England Number: 3977902
>> >
>>
>> --
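
The two synchronization patterns described in the quoted thread (pause the prompt on a "speech started" event for barge-in, or start recognition on the prompt's "ended" event) could be sketched roughly as follows. This is only an illustration: HTMLTtsElement was a proposal, and no speech recognition API had been agreed on, so TtsLike, RecognizerLike, the two Fake classes and every member name below are hypothetical stand-ins, not real browser interfaces.

```typescript
// Hypothetical stand-in for the proposed HTMLTtsElement (modelled on <audio>).
interface TtsLike {
  paused: boolean;
  play(): void;
  pause(): void;
  onended: (() => void) | null;
}

// Hypothetical stand-in for a speech recognition object.
interface RecognizerLike {
  active: boolean;
  start(): void;
  onspeechstart: (() => void) | null; // the "speech started" event
}

// Barge-in: listen while the prompt plays, and pause the prompt
// as soon as the user starts speaking.
function enableBargeIn(tts: TtsLike, rec: RecognizerLike): void {
  rec.onspeechstart = () => {
    if (!tts.paused) tts.pause();
  };
  rec.start();
  tts.play();
}

// No barge-in: start recognition only once the prompt has ended.
function promptThenListen(tts: TtsLike, rec: RecognizerLike): void {
  tts.onended = () => rec.start();
  tts.play();
}

// Minimal fakes so the sketch can be exercised outside a browser.
class FakeTts implements TtsLike {
  paused = true;
  onended: (() => void) | null = null;
  play(): void { this.paused = false; }
  pause(): void { this.paused = true; }
  // Simulates the prompt reaching its end.
  finish(): void {
    this.paused = true;
    if (this.onended) this.onended();
  }
}

class FakeRecognizer implements RecognizerLike {
  active = false;
  onspeechstart: (() => void) | null = null;
  start(): void { this.active = true; }
  // Simulates the user starting to speak.
  simulateSpeechStart(): void {
    if (this.active && this.onspeechstart) this.onspeechstart();
  }
}
```

In a page, the fakes would be replaced by whatever element and recognizer objects the final APIs provide; the wiring in enableBargeIn and promptThenListen is the only part that reflects the thread.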
Received on Tuesday, 9 November 2010 12:03:17 UTC