Re: Text to Speech proposal

On 11/09/2010 01:00 PM, Bjorn Bringert wrote:
> + crogers for the Web Audio API
>
> It seems like HTML5 does not require simultaneous playback of media
> elements. This is one of the features that the W3C Audio Incubator
> group is trying to address, see e.g.
> http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html

Note that this is just Google's proposal.
I'd expect the final specification to be some mix of
Mozilla's lower-level audio API and the AudioNode API.

>
> One option is that we add a TTS media element as proposed (in
> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Nov/att-0036/htmltts-draft.html),
> and rely on the Web Audio API for mixing audio from multiple sources.
> Another option would be to instead add TTS directly in the Web Audio
> API, but I think that would complicate simple TTS use cases.
Yeah, and it might make the Audio API too complicated.

I think re-using <audio> or adding some simple new API
for TTS would make more sense.

-Olli


>
> /Bjorn
>
> On Mon, Nov 8, 2010 at 4:10 PM, T.V Raman <raman@google.com> wrote:
>>
>> Bjorn,
>>
>> It might be valuable to file a clarification request against
>> the html5 spec asking it to make explicit the behavior
>> when multiple audio and video elements are active. It would be
>> good for the spec to clearly state what implementations are
>> already doing, for future-proofing.
>>
>> Bringert writes:
>>   >  Excellent questions Milan, thanks.
>>   >
>>   >  * It is possible to start playback at a given timestamp by setting the currentTime attribute of the HTMLTtsElement (like with <audio>). There is no way to
>>   >  seek to an SSML <mark>, but there definitely should be one. I think that it would be as simple as making the lastMark attribute settable, and specifying that
>>   >  setting it will seek to the specified mark.
>>   >
>>   >  * It could also be useful to allow starting from a given point without scripting by allowing a URI fragment in the src attribute value. This would allow the
>>   >  user agent to buffer from the correct point when autobuffer is set.
>>   >
>>   >  * The existing events and methods should be enough to handle synchronization with speech recognition, if the speech recognition API exposes enough events.
>>   >  Examples:
>>   >     - For barge-in, the web app would call the HTMLTtsElement pause() method when it receives the "speech started" event from the speech recognition API.
>>   >     - For prompts without barge-in, the web app could start speech recognition when it receives the "ended" event from the HTMLTtsElement.
>>   >
>>   >  * Mixing of audio from simultaneous <tts>, <audio> and <video> playback appears to be allowed by HTML5 implementations. I can't find that the HTML5 spec
>>   >  explicitly requires it, but it seems to be implied by the specification of how each instance of the elements should work.
>>   >
>>   >  /Bjorn
>>   >
>>   >  On Fri, Nov 5, 2010 at 9:14 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>>   >
>>   >       Hello Bjorn,
>>   >
>>   >       I have a couple questions about your proposal.  As HTML is not my native
>>   >       tongue, please pardon my ignorance if these issues are already addressed
>>   >       by the DOM framework.
>>   >
>>   >        * Is there a means for clients to request playback at some timestamp
>>   >       or <mark> into the request?
>>   >
>>   >        * How do you envision clients synchronizing playback requests with
>>   >       recognition (e.g. to handle cases like barge-in)?
>>   >
>>   >        * Would <tts> requests overlay other audio sources (e.g. video,
>>   >       <audio>, or <tts>)?
>>   >
>>   >       Thank you
>>   >
>>   >       -----Original Message-----
>>   >       From: public-xg-htmlspeech-request@w3.org
>>   >       [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn Bringert
>>   >       Sent: Thursday, November 04, 2010 4:09 PM
>>   >       To: public-xg-htmlspeech@w3.org
>>   >       Subject: Text to Speech proposal
>>   >
>>   >       I have attached a proposal for how we could add TTS support to HTML by
>>   >       introducing a <tts> element that shares a lot of functionality with
>>   >       <audio>.
>>   >
>>   >       This is based on an earlier version that I linked to in a thread in
>>   >       September
>>   >       (http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0018.html).
>>   >       The main differences are in examples, clarifications, formatting, and
>>   >       handling of SSML <mark> elements.
>>   >
>>   >       --
>>   >       Bjorn Bringert
>>   >       Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>>   >       Palace Road, London, SW1W 9TQ
>>   >       Registered in England Number: 3977902
>>   >
>>   >  --
>>   >  Bjorn Bringert
>>   >  Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace Road, London, SW1W 9TQ
>>   >  Registered in England Number: 3977902
>>   >
>>
>> --
>>
>
>
>
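
For reference, the two synchronization patterns from Bjorn's quoted reply (pause the prompt on barge-in; start recognition only after the prompt ends) could be sketched roughly as below. This is only an illustration under stated assumptions: HTMLTtsElement is a proposed element and no speech recognition API is defined in this thread, so FakeTts, FakeRecognizer, TinyEmitter and the "speechstarted" event name are hypothetical stand-ins. Only pause() and the "ended" event are named in the proposal discussion.

```typescript
// Hypothetical stand-ins: these classes only model the calls mentioned in
// the thread (pause(), "ended", a "speech started" event). They are not a
// real browser API.
type Listener = () => void;

class TinyEmitter {
  private listeners: { [type: string]: Listener[] } = {};

  on(type: string, fn: Listener): void {
    const list = this.listeners[type] || [];
    list.push(fn);
    this.listeners[type] = list;
  }

  emit(type: string): void {
    for (const fn of this.listeners[type] || []) {
      fn();
    }
  }
}

// Models the subset of the proposed TTS element used here.
class FakeTts extends TinyEmitter {
  paused = true;
  play(): void { this.paused = false; }
  pause(): void { this.paused = true; }
}

// Models an assumed speech recognizer with a start() method.
class FakeRecognizer extends TinyEmitter {
  listening = false;
  start(): void { this.listening = true; }
}

// Barge-in: pause the prompt as soon as the user starts speaking.
// "speechstarted" is an assumed event name.
function enableBargeIn(tts: FakeTts, rec: FakeRecognizer): void {
  rec.on("speechstarted", () => tts.pause());
}

// No barge-in: begin listening only once the prompt has finished.
function listenAfterPrompt(tts: FakeTts, rec: FakeRecognizer): void {
  tts.on("ended", () => rec.start());
}
```

In both cases the web app only wires event handlers between the two objects, which is why the existing media events would suffice provided the recognition API exposes a matching "speech started" event.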

Received on Tuesday, 9 November 2010 12:03:17 UTC