Re: Text to Speech proposal from Bjorn Bringert on 2010-11-05 (public-xg-htmlspeech@w3.org from November 2010)

From: Bjorn Bringert <bringert@google.com>
Date: Fri, 5 Nov 2010 11:52:15 +0100
To: "Young, Milan" <Milan.Young@nuance.com>
Cc: public-xg-htmlspeech@w3.org
Message-ID: <AANLkTik3EeDHWGHto1Jy_H953XAdf-Z7EpnLAqccBgPX@mail.gmail.com>

Excellent questions, Milan, thanks.

* It is possible to start playback at a given timestamp by setting the
currentTime attribute of the HTMLTtsElement (like with <audio>). There is no
way to seek to an SSML <mark>, but there definitely should be one. I think
that it would be as simple as making the lastMark attribute settable, and
specifying that setting it seeks to the specified mark.

* It could also be useful to allow starting from a given point without
scripting by allowing a URI fragment in the src attribute value. This would
allow the user agent to buffer from the correct point when autobuffer is
set.

* The existing events and methods should be enough to handle synchronization
with speech recognition, if the speech recognition API exposes enough
events. Examples:
  - For barge-in, the web app would call the HTMLTtsElement pause() method
when it receives the "speech started" event from the speech recognition API.
  - For prompts without barge-in, the web app could start speech recognition
when it receives the "ended" event from the HTMLTtsElement.

* Mixing of audio from simultaneous <tts>, <audio> and <video> playback
appears to be allowed by HTML5 implementations. I can't find anywhere in the
HTML5 spec that explicitly requires it, but it seems to be implied by the
specification of how each instance of the elements should work.
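
The two synchronization patterns described above (barge-in, and prompts
without barge-in) could be sketched roughly as follows. This is only a sketch
under assumptions: TtsLike and RecognizerLike are hypothetical stand-ins, not
shipped APIs. play(), pause() and the "ended" handler are modeled on <audio>,
and onspeechstart is an assumed name for the "speech started" event of a
speech recognition API. The Fake classes exist only so that the example runs
without a browser.

```typescript
// Hypothetical shape of the proposed <tts> element, modeled on <audio>.
interface TtsLike {
  play(): void;
  pause(): void;
  onended: (() => void) | null;
}

// Hypothetical shape of a speech recognizer; the event name is an assumption.
interface RecognizerLike {
  start(): void;
  onspeechstart: (() => void) | null;
}

// Barge-in: pause the prompt as soon as the recognizer hears the user.
function enableBargeIn(tts: TtsLike, rec: RecognizerLike): void {
  rec.onspeechstart = () => tts.pause();
  rec.start();
  tts.play();
}

// No barge-in: start recognition only once the prompt has ended.
function listenAfterPrompt(tts: TtsLike, rec: RecognizerLike): void {
  tts.onended = () => rec.start();
  tts.play();
}

// In-memory fake used only to exercise the sketch.
class FakeTts implements TtsLike {
  playing = false;
  onended: (() => void) | null = null;
  play(): void { this.playing = true; }
  pause(): void { this.playing = false; }
  // Simulates the prompt reaching its end.
  finish(): void {
    this.playing = false;
    if (this.onended) this.onended();
  }
}

// In-memory fake used only to exercise the sketch.
class FakeRecognizer implements RecognizerLike {
  listening = false;
  onspeechstart: (() => void) | null = null;
  start(): void { this.listening = true; }
  // Simulates the user starting to speak.
  hearSpeech(): void {
    if (this.onspeechstart) this.onspeechstart();
  }
}
```

With the fakes, calling enableBargeIn(tts, rec) and then rec.hearSpeech()
leaves the prompt paused, while listenAfterPrompt(tts, rec) keeps the
recognizer idle until tts.finish() fires the "ended" handler.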

/Bjorn

On Fri, Nov 5, 2010 at 9:14 AM, Young, Milan <Milan.Young@nuance.com> wrote:

> Hello Bjorn,
>
> I have a couple of questions about your proposal.  As HTML is not my native
> tongue, please pardon my ignorance if these issues are already addressed
> by the DOM framework.
>
>  * Is there a means for clients to request playback at some timestamp
> or <mark> into the request?
>
>  * How do you envision clients synchronizing playback requests with
> recognition (e.g. to handle cases like barge-in)?
>
>  * Would <tts> requests overlay other audio sources (e.g. video,
> <audio>, or <tts>)?
>
>
> Thank you
>
>
> -----Original Message-----
> From: public-xg-htmlspeech-request@w3.org
> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn Bringert
> Sent: Thursday, November 04, 2010 4:09 PM
> To: public-xg-htmlspeech@w3.org
> Subject: Text to Speech proposal
>
> I have attached a proposal for how we could add TTS support to HTML by
> introducing a <tts> element that shares a lot of functionality with
> <audio>.
>
> This is based on an earlier version that I linked to in a thread in
> September
> (http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0018.html).
> The main differences are in examples, clarifications, formatting, and
> handling of SSML <mark> elements.
>
> --
> Bjorn Bringert
> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> Palace Road, London, SW1W 9TQ
> Registered in England Number: 3977902
>

-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace
Road, London, SW1W 9TQ
Registered in England Number: 3977902

Received on Friday, 5 November 2010 10:52:44 UTC