Re: [HTML Speech] Text to speech from Bjorn Bringert on 2010-09-10 (public-xg-htmlspeech@w3.org from September 2010)

From: Bjorn Bringert <bringert@google.com>
Date: Fri, 10 Sep 2010 13:54:44 +0100
To: "E.J. Zufelt" <lists@zufelt.ca>
Cc: Dave Burke <daveburke@google.com>, david bolter <david.bolter@gmail.com>, Olli@pettay.fi, public-xg-htmlspeech@w3.org, Marc Schroeder <marc.schroeder@dfki.de>
Message-ID: <AANLkTims3M6qT8=GbLMFQfLzV-cjkY6s5QBgv-f_dPa0@mail.gmail.com>

I don't think that using an HTML element or a pure JS API makes much
difference to the functionality (except that HTML makes it easier to
show a control UI). In games you often want to play sound effects, a
use case that is very similar to the single-word TTS that you mention,
and HTML5 audio works for that. For example, to play a single word,
you can use JavaScript to create a <tts> element, set its value to the
word, and call play(). In fact, you can see the HTML API as a
JavaScript API if you want, since all aspects of it are accessible to
scripts.
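
As a rough sketch of that single-word case: the <tts> element is only a
proposal, so the TtsElement interface, the speakWord helper and the
createFakeTts factory below are hypothetical stand-ins, not an existing
browser API. Only the "set its value, call play()" flow comes from the
description above.

```typescript
// Hypothetical shape of the proposed <tts> element: only the two members
// mentioned above (a text value and play()). Not an existing browser API.
interface TtsElement {
  value: string;
  play(): void;
}

// Creates a <tts>-like element through the supplied factory, sets its text
// to a single word, and starts playback. In a page the factory might be
// () => document.createElement("tts"), assuming the proposal were implemented.
function speakWord(createTts: () => TtsElement, word: string): TtsElement {
  const tts = createTts();
  tts.value = word;
  tts.play();
  return tts;
}

// Stand-in element that records what it was asked to speak, so the sketch
// can run without a browser or a speech engine.
const spoken: string[] = [];
function createFakeTts(): TtsElement {
  const el: TtsElement = {
    value: "",
    play: () => {
      spoken.push(el.value);
    },
  };
  return el;
}

speakWord(createFakeTts, "hello");
console.log(spoken); // logs the words handed to play()
```

The factory is passed in only so the sketch stays independent of any
particular DOM; the point is that the script-side flow is the same three
steps either way.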

/Bjorn

On Fri, Sep 10, 2010 at 1:03 PM, E.J. Zufelt <lists@zufelt.ca> wrote:
> Good morning Dave,
> As a blind developer I depend on TTS every day.  I would say that, above all
> else, TTS should provide the user, or developer, with finer granularity
> in control over the text segments being synthesized.
> As an example, as I type this message each character is being read aloud by
> my screen reader.  I can have single characters, words, sentences, lines,
> and paragraphs synthesized.
> This means that a single media stream, or even multiple sequential streams,
> cannot provide this degree of granular access to the speech.  A user may
> wish to have the last word in a sentence, and only that word read aloud.
>  Not understanding the word, the user may wish to have the word spelled
> aloud.  Depending on implementation, the user may wish to have the word
> spelled aloud phonetically.
> This is just one way in which using an audio file, or media stream, would
> provide a poorer experience for developers / users.
> Using the TTS API concept the developer could call tts.speak() and pass a
> string literal, variable, or element id.  This means that the text string
> will be available to the tts engine.  UAs could then implement some form of
> control to allow the user to have the last segment of speech read again,
> read by character, read phonetically, copied to the clipboard, etc.
> HTH,
> Everett Zufelt
> http://zufelt.ca
>
>
> On 2010-09-10, at 6:19 AM, Dave Burke wrote:
>
> Perhaps someone can explain why TTS is so different to other media (such as
> pre-recorded audio) that it warrants a fork from HTML 5's standard mechanism
> and justifies a separate JS API? I think the case for HTMLMediaElement is
> very compelling for the reasons Bjorn stated.
> Dave
> On Fri, Sep 10, 2010 at 2:32 AM, david bolter <david.bolter@gmail.com>
> wrote:
>>
>> Hi.
>> I tend to agree with Olli here. It seems more straightforward to me to
>> provide programmatic tts API which would give web devs a lot of control over
>> presentation. I confess though that I've never used a declarative based tts
>> library.
>>
>> In any event I think js devs would like a tts api.
>>
>> Cheers,
>> David
>>
>> On Sep 9, 2010 4:20 PM, "Bjorn Bringert" <bringert@google.com> wrote:
>>
>> On Thu, Sep 9, 2010 at 9:06 PM, Olli Pettay <Olli.Pettay@helsinki.fi>
>> wrote:
>> > On 09/09/2010 06:56 P...
>> Sure, that would work too. But why introduce new APIs when
>> HTMLMediaElement has pretty much all that's needed? Adding a JS API
>> would require adding all the methods for playing, pausing, looping,
>> autobuffering, getting events, changing source etc. HTMLMediaElement
>> already has APIs for all that. It really just boils down to the choice
>> between HTML and JavaScript I guess, and adding a <tts> element seemed
>> most in line with HTML5.
>>
>> Also, HTMLMediaElement allows showing UI controls by just setting an
>> attribute. If it were solely a JavaScript API, web app developers
>> would have to build their own control UIs.
>>
>>
>>
>> >>> I think <audio> is suboptimal even for server-side TTS, for the
>> >>> following
>> >>> reasons/requirem...
>>
>> --
>> Bjorn Bringert
>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>> Palace Road, ...
>>
>
>
>



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902

Received on Friday, 10 September 2010 12:55:16 UTC