Re: [HTML Speech] Text to speech from E.J. Zufelt on 2010-09-10 (public-xg-htmlspeech@w3.org from September 2010)

From: E.J. Zufelt <lists@zufelt.ca>
Date: Fri, 10 Sep 2010 08:03:50 -0400
To: Dave Burke <daveburke@google.com>
Cc: david bolter <david.bolter@gmail.com>, Bjorn Bringert <bringert@google.com>, Olli@pettay.fi, public-xg-htmlspeech@w3.org, Marc Schroeder <marc.schroeder@dfki.de>
Message-Id: <1B313C83-96C3-494B-BE7B-8F940FDD59AF@zufelt.ca>

Good morning Dave,

As a blind developer I depend on TTS every day.  I would say that, above all else, TTS should give the user, or developer, fine-grained control over the text segments being synthesized.

As an example, as I type this message each character is being read aloud by my screen reader.  I can have single characters, words, sentences, lines, and paragraphs synthesized.

This means that a single media stream, or even multiple sequential streams, cannot provide this degree of granular access to the speech.  A user may wish to have the last word in a sentence, and only that word, read aloud.  Not understanding the word, the user may wish to have it spelled aloud.  Depending on the implementation, the user may wish to have the word spelled aloud phonetically.
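
To make the kind of access described above concrete, here is a minimal sketch in TypeScript.  All of the helper names (sentences, lastWord, spell, spellPhonetically) and the phonetic table are hypothetical, written only for illustration; they are not part of any existing TTS API, and the sentence splitting is deliberately naive.

```typescript
// Hypothetical helpers for illustration only; not an existing API.

// Naive sentence split: a run of non-terminators followed by ., ! or ?.
function sentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) || [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Split on whitespace, dropping empty pieces.
function words(text: string): string[] {
  return text.split(/\s+/).filter((w) => w.length > 0);
}

// Last word of a sentence, with surrounding punctuation stripped.
function lastWord(sentence: string): string | undefined {
  const ws = words(sentence);
  if (ws.length === 0) return undefined;
  return ws[ws.length - 1].replace(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/g, "");
}

// Spell a word one character at a time.
function spell(word: string): string[] {
  return Array.from(word);
}

// Assumed phonetic table (a common radio alphabet); an engine could use any.
const PHONETIC: Record<string, string> = {
  a: "alpha", b: "bravo", c: "charlie", d: "delta", e: "echo",
  f: "foxtrot", g: "golf", h: "hotel", i: "india", j: "juliet",
  k: "kilo", l: "lima", m: "mike", n: "november", o: "oscar",
  p: "papa", q: "quebec", r: "romeo", s: "sierra", t: "tango",
  u: "uniform", v: "victor", w: "whiskey", x: "x-ray", y: "yankee",
  z: "zulu",
};

// Spell a word phonetically; characters without an entry are kept as-is.
function spellPhonetically(word: string): string[] {
  return spell(word).map((ch) => PHONETIC[ch.toLowerCase()] || ch);
}

const sentence = "A user may wish to have the last word read aloud.";
const word = lastWord(sentence) || "";
console.log(word);                    // aloud
console.log(spell(word));             // [ 'a', 'l', 'o', 'u', 'd' ]
console.log(spellPhonetically(word)); // alpha, lima, oscar, uniform, delta
```

None of these operations is possible once the text has been rendered to an audio stream, because the stream no longer carries the word boundaries or the characters.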

This is just one way in which using an audio file, or media stream, would provide a poorer experience for developers / users.

Using the TTS API concept, the developer could call tts.speak() and pass a string literal, variable or element id.  This means that the text string will be available to the TTS engine.  UAs could then implement some form of control to allow the user to have the last segment of speech read again, read by character, read phonetically, copied to the clipboard, etc.
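
A rough sketch of that idea in TypeScript follows.  Only the tts.speak() name comes from the proposal being discussed; the Tts class, the SpeakFn engine callback, repeatLast(), repeatLastByCharacter() and lastText() are assumptions made up for this example, and it accepts plain strings only (resolving an element id would need a browser DOM).

```typescript
// Hypothetical sketch; not a real or proposed browser API.

// Stand-in for a speech engine: anything that can be handed text to speak.
type SpeakFn = (text: string) => void;

class Tts {
  private engine: SpeakFn;
  // The engine keeps the text, not just the audio, of the last segment.
  private lastSegment: string | null = null;

  constructor(engine: SpeakFn) {
    this.engine = engine;
  }

  // Speak a string and remember it for later user-driven controls.
  speak(text: string): void {
    this.lastSegment = text;
    this.engine(text);
  }

  // Read the last segment again.
  repeatLast(): void {
    if (this.lastSegment !== null) this.engine(this.lastSegment);
  }

  // Read the last segment one character at a time, skipping whitespace.
  repeatLastByCharacter(): void {
    if (this.lastSegment === null) return;
    for (const ch of Array.from(this.lastSegment)) {
      if (ch.trim().length > 0) this.engine(ch);
    }
  }

  // Text of the last segment, e.g. for a "copy to clipboard" control.
  lastText(): string | null {
    return this.lastSegment;
  }
}

// Usage with a fake engine that records what it was asked to speak.
const spoken: string[] = [];
const tts = new Tts((text) => { spoken.push(text); });
tts.speak("Good morning Dave");
tts.repeatLast();
console.log(spoken);
```

The point of the design is that the controls sit on top of the retained text, so a UA could offer them without any cooperation from the page author.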

HTH,
Everett Zufelt
http://zufelt.ca

Follow me on Twitter
http://twitter.com/ezufelt

View my LinkedIn Profile
http://www.linkedin.com/in/ezufelt

On 2010-09-10, at 6:19 AM, Dave Burke wrote:

> Perhaps someone can explain why TTS is so different to other media (such as pre-recorded audio) that it warrants a fork from HTML 5's standard mechanism and justifies a separate JS API? I think the case for HTMLMediaElement is very compelling for the reasons Bjorn stated.
> 
> Dave
> 
> On Fri, Sep 10, 2010 at 2:32 AM, david bolter <david.bolter@gmail.com> wrote:
> Hi.
> I tend to agree with Olli here. It seems more straightforward to me to provide a programmatic tts API which would give web devs a lot of control over presentation. I confess though that I've never used a declarative based tts library.
> 
> In any event I think js devs would like a tts api.
> 
> Cheers,
> David
> 
> 
>> On Sep 9, 2010 4:20 PM, "Bjorn Bringert" <bringert@google.com> wrote:
>> 
>> On Thu, Sep 9, 2010 at 9:06 PM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:
>> > On 09/09/2010 06:56 P...
>> 
>> Sure, that would work too. But why introduce new APIs when
>> HTMLMediaElement has pretty much all that's needed? Adding a JS API
>> would require adding all the methods for playing, pausing, looping,
>> autobuffering, getting events, changing source etc. HTMLMediaElement
>> already has APIs for all that. It really just boils down to the choice
>> between HTML and JavaScript I guess, and adding a <tts> element seemed
>> most in line with HTML5.
>> 
>> Also, HTMLMediaElement allows showing UI controls by just setting an
>> attribute. If it were solely a JavaScript API, web app developers
>> would have to build their own control UIs.
>> 
>> 
>> 
>> >>> I think <audio> is suboptimal even for server-side TTS, for the following
>> >>> reasons/requirem...
>> 
>> 
>> -- 
>> Bjorn Bringert
>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>> Palace Road, ...
>> 
> 
>

Received on Friday, 10 September 2010 12:53:39 UTC