- From: Dominic Mazzoni <dmazzoni@google.com>
- Date: Thu, 19 Jul 2012 10:38:56 -0700
- To: Peter Beverloo <beverloo@google.com>
- Cc: public-speech-api@w3.org
- Message-ID: <CAFz-FYzKXPYNdN=2DDmPs6nBPC1=JbB_O5LShw5WEj8osPw3Zg@mail.gmail.com>
I love the idea of standardizing interfaces and combining technologies.

I only have one potential concern: the license for some text-to-speech
engines specifically forbids saving the audio data and playing it back
later, or allows it only for personal use; for example, streaming the
speech to multiple users is not allowed without purchasing a more
expensive license.

- Dominic

On Wed, Jun 13, 2012 at 7:49 AM, Peter Beverloo <beverloo@google.com> wrote:
> 2) Storing and processing text-to-speech fragments.
>
> Rather than mandating immediate output of the synthesized audio stream,
> consideration should be given to introducing an "outputStream" property on
> a TextToSpeech object which provides a MediaStream object. This would
> allow the synthesized stream to be played through the <audio> element,
> processed through the Web Audio API, or even stored locally for caching,
> in case the user is on a device which is not always connected to the
> internet (and no local synthesizer is available). Furthermore, this would
> allow websites to store the synthesized audio as a wave file and save it
> on the server, allowing it to be re-used by user agents or other clients
> which do not provide an implementation.
>
> The Web platform gains its power from the ability to combine technologies,
> and I think it would be great to see the Speech API playing a role in that.
>
> Best,
> Peter
>
> [1]
> http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#speechreco-section
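A minimal sketch of how the proposed "outputStream" property might be used,
assuming a hypothetical TextToSpeech object with an outputStream exposing a
MediaStream; none of these names appear in the current draft and are
illustrative only:

  // Hypothetical TextToSpeech object (not in the current draft); its
  // outputStream property is assumed to expose a MediaStream.
  var tts = new TextToSpeech();
  tts.text = "Hello, world";

  // Play the synthesized audio through an <audio> element...
  var audio = document.querySelector("audio");
  audio.srcObject = tts.outputStream;
  audio.play();

  // ...or route it through the Web Audio API for further processing.
  var context = new AudioContext();
  var source = context.createMediaStreamSource(tts.outputStream);
  source.connect(context.destination);

  // Start synthesis (hypothetical method name).
  tts.play();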
Received on Thursday, 19 July 2012 17:39:24 UTC