- From: Peter Beverloo <beverloo@google.com>
- Date: Thu, 19 Jul 2012 15:38:14 +0100
- To: public-speech-api@w3.org
- Message-ID: <CALt3x6mBdV_prNfB6bDig47c-DCXAgKBP+_h2Y8=Pz4M-JXQDg@mail.gmail.com>
With all major browser vendors being members of the WebRTC working group,
it may be worth considering slimming down the APIs and re-using the
interface they'll provide. As an addendum to the quoted proposal:

* Drop the "start", "stop" and "abort" methods from the SpeechRecognition
object in favor of an input MediaStream acquired through getUserMedia()[1].
Alternatively, the three methods could be re-purposed to allow
partial/timed recognition of continuous media streams, rather than
recognition of the whole stream.

Best,
Peter

[1] http://dev.w3.org/2011/webrtc/editor/getusermedia.html#navigatorusermedia

On Wed, Jun 13, 2012 at 3:49 PM, Peter Beverloo <beverloo@google.com> wrote:

> Currently, the SpeechRecognition[1] interface defines three methods to
> start, stop or abort speech recognition, the source of which will be an
> audio input device as controlled by the user agent. Similarly, the
> TextToSpeech (TTS) interface defines play, pause and stop, which will
> output the generated speech to an output device, again as controlled by
> the user agent.
>
> There are various other media and interaction APIs in development right
> now, and I believe it would be good for the Speech API to integrate more
> tightly with them. In this e-mail, I'd like to focus on some additional
> features for integration with WebRTC and the Web Audio API.
>
> ** WebRTC <http://dev.w3.org/2011/webrtc/editor/webrtc.html>
>
> WebRTC provides the ability to interact with the user's microphone and
> camera through the getUserMedia() method. As such, an important use-case
> is (video and) audio chatting between two or more people. Audio is
> available through a MediaStream object, which can be re-used to power,
> for example, an <audio> element, or transmitted to other people through a
> peer-to-peer connection; it can also integrate with the Web Audio API
> through an AudioContext's createMediaStreamSource() method.
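As a rough illustration of the addendum above: treating "inputStream" as a purely hypothetical property (it exists in no draft of the Speech API) and using the callback-style getUserMedia() from the cited editor's draft, the proposed wiring might look something like this sketch:

```javascript
// Hypothetical sketch: feed a getUserMedia() MediaStream into speech
// recognition via a proposed "inputStream" property. The property is the
// suggestion made above, not a shipped or specified API.
function attachStreamToRecognizer(recognition, mediaStream) {
  recognition.inputStream = mediaStream;  // proposed property, not in any draft
  return recognition;
}

// Browser-only glue, guarded so this file also loads outside a browser.
if (typeof navigator !== 'undefined' && navigator.getUserMedia) {
  navigator.getUserMedia({audio: true}, function (stream) {
    var recognition = new SpeechRecognition();
    attachStreamToRecognizer(recognition, stream);
    recognition.onresult = function (event) {
      // Recognition results now describe the supplied stream rather than
      // an input device chosen by the user agent.
      console.log(event.results);
    };
  }, function (error) {
    console.error('getUserMedia failed:', error);
  });
}
```

The same helper would accept any MediaStream, so a remote WebRTC peer's audio could be substituted for the microphone without changing the recognizer code.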
>
> ** Web Audio API
> <https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html>
>
> The Web Audio API provides the ability to process, analyze, synthesize
> and modify audio through JavaScript. It can get its input from media
> files through XMLHttpRequest, from media elements such as <audio> and
> <video>, and from any other system able to provide an audio-based
> MediaStream, which includes WebRTC.
>
> Since speech recognition and synthesis do not have to be limited to live
> input from and output to the user, I'd like to present two new use-cases.
>
> 1) Transcripts for (live) communication.
>
> While the specification does not mandate a maximum duration of a speech
> input stream, this suggestion is most appropriate for implementations
> utilizing a local recognizer. Allowing MediaStreams to be used as input
> for a SpeechRecognition object, for example through a new "inputStream"
> property as an alternative to the start, stop and abort methods, would
> enable authors to supply external input to be recognized. This may
> include, but is not limited to, prerecorded audio files and WebRTC live
> streams, both from local and remote parties.
>
> 2) Storing and processing text-to-speech fragments.
>
> Rather than mandating immediate output of the synthesized audio stream,
> it should be considered to introduce an "outputStream" property on a
> TextToSpeech object which provides a MediaStream. This would allow the
> synthesized stream to be played through an <audio> element, processed
> through the Web Audio API, or even stored locally for caching when the
> user's device is not always connected to the internet (and no local
> synthesizer is available). Furthermore, this would allow websites to
> store the synthesized audio as a wave file and save it on the server,
> where it can be re-used by user agents or other clients which do not
> provide an implementation.
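To make the second use-case concrete: again, "outputStream" and the TextToSpeech constructor shape are hypothetical (they follow the proposal, not any published draft), and the vendor-prefixed webkitAudioContext reflects Web Audio implementations of the time. A sketch of routing synthesized speech into a Web Audio graph might look like:

```javascript
// Hypothetical sketch: route synthesized speech out through a proposed
// "outputStream" MediaStream property on a TextToSpeech object. Neither
// the property nor this wiring exists in any published draft.
function routeSynthesisToSink(tts, sink) {
  // The sink decides what to do with the stream: play it, process it,
  // or record it for caching.
  return sink(tts.outputStream);
}

// Browser-only glue, guarded for non-browser environments.
if (typeof document !== 'undefined' &&
    typeof webkitAudioContext !== 'undefined') {
  var tts = new TextToSpeech();       // hypothetical constructor
  tts.text = 'Hello from the Speech API';
  var context = new webkitAudioContext();
  routeSynthesisToSink(tts, function (stream) {
    // Feed the synthesized audio into the Web Audio processing graph.
    return context.createMediaStreamSource(stream);
  });
  tts.play();
}
```

Because the sink is just a callback, the same helper could instead hand the stream to an <audio> element, or to a recorder that serializes it to a wave file for server-side caching, as described above.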
>
> The Web platform gains its power from the ability to combine
> technologies, and I think it would be great to see the Speech API play a
> role in that.
>
> Best,
> Peter
>
> [1]
> http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#speechreco-section
>
Received on Thursday, 19 July 2012 14:38:49 UTC