- From: Glen Shires <gshires@google.com>
- Date: Tue, 18 Sep 2012 14:28:40 -0700
- To: Jim Barnett <Jim.Barnett@genesyslab.com>
- Cc: "Young, Milan" <Milan.Young@nuance.com>, Adam Sobieski <adamsobieski@hotmail.com>, Peter Beverloo <beverloo@google.com>, public-speech-api@w3.org
- Message-ID: <CAEE5bcjGpVkDJxQ92xZ54O0f9zBkbNz9SBNHYNY1AAB3nFgCqg@mail.gmail.com>
I've updated the spec with editor notes for the open issues: grammar
formats, confidence and WebRTC.
https://dvcs.w3.org/hg/speech-api/rev/f3777d4b107e

Milan, regarding an update on the status of the open-issues list, I
believe the spec now contains an editor note for each.

As always, the current draft spec is at:
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

/Glen Shires

On Tue, Sep 18, 2012 at 5:30 AM, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:

> Yes, it’s certainly too late for this draft. As I recall, we have a ‘url’
> property somewhere that is supposed to allow the app to specify the
> location of the remote recognizer. Maybe we could put in a note along the
> lines of: [Need to specify how access to remote recognition is going to
> work]
>
> - Jim
>
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Monday, September 17, 2012 6:54 PM
> To: Jim Barnett; Adam Sobieski; Peter Beverloo
> Cc: public-speech-api@w3.org
> Subject: RE: Interacting with WebRTC, the Web Audio API and other
> external sources
>
> Transport problems aside, the idea of integrating with getUserMedia() is
> a good one. In addition to bringing unity across standards, we can push a
> good deal of the privacy/consent problems to that subgroup, which is more
> focused on that task.
>
> Unfortunately, I think it’s a bit too late for this effort to consider
> such a relatively major rewrite. I suggest that we add this to the issue
> list that Glen and Hans have offered to compile [1].
>
> Speaking of that, I’d appreciate an update on how that is going. It’s
> been over a month now. Glen and Hans, any progress to report?
>
> [1] http://lists.w3.org/Archives/Public/public-speech-api/2012Aug/0026.html
>
> From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
> Sent: Monday, September 17, 2012 5:49 AM
> To: Adam Sobieski; Peter Beverloo
> Cc: public-speech-api@w3.org
> Subject: RE: Interacting with WebRTC, the Web Audio API and other
> external sources
>
> Adam,
>
> I’m participating in the WebRTC work and hope that it can be made useful
> to the Speech API. One problem is that WebRTC relies on UDP, while I
> understand from Milan that recognizers do better with TCP. I don’t know
> if we’ll be able to add TCP to the WebRTC work. If not, we will at least
> make sure that it is possible for the application to access the user’s
> speech input. The application can then construct its own socket and
> transmit the audio to the ASR engine, if necessary.
>
> - Jim
>
> From: Adam Sobieski [mailto:adamsobieski@hotmail.com]
> Sent: Saturday, September 15, 2012 5:36 PM
> To: Peter Beverloo
> Cc: public-speech-api@w3.org
> Subject: RE: Interacting with WebRTC, the Web Audio API and other
> external sources
>
> Speech API Community Group,
>
> Greetings. I was reading about recent developments with regard to the
> WebRTC stack, and I wanted to note that, as the WebRTC stack will soon be
> available, it could be useful to the Speech API. WebRTC includes the
> MediaStream, DataChannel, and PeerConnection interfaces.
>
> Beyond video calls and video conferencing, it enables video forums and
> scenarios that stream content to media repository services. Some
> technologists are additionally excited about 3D video and microphone
> array functionality.
>
> Speech recognition can facilitate numerous technologies and features,
> including generated hypertext transcripts, computers as teleprompters,
> and other human-computer interaction and user-interface scenarios
> pertaining to web-based multimedia blogging.
>
> Kind regards,
>
> Adam Sobieski
>
> ------------------------------
>
> Date: Thu, 19 Jul 2012 15:38:14 +0100
> From: beverloo@google.com
> To: public-speech-api@w3.org
> Subject: Re: Interacting with WebRTC, the Web Audio API and other
> external sources
>
> With all major browser vendors being members of the WebRTC working
> group, it may actually be worth considering slimming down the APIs and
> re-using the interfaces they'll provide.
>
> As an addendum to the quoted proposal:
>
> * Drop the "start", "stop" and "abort" methods from the SpeechRecognition
> object in favor of an input MediaStream acquired through getUserMedia()
> [1].
>
> Alternatively, the three methods could be re-purposed to allow
> partial/timed recognition of continuous media streams, rather than
> recognition of the whole stream.
>
> Best,
> Peter
>
> [1] http://dev.w3.org/2011/webrtc/editor/getusermedia.html#navigatorusermedia
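A minimal sketch of how the addendum above might look to page authors,
assuming the hypothetical "inputStream" property proposed in this thread.
Neither the property nor this wiring exists in the current draft, and
vendor prefixes (e.g. webkitGetUserMedia) are omitted:

    // Sketch only: "inputStream" is the property proposed in this thread;
    // it is not part of the draft Web Speech API.
    navigator.getUserMedia({ audio: true }, function (stream) {
      var recognition = new SpeechRecognition();
      recognition.onresult = function (event) {
        console.log(event.results[0][0].transcript);
      };
      // Supplying a MediaStream would take the place of the
      // start()/stop()/abort() calls.
      recognition.inputStream = stream;
    }, function (error) {
      console.error('getUserMedia() failed:', error);
    });

Because the same MediaStream could simultaneously feed a PeerConnection or
an <audio> element, recognition would compose naturally with the rest of
the platform.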
> On Wed, Jun 13, 2012 at 3:49 PM, Peter Beverloo <beverloo@google.com>
> wrote:
>
> Currently, the SpeechRecognition [1] interface defines three methods to
> start, stop or abort speech recognition, the source of which will be an
> audio input device as controlled by the user agent. Similarly, the
> TextToSpeech (TTS) interface defines play, pause and stop, which will
> output the generated speech to an output device, again as controlled by
> the user agent.
>
> There are various other media and interaction APIs in development right
> now, and I believe it would be good for the Speech API to integrate more
> tightly with them. In this e-mail, I'd like to focus on some additional
> features for integration with WebRTC and the Web Audio API.
>
> WebRTC <http://dev.w3.org/2011/webrtc/editor/webrtc.html>
>
> WebRTC provides the ability to interact with the user's microphone and
> camera through the getUserMedia() method. As such, an important use case
> is (video and) audio chatting between two or more people. Audio is
> available through a MediaStream object, which can be re-used to power,
> for example, an <audio> element, or transmitted to other parties through
> a peer-to-peer connection; it can also integrate with the Web Audio API
> through an AudioContext's createMediaStreamSource() method.
>
> Web Audio API
> <https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html>
>
> The Web Audio API provides the ability to process, analyze, synthesize
> and modify audio through JavaScript. It can get its input from media
> files through XMLHttpRequest, from media elements such as <audio> and
> <video>, and from any other system that can provide an audio-based
> MediaStream, which includes WebRTC.
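For comparison, the MediaStream-to-Web-Audio hand-off Peter describes is
already expressible, roughly as follows (vendor prefixes such as
webkitAudioContext omitted); a recognizer input could plausibly tap the
same graph:

    // Route a getUserMedia() stream into a Web Audio graph.
    var context = new AudioContext();
    navigator.getUserMedia({ audio: true }, function (stream) {
      var source = context.createMediaStreamSource(stream);
      var analyser = context.createAnalyser();
      source.connect(analyser); // e.g. level metering before recognition
    }, function (error) {
      console.error('getUserMedia() failed:', error);
    });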
> Since speech recognition and synthesis do not have to be limited to live
> input from, and output to, the user, I'd like to present two new use
> cases.
>
> 1) Transcripts for (live) communication.
>
> While the specification does not mandate a maximum duration for a speech
> input stream, this suggestion is most appropriate for implementations
> utilizing a local recognizer. Allowing MediaStreams to be used as input
> for a SpeechRecognition object, for example through a new "inputStream"
> property as an alternative to the start, stop and abort methods, would
> enable authors to supply external input to be recognized. This may
> include, but is not limited to, prerecorded audio files and WebRTC live
> streams, both from local and remote parties.
>
> 2) Storing and processing text-to-speech fragments.
>
> Rather than mandating immediate output of the synthesized audio stream,
> consideration should be given to introducing an "outputStream" property
> on a TextToSpeech object which provides a MediaStream. This would allow
> the synthesized stream to be played through an <audio> element, processed
> through the Web Audio API, or even stored locally for caching, in case
> the user is on a device that is not always connected to the internet (and
> no local synthesizer is available). Furthermore, it would allow websites
> to store the synthesized audio to a wave file and save it on the server,
> so it can be re-used by user agents or other clients that do not provide
> an implementation.
>
> The Web platform gains its power from the ability to combine
> technologies, and I think it would be great to see the Speech API playing
> a role in that.
>
> Best,
> Peter
>
> [1] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#speechreco-section
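A sketch of use case 2, assuming the hypothetical "outputStream" property
described above; the TextToSpeech constructor and "text" attribute shown
here are likewise assumptions, not part of the draft:

    // Sketch only: "outputStream" is the property proposed above; the
    // TextToSpeech constructor and "text" attribute are also assumed.
    var tts = new TextToSpeech();
    tts.text = 'Welcome back.';
    // Route the synthesized audio into an <audio> element via a
    // MediaStream, instead of (or in addition to) immediate playback.
    var player = document.createElement('audio');
    player.src = URL.createObjectURL(tts.outputStream);
    player.play();

The same stream could instead be fed into a Web Audio graph for
processing, or captured for the offline-caching scenario Peter mentions.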
Received on Tuesday, 18 September 2012 21:29:48 UTC