- From: Olli Pettay <Olli.Pettay@helsinki.fi>
- Date: Thu, 07 Apr 2011 17:42:18 -0700
- To: Jerry Carter <jerry@jerrycarter.org>
- CC: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
On 04/07/2011 05:26 PM, Jerry Carter wrote:
> These proposals are certainly in the right direction, but in recent
> years, I've tended to favor more general media streams such as that
> proposed here by the Device APIs and Policy Working Group.
>
> <http://dev.w3.org/2009/dap/camera/#captureparam>

So far all the Stream APIs I've seen have been very general data streams.
I couldn't find any Stream api in the spec you linked.

>
> Most typically, I would expect only the audio information to be sent
> and consumed. Unlike the telephony case, automobiles and mobile
> devices can often provide audio from multiple microphones which
> allows for better noise rejection and, more rarely, assists with
> speaker identification in multi-speaker contexts. There is certainly
> value to the video stream, when available, for correlation with
> facial features. Various studies have shown reduced error rates from
> combining facial content with the audio. And for speaker
> identification / verification, the advantages of video over
> audio-only are clear.
>
> Again, most typically, I expect audio information from a single
> microphone. But I would not want to exclude richer data sources when
> available.
>
> -=- Jerry
>
>
> On Apr 7, 2011, at 3:19 PM, Olli Pettay wrote:
>
>> Hi,
>>
>> as the last or almost last comment in the conf call there was
>> something about microphone API.
>>
>> As Dan mentioned there has been lots of work happening in RTC and
>> related areas. HTML spec (WhatWG) has now a proposal for
>> audio/video conferencing, but there are also other proposals. One
>> about audio handling (not about communication) is
>> https://wiki.mozilla.org/MediaStreamAPI
>>
>> For handling audio and video it seems that all the proposals are
>> using some kind of Stream object. So, if the recognizer API was
>> using a Stream as an input, we wouldn't need to care microphone
>> API. This approach would also let us rely on the other specs to
>> handle many security and privacy related issues. (Of course we'd
>> need to choose which Stream API to use, but that is more broad
>> problem atm. Browsers will need to implement just one API, but what
>> that will look like exactly isn't clear yet.)
>>
>> The API could be, for example, close to
>> SpeechRequest/SpeechRecognizer, but instead of using the default
>> microphone, or CaptureAPI, there could be an attribute for the
>> Stream.
>>
>> [Constructor(in optional DOMString recognizerURI,
>>              in optional DOMString recognizerParams)]
>> interface Recognizer {
>>   attribute Stream input;
>>   ....
>>
>> This would allow using all sorts of audio streams, not only
>> microphone. (For example for Streams from other users via VoIP/RTC,
>> or audio from a video so that web app could do automatic
>> subtitling. I know, these examples are something for the future.)
>>
>>
>>
>> -Olli
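To make the quoted Recognizer proposal a bit more concrete, here is a rough TypeScript sketch of what "an attribute for the Stream" could look like from a web app's point of view. It is only an illustration under assumptions: `AudioStreamLike` and `describeInput()` are hypothetical names that appear in no proposal, and the real Stream type would come from whichever Stream API browsers end up implementing.

```typescript
// Hypothetical stand-in for the eventual Stream type; none of the
// proposals discussed in this thread defines this interface.
interface AudioStreamLike {
  readonly label: string;
}

// Mirrors the quoted IDL: two optional DOMString constructor arguments
// and a writable `input` attribute instead of a hard-wired microphone.
class Recognizer {
  readonly recognizerURI: string | undefined;
  readonly recognizerParams: string | undefined;

  // The audio source; null until the web app assigns a stream.
  input: AudioStreamLike | null = null;

  constructor(recognizerURI?: string, recognizerParams?: string) {
    this.recognizerURI = recognizerURI;
    this.recognizerParams = recognizerParams;
  }

  // Hypothetical helper, not part of the proposal: reports which
  // stream the recognizer would consume.
  describeInput(): string {
    return this.input === null
      ? "no input stream set"
      : `recognizing from: ${this.input.label}`;
  }
}

// Any stream-shaped source can be plugged in, not only the microphone.
const microphone: AudioStreamLike = { label: "microphone" };
const videoAudio: AudioStreamLike = { label: "audio track of a video" };

const recognizer = new Recognizer();
recognizer.input = microphone;
console.log(recognizer.describeInput());

// Swapping the source (e.g. for automatic subtitling) needs no
// microphone-specific API on the recognizer itself.
recognizer.input = videoAudio;
console.log(recognizer.describeInput());
```

The point of the sketch is only that the recognizer never asks where the audio came from; capture, permissions and privacy prompts would stay with whatever spec produces the stream.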
Received on Friday, 8 April 2011 00:42:52 UTC