- From: Jerry Carter <jerry@jerrycarter.org>
- Date: Thu, 7 Apr 2011 20:26:40 -0400
- To: Olli@pettay.fi
- Cc: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
These proposals are certainly in the right direction, but in recent years I've tended to favor more general media streams, such as the one proposed by the Device APIs and Policy Working Group: <http://dev.w3.org/2009/dap/camera/#captureparam>

Most typically, I would expect only the audio information to be sent and consumed. Unlike the telephony case, automobiles and mobile devices can often provide audio from multiple microphones, which allows for better noise rejection and, more rarely, assists with speaker identification in multi-speaker contexts.

There is certainly value in the video stream, when available, for correlation with facial features. Various studies have shown reduced error rates from combining facial content with the audio. And for speaker identification / verification, the advantages of video over audio-only are clear.

Again, most typically, I expect audio information from a single microphone. But I would not want to exclude richer data sources when available.

-=- Jerry

On Apr 7, 2011, at 3:19 PM, Olli Pettay wrote:

> Hi,
>
> as the last or almost last comment in the conf call there was something
> about microphone API.
>
> As Dan mentioned there has been lots of work happening in
> RTC and related areas. HTML spec (WhatWG) has now a
> proposal for audio/video conferencing, but there
> are also other proposals.
> One about audio handling (not about communication) is
> https://wiki.mozilla.org/MediaStreamAPI
>
> For handling audio and video it seems that all
> the proposals are using some kind of Stream object.
> So, if the recognizer API was using a Stream as an input,
> we wouldn't need to care microphone API. This approach would
> also let us rely on the other specs to handle many
> security and privacy related issues.
> (Of course we'd need to choose which Stream API to use, but
> that is more broad problem atm. Browsers will need to implement just
> one API, but what that will look like exactly isn't clear yet.)
>
> The API could be, for example, close to SpeechRequest/SpeechRecognizer,
> but instead of using the default microphone, or CaptureAPI,
> there could be an attribute for the Stream.
>
> [Constructor(in optional DOMString recognizerURI,
>              in optional DOMString recognizerParams)]
> interface Recognizer {
>   attribute Stream input;
>   ....
>
> This would allow using all sorts of audio streams, not only microphone.
> (For example for Streams from other users via VoIP/RTC, or
> audio from a video so that web app could do automatic subtitling.
> I know, these examples are something for the future.)
>
> -Olli
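To make the two positions concrete: Olli's proposal gives the recognizer a Stream `input` instead of an implicit default microphone, and Jerry's reply asks that such a stream be able to carry several audio tracks (multiple microphones) and optionally video, while audio remains the typical case. The TypeScript below is a minimal sketch under loudly stated assumptions: `Stream`, `Track`, `TrackKind`, the `acceptVideo` flag and the `usableTracks()` method are all hypothetical names invented for illustration. They are not the Stream proposals under discussion in 2011, nor any real browser API.

```typescript
// HYPOTHETICAL types for illustration only; not a real or proposed API.
type TrackKind = "audio" | "video";

interface Track {
  kind: TrackKind;
  label: string; // which microphone or camera produced this track
}

interface Stream {
  tracks: Track[];
}

// Sketch of the Recognizer idea: input is any Stream (microphone,
// VoIP/RTC peer, audio from a video) rather than a fixed default mic.
class Recognizer {
  input: Stream | null = null;

  constructor(
    readonly recognizerURI?: string,
    readonly recognizerParams?: string,
    // Hypothetical flag: audio-only by default, richer sources opt-in.
    readonly acceptVideo: boolean = false,
  ) {}

  // Returns the tracks this recognizer would consume from its input.
  usableTracks(): Track[] {
    if (this.input === null) return [];
    return this.input.tracks.filter(
      (t) => t.kind === "audio" || (this.acceptVideo && t.kind === "video"),
    );
  }
}

// Usage: an in-car stream with two microphones and one cabin camera.
const carStream: Stream = {
  tracks: [
    { kind: "audio", label: "mic-left" },
    { kind: "audio", label: "mic-right" },
    { kind: "video", label: "cabin-camera" },
  ],
};

const audioOnly = new Recognizer();
audioOnly.input = carStream;
console.log(audioOnly.usableTracks().length); // 2

const audioVisual = new Recognizer(undefined, undefined, true);
audioVisual.input = carStream;
console.log(audioVisual.usableTracks().length); // 3
```

The default of consuming only audio tracks mirrors the expectation that audio is the usual input, while the opt-in flag reflects the wish not to exclude video when it is available.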
Received on Friday, 8 April 2011 00:27:24 UTC