Re: about Microphone API

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Thu, 07 Apr 2011 17:42:18 -0700
Message-ID: <4D9E59EA.8010102@helsinki.fi>
To: Jerry Carter <jerry@jerrycarter.org>
CC: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
On 04/07/2011 05:26 PM, Jerry Carter wrote:
> These proposals are certainly in the right direction, but in recent
> years, I've tended to favor more general media streams such as that
> proposed here by the Device APIs and Policy Working Group.
> <http://dev.w3.org/2009/dap/camera/#captureparam>

So far all the Stream APIs I've seen have been very general data
streams. I couldn't find any Stream API in the spec you linked.

> Most typically, I would expect only the audio information to be sent
> and consumed.  Unlike the telephony case, automobiles and mobile
> devices can often provide audio from multiple microphones which
> allows for better noise rejection and, more rarely, assists with
> speaker identification in multi-speaker contexts.  There is certainly
> value to the video stream, when available, for correlation with
> facial features.  Various studies have shown reduced error rates from
> combining facial content with the audio.  And for speaker
> identification / verification, the advantages of video over
> audio-only are clear.
> Again, most typically, I expect audio information from a single
> microphone.  But I would not want to exclude richer data sources when
> available.
> -=- Jerry
> On Apr 7, 2011, at 3:19 PM, Olli Pettay wrote:
>> Hi,
>> as the last, or almost last, comment in the conf call there was
>> something about a microphone API.
>> As Dan mentioned, there has been lots of work happening in RTC and
>> related areas. The HTML spec (WhatWG) now has a proposal for
>> audio/video conferencing, but there are also other proposals. One
>> about audio handling (not about communication) is
>> https://wiki.mozilla.org/MediaStreamAPI
>> For handling audio and video it seems that all the proposals are
>> using some kind of Stream object. So, if the recognizer API
>> used a Stream as its input, we wouldn't need to care about a
>> microphone API. This approach would also let us rely on the other
>> specs to handle many security- and privacy-related issues. (Of
>> course we'd need to choose which Stream API to use, but that is a
>> broader problem at the moment. Browsers will need to implement just
>> one API, but what that will look like exactly isn't clear yet.)
>> The API could be, for example, close to
>> SpeechRequest/SpeechRecognizer, but instead of using the default
>> microphone, or CaptureAPI, there could be an attribute for the
>> Stream.
>> [Constructor(in optional DOMString recognizerURI,
>>              in optional DOMString recognizerParams)]
>> interface Recognizer {
>>   attribute Stream input;
>>   ....
>> This would allow using all sorts of audio streams, not only the
>> microphone. (For example, Streams from other users via VoIP/RTC,
>> or audio from a video so that a web app could do automatic
>> subtitling. I know, these examples are something for the future.)
>> -Olli
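
The Recognizer-with-Stream idea quoted above could be sketched roughly as
follows in TypeScript. This is only an illustration of the shape of the
proposal, not any specified API: StreamLike, RecognizerSketch, arrayStream,
consume() and the example URI are all hypothetical names, and the "stream"
here is just an in-memory list of sample blocks standing in for whichever
Stream API browsers eventually settle on.

```typescript
// Hypothetical stand-in for a browser Stream object (assumption, not a real API).
interface StreamLike {
  readonly label: string;
  // Returns the next block of audio samples, or null once the stream has ended.
  read(): number[] | null;
}

// Mirrors the quoted IDL: two optional constructor strings and an `input` attribute.
class RecognizerSketch {
  recognizerURI: string | undefined;
  recognizerParams: string | undefined;
  input: StreamLike | null = null;

  constructor(recognizerURI?: string, recognizerParams?: string) {
    this.recognizerURI = recognizerURI;
    this.recognizerParams = recognizerParams;
  }

  // Drains the input and reports how many samples were seen. A real
  // recognizer would forward the audio to a speech service instead.
  consume(): { source: string; samples: number } {
    const input = this.input;
    if (input === null) {
      throw new Error("no input stream set");
    }
    let samples = 0;
    for (let block = input.read(); block !== null; block = input.read()) {
      samples += block.length;
    }
    return { source: input.label, samples };
  }
}

// Helper (hypothetical): wraps fixed sample blocks as a StreamLike.
function arrayStream(label: string, blocks: number[][]): StreamLike {
  let next = 0;
  return {
    label,
    read: () => {
      const block = blocks[next];
      if (block === undefined) {
        return null;
      }
      next++;
      return block;
    },
  };
}

// The same recognizer accepts any source, not only a microphone.
const recognizer = new RecognizerSketch("https://recognizer.example/asr");
recognizer.input = arrayStream("microphone", [[0.1, 0.2], [0.3]]);
const micResult = recognizer.consume();
recognizer.input = arrayStream("video-audio-track", [[0, 0, 0, 0]]);
const videoResult = recognizer.consume();
```

Because the recognizer only depends on the stream interface, swapping the
microphone for a VoIP peer or a video's audio track needs no change to the
recognizer itself, which is the point of the proposal.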
Received on Friday, 8 April 2011 00:42:52 UTC
