Follow up on Audio Output Device Selection from Philippe Joseph Cohen on 2014-11-04 (public-media-capture@w3.org from November 2014)

From: Philippe Joseph Cohen <philc@audyx.com>
Date: Tue, 4 Nov 2014 20:52:48 +0200
To: public-media-capture@w3.org
Cc: WebAudio/web-audio-api <web-audio-api@noreply.github.com>
Message-Id: <03ABF241-406F-440F-A436-5DD4223BB7CD@audyx.com>
Hello Media Capture folks -

Following the very interesting presentation <https://www.w3.org/wiki/images/d/d6/Output_Device_Selection,_TPAC_2014.pdf> that Justin Uberti gave at TPAC on Thursday morning (Oct 30th) during the Media Capture Task Force meeting, I would like to help speed up the spec process for this topic, which is important for us (we are building an audiology platform) and for many others using the Web. Here is my summary of what was discussed, to get everyone in the loop:

First and most important, the participants supported the need for a solution to this audio multi-output requirement, which is reported as the #1 API-related issue <https://code.google.com/p/webrtc/issues/detail?id=2243> in the Chrome WebRTC bug tracker. It was also discussed in the Audio group and qualified as important during the Tuesday meeting with Harald Alvestrand; the decision was to let the gUM task force lead this proposal and to ensure that the Audio API integrates properly with the updated Media Capture (aka gUM) specs.

The current proposal is to add an optional sinkId property to HTMLMediaElement and to have the AudioContext constructor accept it as an optional parameter. No objections were raised (at least at this stage) besides the need to coordinate this with the HTML and Audio WGs. The debate focused mainly on the authorisation control required to protect users from attacks such as ads blared through powerful speakers connected to the client device, and Justin's solution distinguishes between two use cases:

* Composite device use case: the audio output device has an associated input device, as in a headphone; I suggest calling this the headphone communication use case. Here Justin proposed to avoid prompting for additional authorisation and to leverage the authorisation already granted on the input device to use the associated output device, as he initially detailed in this Chromium WebRTC bug <https://code.google.com/p/webrtc/issues/detail?id=2243#c30>.

This associatedSink solution indeed fits many of today's communication use cases and does not add a new ACL, which is really cool. I want to point out, however, that in more demanding communication use cases such as corporate conference rooms (and generally in video conferences held in large rooms), the client device in the conference room will likely have multiple microphones that are not associated with the (likely 5.1) speaker output on the wall; they will all be independent devices. This ‘corporate conference room’ use case won’t be addressed by the associatedSink proposal.

* Advanced audio use case: non-composite and non-default audio output devices will be accessible, per the current proposal, via an explicit authorisation.

This use case is perceived as much less frequent and more advanced, and indeed my audiology use cases can be categorised as such; I am glad we’re in agreement that such cases should be supported by the finalised gUM specs. But I want to make the point that more day-to-day consumer use cases also fit into these so-called advanced use cases:
** Parent and kids at home: the parent is listening to the radio through (e.g.) a Beats Pill while cooking dinner, and the kid is watching a movie using a headset.
** DJ using a Web app to access playlists and to hear the next track in a headset while the current one is playing on the big speakers.
** Car calls: one passenger wants to direct the call through the car speakers so other passengers can listen, while the microphone stays on the smartphone. Native mobile apps on iOS and Android have supported this type of audio device selection for a long time.
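To make the DJ case concrete, here is a minimal sketch of how the proposed sinkId could be used to send two tracks to two different outputs. Everything in it is an assumption for illustration: SinkSelectable and SketchAudioContext are hypothetical stand-ins (not the real HTMLMediaElement or AudioContext), the device ids and file names are invented, and the empty string is assumed to mean "default output". The shape that is finally specified may well differ.

```typescript
// Hypothetical stand-in for a media element carrying the proposed sinkId property.
interface SinkSelectable {
  src: string;
  sinkId: string; // "" is assumed here to mean the default output device
}

// Hypothetical stand-in for an AudioContext whose constructor takes an optional sinkId.
class SketchAudioContext {
  readonly sinkId: string;
  constructor(sinkId: string = "") {
    this.sinkId = sinkId;
  }
}

// Invented device ids; a real app would obtain them from device enumeration.
const SPEAKERS = "speakers-id";
const HEADSET = "headset-id";

// Both tracks start on the default output.
const currentTrack: SinkSelectable = { src: "track-1.ogg", sinkId: "" };
const nextTrack: SinkSelectable = { src: "track-2.ogg", sinkId: "" };

// The audience hears the current track; the DJ previews the next one privately.
currentTrack.sinkId = SPEAKERS;
nextTrack.sinkId = HEADSET;

// Web Audio processing for the preview would be bound to the same output.
const previewContext = new SketchAudioContext(HEADSET);
// Omitting the parameter keeps today's behaviour (default output).
const defaultContext = new SketchAudioContext();
```

Since sinkId is optional in both places, existing pages that never set it would keep playing on the default device.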

The issue with the current proposal is that EnumerateDevices returns only opaque information before user authorisation, as currently drafted in the EnumerateDevices spec <http://w3c.github.io/mediacapture-main/getusermedia.html#access-control-model>. With only this opaque information, and without the more meaningful properties (currently the ‘label’ one), apps won’t be in a position to decide whether to prompt the user for authorisation to use a specific device based on just the opaque deviceId. It was agreed during the meeting that this issue must be addressed, and I will post a proposal to this list in a separate thread.
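A small sketch of the problem, under stated assumptions: DeviceInfoLike is a hypothetical copy of just the fields mentioned here, the device ids and labels are invented, and the missing ‘label’ before authorisation is modelled as an empty string. describableOutputs is a hypothetical helper, not part of any spec.

```typescript
// Hypothetical structural copy of the device fields discussed above.
interface DeviceInfoLike {
  deviceId: string; // opaque identifier
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;    // assumed to be "" until the user has authorised access
}

// Hypothetical helper: output devices the app knows enough about to
// justify asking the user for authorisation.
function describableOutputs(devices: DeviceInfoLike[]): DeviceInfoLike[] {
  return devices.filter(d => d.kind === "audiooutput" && d.label !== "");
}

// What the app sees before authorisation: only opaque ids.
const beforeAuthorisation: DeviceInfoLike[] = [
  { deviceId: "a1f3", kind: "audioinput", label: "" },
  { deviceId: "9c2e", kind: "audiooutput", label: "" },
  { deviceId: "77b0", kind: "audiooutput", label: "" },
];

// The same devices once labels are exposed (invented labels).
const afterAuthorisation: DeviceInfoLike[] = [
  { deviceId: "a1f3", kind: "audioinput", label: "Built-in microphone" },
  { deviceId: "9c2e", kind: "audiooutput", label: "Wall speakers" },
  { deviceId: "77b0", kind: "audiooutput", label: "USB headset" },
];

const unknownOutputs = describableOutputs(beforeAuthorisation);
const knownOutputs = describableOutputs(afterAuthorisation);
```

Before authorisation the helper returns nothing, so the app cannot tell which of the two outputs is worth prompting for; that is the circular dependency described above.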

I am copying the Audio API group on this discussion as well.

Thanks and best regards - Philippe Cohen 

Philippe Cohen | CTO 
CELL +972.52.3454100 | SKYPE: philjosephcohen 
Changing Audiology
Received on Tuesday, 4 November 2014 18:53:21 UTC