[mediacapture-main] Clarify "audiooutput" does not mean capture of audio output to headphones or speakers (#720) from guest271314 via GitHub on 2020-09-07 (public-webrtc@w3.org from September 2020)

From: guest271314 via GitHub <sysbot+gh@w3.org>
Date: Mon, 07 Sep 2020 18:13:02 +0000
To: public-webrtc@w3.org
Message-ID: <issues.opened-695322331-1599502380-sysbot+gh@w3.org>
guest271314 has just created a new issue for https://github.com/w3c/mediacapture-main:

== Clarify "audiooutput" does not mean capture of audio output to headphones or speakers ==
Per Issue 1114422: enumerateDevices() listing device kind "audioouput" is incorrect and misleading https://bugs.chromium.org/p/chromium/issues/detail?id=1114422

> "audiooutput" refers to audio playback via a media element. It does not refer to microphone input or audio capture of anything. 

This specification (https://w3c.github.io/mediacapture-main/#idl-def-MediaDeviceKind.audiooutput) states:

> **audiooutput** | Represents an audio output device; for example a pair of headphones.

Audio Output Devices API https://w3c.github.io/mediacapture-output/ does not actually use or define the term `"audiooutput"`. 

There is a section of the Audio Output Devices API that addresses how `getUserMedia()` relates to that specification:

> [4.2 Obtaining Consent](https://w3c.github.io/mediacapture-output/#privacy-obtaining-consent)
> The user agent may explicitly obtain user consent to play audio out of non-default output devices using [`selectAudioOutput`](https://w3c.github.io/mediacapture-output/#dom-mediadevices-selectaudiooutput).
> 
> Implementations MUST also support implicit consent via the [`getUserMedia()`](https://www.w3.org/TR/mediacapture-streams/#dom-mediadevices-getusermedia) permission prompt; when an audio input device is permitted and opened via [`getUserMedia()`](https://www.w3.org/TR/mediacapture-streams/#dom-mediadevices-getusermedia), this also permits access to any associated audio output devices (i.e., those with the same groupId). This conveniently handles the common case of wanting to route both input and output audio through a headset or speakerphone device.
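
The implicit-consent rule quoted above keys on `groupId`. As a minimal sketch (the helper name `associatedOutputs` is hypothetical and not part of either specification), the output devices that share a `groupId` with an opened audio input could be picked out of the `enumerateDevices()` list like this. The resulting devices are usable only as playback sinks; nothing here captures what they play.

```javascript
// Hypothetical helper: given MediaDeviceInfo-like objects returned by
// navigator.mediaDevices.enumerateDevices() and the deviceId of an opened
// audio input, return the "audiooutput" entries that share its groupId.
function associatedOutputs(devices, inputDeviceId) {
  const input = devices.find(
    (d) => d.kind === "audioinput" && d.deviceId === inputDeviceId
  );
  // An empty groupId means the browser exposed no grouping information.
  if (!input || !input.groupId) return [];
  return devices.filter(
    (d) => d.kind === "audiooutput" && d.groupId === input.groupId
  );
}

// Browser usage (assumes `track` came from getUserMedia()):
// const { deviceId } = track.getSettings();
// const outputs = associatedOutputs(await navigator.mediaDevices.enumerateDevices(), deviceId);
```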

The remainder of the comment from the above-linked Chromium bug:

> selectAudioOutput() is to be used in combination with setSinkId() and it has nothing to do with audio capture or microphone inputs; only playback (on media elements). It's a way to get access to a deviceId of kind "audiooutput" without using enumerateDevices. This deviceId is useful only with setSinkId since it's of kind "audiooutput".
> 
> Wrt the getUserMedia() UI in Chromium, its purpose is to request permission and not to select device. The device is selected based on the constraints passed to getUserMedia().  This might change in the future, following some spec changes, but for now it is totally spec compliant. Even if the prompt allowed for device selection, it wouldn't support capturing from monitor devices since those devices are not supported by Chromium. It would only list the devices reported as audioinput by enumerateDevices.
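
The flow described in that comment (obtain an `"audiooutput"` deviceId, then hand it to `setSinkId()` on a media element) can be sketched as follows. This is a minimal sketch, assuming a browser that implements `selectAudioOutput()`; the function name `routeToChosenOutput` is hypothetical, and `mediaDevices` and `element` are passed in so the logic can be exercised with stand-ins. The device only receives playback; no audio is captured.

```javascript
// Hypothetical helper: let the user pick an output device, then route an
// HTMLMediaElement's playback to it. `mediaDevices` is expected to be
// navigator.mediaDevices; `element` an <audio> or <video> element.
async function routeToChosenOutput(mediaDevices, element) {
  if (typeof mediaDevices.selectAudioOutput !== "function") {
    throw new Error("selectAudioOutput() is not supported in this browser");
  }
  // Resolves with a MediaDeviceInfo whose kind is "audiooutput".
  const device = await mediaDevices.selectAudioOutput();
  // This deviceId is meaningful as a playback sink, not as a
  // getUserMedia() capture constraint.
  await element.setSinkId(device.deviceId);
  return device.deviceId;
}

// Browser usage (selectAudioOutput() needs a user gesture):
// button.onclick = () => routeToChosenOutput(navigator.mediaDevices, audioElement);
```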

**Problems**

The first problem is that Chromium's implementation of `MediaStreamTrack` sets the label of a microphone device to `"audiooutput"`. Firefox lists monitor devices, which Chromium refuses to capture or list at `enumerateDevices()`, as `"audioinput"`.

The second problem is that the device with `kind` `"audiooutput"` is not actually headphones or speakers at all, and this specification does not explicitly define capture of audio output to headphones or speakers.

The third problem, a consequence of the first two, is that users in the field attempting to capture actual audio output (a reasonable interpretation of headphones or speakers) can reasonably expect a `kind` denoted as `"audiooutput"` to carry the plain meaning of that term, as alluded to in this specification. See [This is again recording from microphone, not from audiooutput device #14](https://github.com/guest271314/SpeechSynthesisRecorder/issues/14):

> Since this was not working on latest chrome 71, I downgraded to chrome 60. I see that this program is recording from microphone instead from speechSynthesis.speak(). I feel the reason is because both audioinput and audiooutput have same deviceId="default". So how can I make it record from speak() ?

This illustrates both the potential for and the reality of confusion when devices from `enumerateDevices()` are filtered for `kind` `"audiooutput"` with the expectation of capturing audio output to headphones or speakers (https://github.com/guest271314/SpeechSynthesisRecorder/blob/master/SpeechSynthesisRecorder.js#L63):

```javascript
return navigator.mediaDevices.getUserMedia({
    audio: true
  })
  // Set `getUserMedia()` constraints to "audiooutput", where available.
  // See https://bugzilla.mozilla.org/show_bug.cgi?id=934425, https://stackoverflow.com/q/33761770
  .then(stream => navigator.mediaDevices.enumerateDevices()
    .then(devices => {
      const audiooutput = devices.find(device => device.kind === "audiooutput");
      // Stop the permission-priming stream before requesting a new one.
      stream.getTracks().forEach(track => track.stop());
      if (audiooutput) {
        // In Chromium the first "audiooutput" entry typically has deviceId
        // "default", the same id as the default "audioinput", so this
        // exact constraint ends up opening the default microphone.
        const constraints = {
          deviceId: {
            exact: audiooutput.deviceId
          }
        };
        return navigator.mediaDevices.getUserMedia({
          audio: constraints
        });
      }
      return navigator.mediaDevices.getUserMedia({
        audio: true
      });
    }));
```

Unless the Chromium implementation changes, the selected device's `kind` will be `"audiooutput"`, yet output to headphones or speakers will never be captured; only the microphone will ever be captured.

Why, then, have the `kind` `"audiooutput"` at all, when both `"audioinput"` and `"audiooutput"` refer to the exact same device?
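
The observation quoted earlier, that both kinds report `deviceId="default"`, can be checked directly. A minimal sketch (the helper name `idsSharedAcrossKinds` is hypothetical) that lists every deviceId reported under both `"audioinput"` and `"audiooutput"`:

```javascript
// Hypothetical helper: return every deviceId that enumerateDevices()
// reports under both the "audioinput" and the "audiooutput" kind.
function idsSharedAcrossKinds(devices) {
  const inputIds = new Set(
    devices.filter((d) => d.kind === "audioinput").map((d) => d.deviceId)
  );
  return devices
    .filter((d) => d.kind === "audiooutput" && inputIds.has(d.deviceId))
    .map((d) => d.deviceId);
}

// Browser usage (real ids are generally exposed only after
// getUserMedia() permission has been granted):
// navigator.mediaDevices.enumerateDevices().then(idsSharedAcrossKinds).then(console.log);
```

A non-empty result would mean that a `deviceId: { exact: ... }` constraint built from such an `"audiooutput"` entry, as in the SpeechSynthesisRecorder excerpt, matches an input device of the same id when passed to `getUserMedia()`, which would explain the microphone being recorded.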

If the current language _is_ clear to an author of this specification, kindly explain to the users quoted above and below exactly why `"audiooutput"` does not mean _capture_ of audio output to speakers or headphones at all, and really means the same as `"audioinput"`: a microphone, an input device. That would avoid further confusion as to why code that selects an `"audiooutput"` device is working as intended (`"audiooutput"` and `"audioinput"` being intended to refer to the exact same device, and never to the headphones described in the specification), and would tell users to abandon all hope of capturing actual headphone or speaker output per this specification.

I am relatively certain the confusion is not imagined and that it can be eliminated.

Comments on the initial proof-of-concept of capturing `speechSynthesis.speak()` output (https://stackoverflow.com/a/45003549):

> Hi @guest271314, isn't this recording the user's mic - and not the actual synthesized speech? Is that what you intended? – Ronen Rabinovici Dec 1 '17 at 11:26 
> 
> Thanks for this great example. I'm not sure if it is currently working in the latest Chrome (non beta). I have forked here to try it. I can see the audio player, but with no audio file in: jsfiddle.net/k1q07rsy – loretoparisi Dec 7 '17 at 9:36
> 
> @RonenRabinovici Yes, the original code at answer did record the device microphone. The original code is a workaround for the requirement to record speech synthesis by default at modern browsers. Updated code to set "audioouput" as device to record github.com/guest271314/SpeechSynthesisRecorder/commit/… – guest271314 Jan 10 '18 at 3:18 
> 
> @loretoparisi See updated code which sets media device to record to "audiooutput" plnkr.co/edit/PmpCSJ9GtVCXDhnOqn3D?p=preview – guest271314 Jan 10 '18 at 3:22
> 
> @guest271314, I used the code at plnkr.co/edit/PmpCSJ9GtVCXDhnOqn3D?p=preview but it still recorded from my microphone. – Jeff Baker Aug 15 '18 at 22:54
> 
> This doesn't record speaker output. I tried capturing tab audio using chrome extension but still failed. It seems speechSynthesis is not using HTMLmediaElement for audio hence we shall not be able to capture at tab/browser level. The audiooutput mentioned above returns "default " for both mic and speaker since there is no way to set "kind" field while setting constraints in getUsermedia, it always captures "mic". Let me know in case more details required. – Gaurav Srivastava Mar 4 '19 at 1:13 
> 
> Confirming that it records from microphone rather than speech synthesis - at least in Chrome 84. – joe Aug 13 at 11:15

The requested explanation should cover precisely how `"audioinput"` and `"audiooutput"` are intended by this specification and its derivatives to refer only to the same device, a microphone.

Such an explanation would defy logic, given that in Chromium there is absolutely no difference between the device with `kind` set to `"audioinput"` and the device with `kind` set to `"audiooutput"`. That only creates and maintains confusion, which is completely avoidable by clearly stating that this specification does not cover capture of actual audio output to headphones or speakers whatsoever; then users can know that fact for certain and not expect such behaviour from either this or derivative specifications.

**Solutions**

Implementers must not set `"audiooutput"` as the `kind` of a `MediaStreamTrack` where the captured stream is actually microphone input; `"audioinput"` must be used as the `kind` for microphone input devices.

Do not set `"audiooutput"` as the `kind` of devices returned by `enumerateDevices()`.

Make it clear that this specification does not specify capture of audio being output to headphones or speakers. This necessarily means that the `"audiooutput"` `kind` cannot be true and correct, as the specification does not currently define that procedure at all (the original intent of this specification was evidently limited to microphone input capture, not capture of headphone or speaker output).
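
Until the specification text changes, application code can guard against the misreading described above by refusing to build capture constraints from anything but an `"audioinput"` entry. A minimal sketch, with a hypothetical helper name `captureConstraintsFor`:

```javascript
// Hypothetical helper: build getUserMedia() constraints from a
// MediaDeviceInfo-like object, rejecting output devices outright.
function captureConstraintsFor(device) {
  if (device.kind === "audiooutput") {
    // Output devices are playback sinks (HTMLMediaElement.setSinkId());
    // per this issue, the specification defines no capture of what they play.
    throw new TypeError(
      `"${device.label || device.deviceId}" is an audio output; use setSinkId(), not getUserMedia()`
    );
  }
  if (device.kind !== "audioinput") {
    throw new TypeError(`unsupported device kind: ${device.kind}`);
  }
  return { audio: { deviceId: { exact: device.deviceId } } };
}

// Browser usage:
// const mic = (await navigator.mediaDevices.enumerateDevices()).find((d) => d.kind === "audioinput");
// const stream = await navigator.mediaDevices.getUserMedia(captureConstraintsFor(mic));
```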



Please view or discuss this issue at https://github.com/w3c/mediacapture-main/issues/720 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Monday, 7 September 2020 18:13:05 UTC