[mediacapture-main] Clarify getUserMedia({audio:{deviceId:{exact:<audiooutput_device>}}}) in this specification mandates capability to capture of audio output device - not exclusively microphone input device (#650) from guest271314 via GitHub on 2019-12-08 (public-webrtc-logs@w3.org from December 2019)

From: guest271314 via GitHub <sysbot+gh@w3.org>
Date: Sun, 08 Dec 2019 17:38:59 +0000
To: public-webrtc-logs@w3.org
Message-ID: <issues.opened-534574931-1575826738-sysbot+gh@w3.org>
guest271314 has just created a new issue for https://github.com/w3c/mediacapture-main:

== Clarify getUserMedia({audio:{deviceId:{exact:<audiooutput_device>}}}) in this specification mandates capability to capture of audio output device - not exclusively microphone input device ==
https://github.com/w3c/mediacapture-main/pull/211 added output device capability to `enumerateDevices()`, while concerns were raised about the definition of an output device, or the omission thereof, in the specification, e.g.,

https://github.com/w3c/mediacapture-main/pull/211#issuecomment-130468068
> It seems to me we've forgotten to define output devices. Relying on their similarity to input devices, is the weak link in this reasoning imho.

Currently the term `"audiooutput"` occurs twice in the specification, where the language appears to be a brief description of the term, not explicitly a _definition_ of the term:


> `MediaDeviceKind` Enumeration description
> `audiooutput` | Represents an audio output device; for example a pair of headphones.

A pair of headphones could not reasonably be construed as a microphone. 
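To make the role of the `kind` field concrete, here is a minimal sketch that groups the list returned by `enumerateDevices()` by `kind`. The helper name `groupByKind` is hypothetical and not part of any specification; the labels that appear depend on the browser and on whether permission has been granted.

```javascript
// Hypothetical helper (not from the specification): group
// MediaDeviceInfo-like objects by their `kind` field.
function groupByKind(devices) {
  const groups = { audioinput: [], audiooutput: [], videoinput: [] };
  for (const device of devices) {
    if (!groups[device.kind]) groups[device.kind] = [];
    groups[device.kind].push(device);
  }
  return groups;
}

// Browser only: list the real devices, if the API is available.
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  navigator.mediaDevices.enumerateDevices()
    .then(devices => console.log(groupByKind(devices)))
    .catch(console.error);
}
```

Devices listed under `audiooutput` are the ones whose capture is in question here.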

However, in spite of `"audiooutput"` and the brief description appearing in the text of the specification, at least one implementer has interpreted the language to mean that capture of audio output is not mandated by the current specification: https://bugs.chromium.org/p/chromium/issues/detail?id=1013881#c9

> The getUserMedia() spec does not mandate capturing audio output or showing a UI prompt as part of the device selection procedure. 

For at least one concrete use case where the lack of clarity is observable (in the definition of `"audiooutput"`, in the device list from `enumerateDevices()`, and in whether the specification mandates capture of audio output devices), consider the code

```
    (async() => {
      navigator.mediaDevices.ondevicechange = e => console.log(e);
      const stream = await navigator.mediaDevices.getUserMedia({
        audio: {
          deviceId: {
            exact: await navigator.mediaDevices.enumerateDevices()
                   .then(devices =>
                     devices.find(({
                       kind, label, groupId
                     }) => label === "Monitor of Built-in Audio Analog Stereo" // Firefox
                             || (kind === "audiooutput" && groupId !== "default") // Chromium
                     ))
                     .deviceId
            }
          }
        });
      const [audioTrack] = stream.getAudioTracks();
      audioTrack.onmute = audioTrack.onended = e => console.log(e);
      const text = [...Array(10).keys()].join(" ");
      const handleVoicesChanged = async e => {
        const voice = speechSynthesis.getVoices().find(({
          name
        }) => name.includes("English"));
        const utterance = new SpeechSynthesisUtterance(text);
        utterance.voice = voice;
        utterance.pitch = 0.33;
        utterance.rate = 0.1;
        const recorder = new MediaRecorder(stream);
        recorder.start();
        speechSynthesis.speak(utterance);
        recorder.ondataavailable = async({
          data
        }) => {
          (new Audio(URL.createObjectURL(data))).play();
        }
        utterance.onend = e => 
          (recorder.state === "recording" && recorder.stop()
          , audioTrack.stop());
      }
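      // Voices may load asynchronously: run immediately if they are already
      // available, otherwise wait for the voiceschanged event.
      if (speechSynthesis.getVoices().length) {
        handleVoicesChanged();
      } else {
        speechSynthesis.onvoiceschanged = handleVoicesChanged;
      }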

    })().catch(console.error);
```

which is intended to select only `"audiooutput"`, not `"Microphone"`. 

Firefox 70 and Nightly 73 output the expected result: _only_ audio output is captured and recorded, _not_ microphone input, and _not_ microphone input mixed with audio output.

Chromium 80 does not output the expected result. Even where `"audiooutput"` is selected, the microphone is captured and recorded, not `"audiooutput"`. That is a Chromium bug that is marked `WontFix` (https://bugs.chromium.org/p/chromium/issues/detail?id=1013881), apparently due to lack of clarity in this specification regarding the capability to select audio output, not only the microphone.
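The difference between the two browsers can also be checked without listening to the recording. The following is a minimal sketch, assuming `MediaStreamTrack.getSettings()` reports the `deviceId` of the source that was actually opened; the helper names `matchesRequestedDevice` and `checkCapture` are hypothetical.

```javascript
// Hypothetical helper: does the track's reported source match the request?
function matchesRequestedDevice(requestedDeviceId, settings) {
  return Boolean(settings) && settings.deviceId === requestedDeviceId;
}

// Browser only: request an exact device and compare it with what was opened.
// Assumption: getSettings().deviceId reflects the device actually captured.
async function checkCapture(requestedDeviceId) {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { deviceId: { exact: requestedDeviceId } }
  });
  const [track] = stream.getAudioTracks();
  const ok = matchesRequestedDevice(requestedDeviceId, track.getSettings());
  console.log(track.label, ok ? "matches the request" : "differs from the request");
  track.stop();
  return ok;
}
```

Calling `checkCapture(deviceId)` with the `deviceId` of an `"audiooutput"` entry would show whether the browser reports that device or substitutes another one.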

Contrary to the suggestion at https://github.com/w3c/mediacapture-main/issues/629#issuecomment-545844012, testing various approaches shows that `getDisplayMedia()` does not provide any means to capture audio output from the system.
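For anyone who wants to repeat that test, one of the approaches can be sketched as below. This is an illustration only: the helper names `hasAudioTrack` and `probeDisplayAudio` are hypothetical, the call may need to be made from a user gesture, and whether an audio track is delivered at all depends on the browser and platform.

```javascript
// Hypothetical helper: report whether a MediaStream-like object carries audio.
function hasAudioTrack(stream) {
  return stream.getAudioTracks().length > 0;
}

// Browser only: ask for display capture with audio and report what arrives.
async function probeDisplayAudio() {
  const stream = await navigator.mediaDevices.getDisplayMedia({
    video: true,
    audio: true
  });
  const result = hasAudioTrack(stream);
  console.log(result ? "audio track delivered" : "no audio track delivered");
  // Release the capture once the probe is done.
  stream.getTracks().forEach(track => track.stop());
  return result;
}
```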

Kindly make it clear in this specification that 1) capture of audio output is under the umbrella of this specification, and provide an example of the canonical code pattern to achieve that use case per this specification; 2) the user can select `"Monitor of <audio_device>"` at the UI prompt, directly in code by use of `applyConstraints()`, and directly at `getUserMedia(<constraints>)`; or 3) this specification is _not_ intended to be construed as covering capture of audio output alone.



Please view or discuss this issue at https://github.com/w3c/mediacapture-main/issues/650 using your GitHub account
Received on Sunday, 8 December 2019 17:39:01 UTC