
Re: [mediacapture-main] Clarify getUserMedia({audio:{deviceId:{exact:<audiooutput_device>}}}) in this specification mandates capability to capture of audio output device - not exclusively microphone input device (#650)

From: guest271314 via GitHub <sysbot+gh@w3.org>
Date: Wed, 01 Jan 2020 20:05:50 +0000
To: public-webrtc-logs@w3.org
Message-ID: <issue_comment.created-570079777-1577909148-sysbot+gh@w3.org>
@jan-ivar 

Happy New Year!

Created several workarounds, or what you might refer to as "a hack".

It turns out that Ubuntu ships with `localhost` enabled by default for testing the Apache server, so all that is necessary to use that test server is to save a script in the `/var/www/html/` directory, e.g., `index.php`:

```
<?php
  if (isset($_POST["text_or_ssml"])) {
    header("Content-Type: audio/ogg");
    $options = $_POST["options"] ?? "";
    // Shell-escape the user-supplied text to avoid command injection.
    $text = escapeshellarg($_POST["text_or_ssml"]);
    // The backslash before ${LD_LIBRARY_PATH} stops PHP from interpolating it,
    // so the shell (not PHP) expands the environment variable.
    echo shell_exec("ESPEAK_DATA_PATH=/home/user/espeak-ng LD_LIBRARY_PATH=src:\${LD_LIBRARY_PATH} /home/user/espeak-ng/src/espeak-ng -m --stdout " . $options . " " . $text . " | ffmpeg -i - -f opus -");
  }
```

where, with the appropriate browser flags set or an `Access-Control-Allow-Origin` CORS header included in the response, `localhost` can be requested from any origin.
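A minimal sketch of the browser side of that request, assuming the endpoint URL `http://localhost/index.php` and reusing the `text_or_ssml` and `options` field names the PHP script reads from `$_POST` (the playback approach shown in the comment is one option, not the only one):

```javascript
// Build the fetch() init for a POST to the local speech-synthesis endpoint.
// URLSearchParams serializes as application/x-www-form-urlencoded, which
// PHP exposes via $_POST.
function buildSpeechRequest(text, options = "") {
  const body = new URLSearchParams();
  body.set("text_or_ssml", text); // plain text or SSML
  body.set("options", options);   // extra espeak-ng flags, e.g. "-v en-us"
  return { method: "POST", body };
}

// Assumed usage in the browser:
// const res = await fetch("http://localhost/index.php",
//                         buildSpeechRequest("<speak>hello</speak>"));
// new Audio(URL.createObjectURL(await res.blob())).play();
```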

For a more elaborate solution, which wound up being a proof-of-concept for https://github.com/whatwg/html/issues/3443 and https://github.com/WICG/native-file-system/issues/97, I created a pattern that provides a means to execute arbitrary local shell scripts and to set, from the browser, the `wav` file used for `--use-file-for-fake-audio-capture`.

While testing the code it became obvious that there is no way to determine precisely when the audio output of speech synthesis actually ends when the output mechanism is a `MediaStreamTrack` - at least not when using the approach of setting the local `wav` file to be played, as Chromium does not fire `ended`, `mute`, or `unmute` events for the `MediaStreamTrack`. Additionally, since we are potentially expecting SSML input text, the input can include

`<break time="5000ms"/>`

where, if we test for silence (https://stackoverflow.com/a/46781986) in order to determine when the expected audio output is complete, we could prematurely call `stop()` during an intended `<break time="5000ms"/>`; and since a `MediaStream` has no intrinsic duration, there is no default end to the `MediaStreamTrack` in this case. However, "Support SpeechSynthesis *to* a MediaStreamTrack" (https://github.com/WICG/speech-api/issues/69) was the requirement, so I leave it to the OP of that requirement to find that out for themselves.
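The pitfall above can be illustrated with a toy silence detector over per-frame RMS values: unless the required silent run is longer than the longest expected SSML `<break>`, the detector fires mid-break. The frame values, RMS threshold, and frame counts below are illustrative assumptions, not from the linked answer:

```javascript
// Return the frame index at which "end of speech" is declared, i.e. the
// first index where at least minSilentFrames consecutive frames fall below
// silenceRms; return -1 if that never happens (stream still "speaking").
function endOfSpeechIndex(rmsFrames, { silenceRms = 0.01, minSilentFrames }) {
  let run = 0;
  for (let i = 0; i < rmsFrames.length; i++) {
    run = rmsFrames[i] < silenceRms ? run + 1 : 0;
    if (run >= minSilentFrames) return i;
  }
  return -1;
}

// With speech at frames 0 and 4 and silence elsewhere, a 3-frame threshold
// fires during the "break" (index 3); a 4-frame threshold waits until the
// trailing silence (index 8) - so the threshold must exceed any <break>.
const frames = [0.5, 0, 0, 0, 0.5, 0, 0, 0, 0, 0];
endOfSpeechIndex(frames, { minSilentFrames: 3 }); // fires too early, during the pause
endOfSpeechIndex(frames, { minSilentFrames: 4 }); // fires after the real end
```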

It also turns out that Chrome OS is already using `espeak-ng` and `AudioWorklet` to output the result (https://chromium.googlesource.com/chromiumos/third_party/espeak-ng/+/refs/heads/chrome). Still, the `-m` flag does not appear to be set, so SSML parsing (which, from the perspective here, alleviates the need to define speech synthesis events, etc.) is not possible using that code.

In any event, your closure of this issue/feature request ironically led to revisiting a prior interest in executing arbitrary shell commands using the browser as a medium: https://gist.github.com/guest271314/59406ad47a622d19b26f8a8c1e1bdfd5.


-- 
GitHub Notification of comment by guest271314
Please view or discuss this issue at https://github.com/w3c/mediacapture-main/issues/650#issuecomment-570079777 using your GitHub account
Received on Wednesday, 1 January 2020 20:05:51 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 19:22:36 UTC