
Re: [mediacapture-main] Clarify getUserMedia({audio:{deviceId:{exact:<audiooutput_device>}}}) in this specification mandates capability to capture of audio output device - not exclusively microphone input device (#650)

From: guest271314 via GitHub <sysbot+gh@w3.org>
Date: Mon, 23 Dec 2019 07:34:06 +0000
To: public-webrtc-logs@w3.org
Message-ID: <issue_comment.created-568386696-1577086445-sysbot+gh@w3.org>
@jan-ivar 

> If the desire is to get at the output of speechSynthesis, please take that up with the working group responsible for speechSynthesis directly. 

The use case is not limited to capturing `speechSynthesis.speak()` output. The source (input device) can be any audio the system outputs, exposed as a monitor device.

Currently the Web Speech API does not specify any algorithms. The client browser opens a socket connection to `speech-dispatcher`, which executes `festival`, `flite`, `espeak`, `espeak-ng`, or another speech synthesis module; no speech synthesis occurs in the browser itself, and the speech synthesis module (an executable) must be installed on the system for synthesis to occur. Aside from bringing the technology to the fore within the Web platform, that is essentially the sum of the Web Speech API in its present state: there is no option to pass a file, capture the media output, or write to a file. Perhaps progress will be made there.

> Solving that here would be a hack in my view.

The change being asked for is merely to make clear that a device listed as a monitor of the default audio output device MAY be exposed by implementations - to at least recognize that the option is available, even if implementers decide not to expose the monitor device.
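Where an implementation does expose the monitor device, selecting it is ordinary `enumerateDevices()`/`getUserMedia()` code. A minimal sketch, assuming PulseAudio-style labels of the form "Monitor of ..." (the labeling convention is an assumption; `selectMonitorDevice` is a hypothetical helper, not part of any specification):

```javascript
// Hypothetical helper: find a monitor source among enumerated devices.
// Monitor sources, where exposed, appear with kind "audioinput" and a
// label containing "Monitor of" (assumed PulseAudio convention).
function selectMonitorDevice(devices) {
  return (
    devices.find(
      (d) => d.kind === "audioinput" && /monitor of/i.test(d.label)
    ) || null
  );
}

// Browser usage (sketch):
// const devices = await navigator.mediaDevices.enumerateDevices();
// const monitor = selectMonitorDevice(devices);
// if (monitor) {
//   const stream = await navigator.mediaDevices.getUserMedia({
//     audio: { deviceId: { exact: monitor.deviceId } },
//   });
// }
```

If no monitor device is exposed, `selectMonitorDevice()` returns `null` and the `{deviceId: {exact: ...}}` constraint path is simply unavailable - which is the gap this issue asks the specification to acknowledge.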

Since there is no hope for language specifying that we can capture a monitor of an audio output device directly via constraints passed to `getUserMedia()`, what we are left with in the field is to try to create one or more hacks. Eventually a way will be found to pipe the output of

`ffmpeg -f pulse -i alsa_output.pci-0000_00_1b.0.analog-stereo.monitor`

and

`espeak-ng --stdout -d 0 'speak' | ffmpeg -i - -f opus -`

to a `MediaStreamTrack` using JavaScript, instead of exclusively piping the output to a file first

` | tee $HOME/test.ogg | chromium-browser --user-data-dir=$HOME/test $HOME/test.ogg` 

then fetching the file. 

Will dive into https://github.com/pettarin/espeakng.js-cdn/blob/master/js/demo.js to substitute `AudioWorklet` for `createScriptProcessor`, where it should be possible to "hack" together a `MediaStreamTrack` for the output (after loading 2 MB of data, https://github.com/pettarin/espeakng.js-cdn/blob/master/js/espeakng.worker.data - though that is less than the 279.75 MiB of https://webrtc.googlesource.com/src needed just to create a `MediaStreamTrack` of a monitor of an audio device exposed to the browser). Not ideal, though that is the state of the art.
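A minimal sketch of that hack, assuming the espeak-ng worker posts signed 16-bit PCM chunks (as the linked demo.js does); the module name "espeakng-processor" is hypothetical, and the `MediaStreamTrack` comes from the standard `MediaStreamAudioDestinationNode`, not from any espeakng.js API:

```javascript
// Conversion a processor would apply to each PCM chunk before writing
// it to an output channel: scale signed 16-bit samples to [-1, 1).
function int16ToFloat32(int16) {
  const out = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    out[i] = int16[i] / 32768;
  }
  return out;
}

// Browser wiring (sketch, not runnable outside a page):
// const ctx = new AudioContext();
// await ctx.audioWorklet.addModule("espeakng-processor.js"); // hypothetical
// const node = new AudioWorkletNode(ctx, "espeakng-processor");
// const dest = ctx.createMediaStreamDestination();
// node.connect(dest);
// const [track] = dest.stream.getAudioTracks(); // the desired MediaStreamTrack
```

The point of routing through `MediaStreamAudioDestinationNode` is that it yields a real `MediaStreamTrack` without any file round-trip, which is exactly what the monitor-device constraint would have provided directly.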

-- 
GitHub Notification of comment by guest271314
Please view or discuss this issue at https://github.com/w3c/mediacapture-main/issues/650#issuecomment-568386696 using your GitHub account
Received on Monday, 23 December 2019 07:34:08 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 19:22:35 UTC