Minutes of the WEBRTC WG Virtual Interim (April 28, 2020) from Bernard Aboba on 2020-04-30 (public-webrtc@w3.org from April 2020)

From: Bernard Aboba <Bernard.Aboba@microsoft.com>
Date: Thu, 30 Apr 2020 14:54:41 +0000
To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <BN3PR00MB0180C5907C1E5B4F4BF7BEAEECAA0@BN3PR00MB0180.namprd00.prod.outlook.com>

Minutes of the WebRTC WG April 28, 2020

Insertable Streams (Harald)

This potentially addresses the requirements arising from several WebRTC-NV use cases, including "Funny Hats", "Machine Learning" and "Virtual Reality Games". The proposed Insertable Streams API provides an insertion point in the processing chain after encode/before decode. This addresses WebRTC-NV requirements N19 (insert processed frames) and N23 (data sync'd with media). Transferable Streams addresses requirement N21 (sharing media with workers).

Raw media is the obvious next step. That would address requirements N18 (raw media from capture device) and N20 (decoded media from a receiver). In the current proposal requirement N22 (GPU support) is not covered.

An experiment has been implemented in Chrome Canary M83. This has been tested with real applications: Duo Group, Jitsi, Medooze.

The API is simple (slide). Works with Web Workers. Performance seems to be adequate. To be proposed for adoption in May.
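
As an illustration of what code at the insertion point (after encode, before decode) might do for requirement N23, here is a minimal sketch that attaches application metadata to an encoded frame on the sending side and removes it again on the receiving side. The `EncodedFrameLike` shape, the one-byte length trailer, and both function names are assumptions made for this example; they are not taken from the slides or from the Chrome experiment, where the frame object is defined by the browser.

```typescript
// Hypothetical stand-in for an encoded frame. Only the payload bytes matter
// for this sketch; the real frame object is defined by the browser.
interface EncodedFrameLike {
  data: Uint8Array;
}

// Sender side (after encode): append `meta` followed by a 1-byte length
// trailer, so the metadata travels with the frame it belongs to.
function appendMetadata(frame: EncodedFrameLike, meta: Uint8Array): EncodedFrameLike {
  if (meta.length > 255) {
    throw new RangeError("metadata must fit in 255 bytes");
  }
  const out = new Uint8Array(frame.data.length + meta.length + 1);
  out.set(frame.data, 0);
  out.set(meta, frame.data.length);
  out[out.length - 1] = meta.length;
  return { data: out };
}

// Receiver side (before decode): split the trailer off again so the decoder
// sees the original payload.
function stripMetadata(frame: EncodedFrameLike): { frame: EncodedFrameLike; meta: Uint8Array } {
  const bytes = frame.data;
  if (bytes.length === 0) {
    throw new RangeError("empty frame");
  }
  const metaLen = bytes[bytes.length - 1];
  const payloadEnd = bytes.length - 1 - metaLen;
  if (payloadEnd < 0) {
    throw new RangeError("corrupt metadata trailer");
  }
  return {
    frame: { data: bytes.slice(0, payloadEnd) },
    meta: bytes.slice(payloadEnd, bytes.length - 1),
  };
}
```

In a browser, functions like these would presumably run inside a stream transform placed between the encoder and the packetizer (and the mirror position on the receive side). Every byte appended here is extra payload that rate adaptation does not know about.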

Discussion

Jan-Ivar: Mozilla is still forming an opinion with regard to E2E encryption - possible concerns. Could be used to synchronize data with media (N23).

Requires peer connection for local processing like “funny hats”. Does this really belong to WebRTC and not MediaStreamTrack? Is there overlap with audio worklets? What about key exchange?

Harald: Different tools for different purposes. Opening up tracks seems like a larger design effort than sender/receiver. Confident in this because it is deployable.

Maksim: What are the performance implications?

Guido: We don’t have measurements but we know that insertable streams isn’t the bottleneck. For E2E demos, the bottleneck was the encryption operations, and reducing frame copies internally.

Harald: For raw video use cases, the data involved will be up to ten times larger, so performance concerns (such as memory copies) are potentially greater there.

Youenn: Concern about exposing keys to JavaScript application. Might be fine for now, but worried this makes E2E “good enough”. Prefer the browser to own encryption. Example: we were able to deprecate DTLS because we were in control.

Harald: IETF PERC WG spent 5 years trying to define a protocol for E2E encryption and it failed to be widely deployed. This version is being deployed. If the structure is right and we get consensus on the component that is able to do the key management, then it’s possible to use this architecture to invoke such component. Waiting until we have the architecture will leave the users exposed for another 5 years.

Bernard: In web codecs, we’ve talked about having a handle to a GPU buffer, rather than an Array Buffer in memory. Also, it appears that Web codecs is not going to use streams. Do you have a comment?

Harald: See slide 8. There are feedback loops which are not ideal for Streams use in WebCodecs. For Insertable Streams that is OK because we let those signals pass the insertable step. But WebCodecs can't do that.

Youenn: What will happen if JS adds significant overhead to the video frame so as to consume more bandwidth than rate adaptation wishes to allow?

Harald: It would blow up. We might need to expose the feedback to JS in the future; it is not clear how.

Youenn: The scenarios covered by Insertable Streams include E2E and Funny hats. (Bernard: Also VR Gaming and Machine Learning). Maybe it’s good to have an API that supports multiple scenarios, or maybe multiple APIs would be better.

Maksim: Couldn’t we just add a signaling API?

Harald: For simple cases we could just add a property that tells us how much metadata is being added. For more complex cases we might need to do more.

Next step is to provide a specification and call for adoption.

Discussions cut short due to running over time.

Content-Hints

PR 40: speechRecognition content hint (Sam Dallstream)

mst-content-hint is geared towards communication, e.g. adds echo cancellation and automatic gain control, which hurts speech recognition.

Proposal: Add a “speechRecognition” content hint.
Pros: minimal changes to spec which would allow prototyping and gathering feedback.
Cons: not supported by all browsers.

Jan-Ivar: Content hint is descriptive of content, but this seems more to describe what the use case is. What is missing that you can’t do by simply setting constraints such as Auto-Gain Control and Echo Cancellation to "false"? What are the lower layer things?

Sam: In modern machines there are optimizations all the way down to the hardware, these cannot be specified by the web app.

Henrik: What does it do?

Sam: There are ETSI specifications that describe the audio processing that is done.

Next steps: Editors will review PR 40 and merge it or ask for text updates.
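
For reference, the constraint-based alternative Jan-Ivar raised in the discussion above could be sketched roughly as follows. The helper `audioConstraintsFor` and the local `AudioProcessingConstraints` interface are hypothetical names introduced for this example; `echoCancellation` and `autoGainControl` are existing constraint names in Media Capture and Streams. As Sam noted, this does not reach the lower-layer optimizations, which is the gap the proposed hint is meant to cover.

```typescript
// Local mirror of the two audio constraints named in the discussion.
interface AudioProcessingConstraints {
  echoCancellation: boolean;
  autoGainControl: boolean;
}

// Hypothetical helper: communication keeps the processing on,
// speech recognition turns it off.
function audioConstraintsFor(
  purpose: "communication" | "speechRecognition"
): AudioProcessingConstraints {
  const processing = purpose === "communication";
  return { echoCancellation: processing, autoGainControl: processing };
}

// Browser usage, shown as comments because it needs a page and a microphone:
//   const stream = await navigator.mediaDevices.getUserMedia({
//     audio: audioConstraintsFor("speechRecognition"),
//   });
//   // "speechRecognition" is the value proposed in PR 40, not an adopted one.
//   stream.getAudioTracks()[0].contentHint = "speechRecognition";
```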

Privacy and Security

Media Capture and Streams

Issue 688: Clarify only fire devicechange event when devices physically added/removed (Jan-Ivar)

Inserting and removing devices and enumerateDevices() used to be the same list and event. This is no longer true! Can’t tell if the device is new or if permission was given to devices.

We don’t have deviceIds pre-GUM. Apps need to update cache on GUM. There could be a race. Developer can’t tell if it’s a new device.
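
To make the ambiguity concrete, here is a minimal sketch of the kind of cache diffing an application might do when devicechange fires. The `DeviceInfoLike` interface and the `diffDevices` helper are hypothetical and only mirror the fields an app would read from enumerateDevices(). The sketch assumes entries are exposed with an empty deviceId before GUM.

```typescript
// Hypothetical mirror of the fields an app reads from enumerateDevices().
interface DeviceInfoLike {
  deviceId: string;
  kind: string;
  label: string;
}

// Compare a cached device list against a fresh one, keyed by kind + deviceId.
function diffDevices(
  prev: DeviceInfoLike[],
  next: DeviceInfoLike[]
): { added: DeviceInfoLike[]; removed: DeviceInfoLike[] } {
  const key = (d: DeviceInfoLike): string => `${d.kind}:${d.deviceId}`;
  const prevKeys = new Set(prev.map(key));
  const nextKeys = new Set(next.map(key));
  return {
    added: next.filter((d) => !prevKeys.has(key(d))),
    removed: prev.filter((d) => !nextKeys.has(key(d))),
  };
}

// Before GUM: a single camera entry with no deviceId or label.
const beforeGum: DeviceInfoLike[] = [{ deviceId: "", kind: "videoinput", label: "" }];

// After GUM: the same hardware, now with real ids and labels.
const afterGum: DeviceInfoLike[] = [
  { deviceId: "a", kind: "videoinput", label: "Built-in camera" },
  { deviceId: "b", kind: "videoinput", label: "USB camera" },
];

// Both cameras show up as "added" even though nothing was plugged in.
const change = diffDevices(beforeGum, afterGum);
```

Under these assumptions the diff cannot distinguish a newly inserted device from one that merely became visible after permission was granted, which is the problem described above.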

Proposal A: Allow changing list without devicechange.

Proposal B: Add deviceinserted.

“Proposal C”: If you have one device pre-GUM, there should be no devicechange event.

Discussion

Guido: Why not suppress all events prior to GUM access? Don’t fire just because you got permission.
Harald: Don’t like Proposal B.
Youenn: Maybe Proposal A? Not sure.
Bernard: Next steps?
Harald: Proposal A seems like the minimal change. Proposal B would break existing code.

Next steps: Make a PR for proposal A.

Issue 668: What happens when the machine suspends? (Harald)

What happens on page suspend?

We couldn’t find a DOM event for suspend. Editors concluded nothing happens to peer connections on suspend. But what about devices?

Closing the lid and opening it again: it could still be capturing prior to logging in!

Proposal: Fire muted on suspend, unmute when unlocking - not before! Is it specifiable?

Jan-Ivar: When you lock the screen the page doesn’t actually suspend. Can we piggyback on whether or not playout should happen?

Discussion: Maybe this should be up to the user agent but we should definitely not be creepy.

Next steps: Let’s write guidelines.

Issue 669: “user-chooses”: Do required constraints make any sense now? (Henrik)

In-chrome picker competes with in-content picker. Where are we headed?

Does it still make sense to filter out the set of devices if we have a picker? E.g. asking for HD, removing the SD camera.

Proposal: Allow deviceId to be required, but other constraints must not filter the list of devices shown in the picker.

Flavor A: Support in-content picker. B: Partial support. C: Ban deviceIds.
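
One possible reading of the proposal above, sketched as a pure function: a required deviceId narrows the picker list, while any other required constraint (here a minimum height) is ignored when building that list. `CameraLike`, `PickerConstraints` and `pickerList` are hypothetical names for illustration only, and the `{ exact: ... }` shape is borrowed from existing constraint syntax.

```typescript
// Hypothetical description of a camera as the picker would see it.
interface CameraLike {
  deviceId: string;
  label: string;
  height: number;
}

// Hypothetical subset of constraints relevant to this sketch.
interface PickerConstraints {
  deviceId?: { exact: string };
  height?: { min: number };
}

// Build the list shown in the picker. Only a required deviceId filters it;
// other constraints are deliberately not consulted here.
function pickerList(devices: CameraLike[], constraints: PickerConstraints): CameraLike[] {
  const requiredId = constraints.deviceId?.exact;
  if (requiredId !== undefined) {
    return devices.filter((d) => d.deviceId === requiredId);
  }
  return devices;
}

const cameras: CameraLike[] = [
  { deviceId: "hd", label: "HD camera", height: 1080 },
  { deviceId: "sd", label: "SD camera", height: 480 },
];

// Asking for HD does not remove the SD camera from the picker.
const shownForHd = pickerList(cameras, { height: { min: 720 } });

// A required deviceId still pins the choice.
const shownForSd = pickerList(cameras, { deviceId: { exact: "sd" }, height: { min: 720 } });
```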

Discussion

Jan-Ivar: This is separate from “user-chooses”. Do we need to decide this now?

Youenn: I think this is relevant - it would be confusing to present a different list to the user depending on application constraints.

Henrik: We don’t need to solve all issues now, but when implementing “user-chooses” we need to know if the presented list should be filtered or not. I want to get a sense of where we are headed.

Discussions around allowing user agents to start experimenting and wanting to break as few things as possible with the old way of doing things.

More discussion needed.

Received on Thursday, 30 April 2020 14:55:10 UTC