Minutes of the WEBRTC WG Virtual Interim
March 25, 2021

Scribe: Youenn Fablet

1. Testing

Test Proposal from Last Meeting
Summarizing the current proposal: improve test coverage with an "echo server" based on aiortc.
The aiortc server has been rewritten using ORTC, so it can now handle lower-level testing of STUN or DTLS if needed.
Three tests have been written, serving as a proof of concept that the approach works and is helpful.
What do we need to do to get this into WPT?
Next steps:
AI: Youenn to look at what is needed and probably file an RFC request (https://github.com/web-platform-tests/rfcs)
AI: Fippo to submit a PR for the server in WPT.

2. Insertable Streams - A use case study

In this use case, the goal is to be able to replace NetEQ (e.g. jitter buffer, concealment, etc. for realtime audio) in WASM.
To do this, it is desirable for the application to be able to access the output of the RTP depacketizer (on the WebRTC receiver).
The raw decoded data would then be processed in WASM.
Browser playout is handled via an audio track hooked to an audio element.

Requirements:
Inject JS before the decoder (see the sketch after this list)
Inject JS after the decoder
API to disable some algorithms performed by the receiver (e.g. the jitter buffer)
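For illustration, a minimal sketch (TypeScript) of the "inject JS before the decoder" requirement. It assumes the Chrome-style encoded Insertable Streams API (createEncodedStreams() on RTCRtpReceiver), which may not be in standard typings; the WASM jitter buffering/concealment step is a placeholder, and the other requirements (post-decoder access, disabling receiver-side algorithms) would need additional API surface.

    // Sketch only: assumes the Chrome-style encoded Insertable Streams API.
    type EncodedStreams = { readable: ReadableStream; writable: WritableStream };
    declare const audioReceiver: RTCRtpReceiver & { createEncodedStreams(): EncodedStreams };

    const { readable, writable } = audioReceiver.createEncodedStreams();
    readable
      .pipeThrough(
        new TransformStream({
          transform(encodedFrame, controller) {
            // Application-level jitter buffering / concealment would happen here,
            // e.g. by handing encodedFrame.data to a WASM module before re-enqueueing.
            controller.enqueue(encodedFrame);
          },
        })
      )
      .pipeTo(writable);
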
Harald: JS before and after decoder needs coupling, ideally in the same worker.
Tim Panton: Worried about decoupling the decoder from other parts of the pipeline.
Some effects might require such decoupling.
Jan-Ivar: Use case is a bit narrow as they are not happy with NetEQ
They could encode/decode themselves, use data channel and do their own NetEQ
Usually they control both endpoints
Good to understand how we want the pipeline to evolve so that the WebRTC encoded API is designed correctly
Tim Panton: How about doing this in an audio worklet, if it can get the audio packets?
Harald: We want to do that with encoded data
Henrik: if you want that for audio, you might want that for video as well
Tim Panton: Need to have influence on the decoder.

3. API Relationships

Bernard: WebRTC WG handled the development of the WebRTC 1.0 API within a single W3C Working Group. What we now call "WebRTC Next Version" is
actually multiple APIs (e.g. Insertable Streams (WebRTC and MediaStreamTrack), WebCodecs, WebTransport, WebNN, WebGPU) that are being developed in different
W3C WGs (WEBRTC, Media, WebTransport, etc.). Since there are multiple WGs involved (and potentially very few people who are in all of them), there is the
possibility that the APIs will have "seams" and won't work together optimally.

Here are diagrams that show the API relationships for sending and receiving. When sending, we obtain a MediaStreamTrack through any of the existing
mechanisms. The MediaStreamTrackProcessor interface (MediaStreamTrack Insertable Streams) converts this to a stream of raw VideoFrames. An API such as WebNN or WASM SIMD then
operates on the raw VideoFrames to do "funny hats" or "machine learning", then the processed VideoFrames are passed to the WebCodecs encoder,
which produces EncodedVideoChunks. These may be processed in WASM (e.g. for packetization) and then are provided to a transport, which could be
WebTransport (for a client/server scenario), a WebRTC data channel (for P2P or client/server scenarios), or maybe even RTP (if this is exposed).
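For illustration, a minimal sketch (TypeScript) of the sending half of this pipeline, assuming the MediaStreamTrackProcessor, WebCodecs VideoEncoder, and WebTransport APIs as proposed at the time; the per-frame processing and packetization steps are placeholders, and the codec string and resolution are illustrative only.

    // Sketch: MediaStreamTrack -> raw VideoFrames -> WebCodecs encode -> WebTransport.
    // MediaStreamTrackProcessor/WebCodecs typings may not be in lib.dom.d.ts yet.
    async function sendPipeline(track: MediaStreamTrack, transport: WebTransport) {
      const stream = await transport.createUnidirectionalStream();
      const writer = stream.getWriter();

      const encoder = new VideoEncoder({
        output: (chunk) => {
          // Placeholder packetization: copy the chunk bytes and hand them to the transport.
          const payload = new Uint8Array(chunk.byteLength);
          chunk.copyTo(payload);
          writer.write(payload);
        },
        error: (e) => console.error('encode error', e),
      });
      encoder.configure({ codec: 'vp8', width: 640, height: 480 });

      const processor = new MediaStreamTrackProcessor({ track });
      const reader = processor.readable.getReader();
      for (;;) {
        const { value: frame, done } = await reader.read();
        if (done) break;
        // "Funny hats" or machine-learning processing of the raw frame would go here.
        encoder.encode(frame);
        frame.close();
      }
    }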

When receiving, the network transport provides EncodedVideoChunks to the WebCodecs decoder, which outputs VideoFrames. These could also be processed
for machine learning, and the processed VideoFrames are converted to a MediaStreamTrack by the MediaStreamTrackGenerator interface (MediaStreamTrack Insertable Streams), and
they are rendered.
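And the receiving half, again as a hedged sketch: it assumes the WebCodecs VideoDecoder and MediaStreamTrackGenerator interfaces; how chunks arrive from the transport and are depacketized is application-specific and omitted, and the codec string is illustrative.

    // Sketch: EncodedVideoChunks -> WebCodecs decode -> MediaStreamTrackGenerator -> <video>.
    // MediaStreamTrackGenerator typings are assumed; they may not be in lib.dom.d.ts.
    function receivePipeline(chunks: ReadableStream<EncodedVideoChunk>): MediaStreamTrack {
      const generator = new MediaStreamTrackGenerator({ kind: 'video' });
      const trackWriter = generator.writable.getWriter();

      const decoder = new VideoDecoder({
        output: (frame) => {
          // Machine-learning or other per-frame processing could happen here
          // before handing the frame to the track generator for rendering.
          trackWriter.write(frame);
        },
        error: (e) => console.error('decode error', e),
      });
      decoder.configure({ codec: 'vp8' });

      // Feed already-depacketized chunks from the transport into the decoder.
      chunks.pipeTo(new WritableStream({ write: (chunk) => decoder.decode(chunk) }));

      return generator; // A MediaStreamTrack that can be attached to a <video> element.
    }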

This raises some questions:
a. Which of the use cases represented on the diagrams are we going to support? Are there use cases we care about that aren't articulated in the
WebRTC-NV Use Cases document?
b. Is each of the APIs prepared to efficiently process the output of the previous step in the pipeline?
For example, if WebCodecs operates on GPU buffers and produces another GPU buffer, can WebTransport read or write into a GPU buffer so as to avoid copies?
c. Do additional copies result when frames move from one API to another? Do we understand when this will happen?  Is it documented?
d. Is there something we need (such as "read only" buffers) that could reduce copies?

Youenn: Performance is often very operating system specific. The same operation could have very different performance on a different platform,
or even a different device on the same platform, using the same browser. This makes it hard to document. For example, Safari could perform
differently on different iPads.
Henrik: IOSurface is very specific. This might be very OS-specific.
Youenn: Agree. ARM macOS devices are different from Intel macOS devices. Worried that we would expose an API that would surface such
differences in either OS or UA implementation.
Henrik: In ChromeOS, you need to select specific pixel formats to have good efficiency.
Bernard: Agree that performance is very hardware- and platform-specific today. But it is not reasonable for that complexity to be exposed to the
developer, because it would force them to create their own hardware abstraction layer. This is the kind of problem that WebNN was created to solve:
to not force the developer to utilize WebGPU to discover the hardware and optimize for it (which is a privacy problem), but rather for the browser
to take care of things optimally.
Jan-Ivar: Good to have use cases, but the API shape is difficult to agree on just yet, given the difficulties that arise.

4. getTabContext
Jan-Ivar: Good news!
Security part: agreement to have site isolation for this API (COOP+COEP) or COEP
Might only need COEP right now.
Might be simpler with both
Opt-in for documents to be captured
Discussion on whether or not to capture in case of COOP+COEP failures

5. Issue 777
Agreement for proposal.
Provide guidelines and mention mobile as well.

6. Issue 64
Consensus to proceed with PR

7. Issue 16
Interest from Chrome, Firefox and Safari
