Re: [mediacapture-insertable-streams] Is using ReadableStream/WritableStream the right approach? (#4)

> > Its only disadvantage would be running on the main thread by default
> > That is one disadvantage.
> 
> ScriptProcessorNode is evidence of this issue for audio, and the plan is to obsolete it, not complement it with Audio Worklet. Why should we reintroduce this potential issue?
> 

I don't think it's exactly the same case. Could you send a ScriptProcessorNode to a worker to free up the main thread?

> > Experience in the field shows that the main thread can work fine for many use cases, and moving to workers is easy as well.
> > On certain devices, on certain configurations. The more we will see camera shooting with higher fps, the worse it might get.
> > I am not against main thread processing, but it seems this should be opt-in not the default.
> 
> There are other potential issues that might get solved (or be more acute) once the relationship between a track and its streams is made more precise. Currently the spec is light there so it is difficult to assess it:
> 
> * For instance, it is not clear how backpressure applies in general.
>   For audio this might be a no-go: losing a frame will cause audio chirps. For camera, this might be beneficial in cases where the processing pipe is too slow. It might also help the case of a camera that uses a buffer pool: if every buffer in the pool is in the pipe, capture will stop.
>   How do you anticipate MediaStreamTrackProcessor to use the backpressure mechanism? Should MediaStreamTrackProcessor be able to define its backpressure mechanism in terms of the number of frames that are queued?

Yes, the application can specify the number of buffered frames for the processor (we recently updated the spec to reflect this).
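
To illustrate the general mechanism, here is a minimal sketch using plain WHATWG streams, with a `CountQueuingStrategy` standing in for the processor's frame buffer. The names (`frameSource`, `drain`) and the mock frame objects are illustrative, not part of the actual MediaStreamTrackProcessor API:

```javascript
// Sketch: bounded frame buffering via a plain ReadableStream.
// In the real API the cap is passed at construction, roughly:
//   new MediaStreamTrackProcessor({ track, maxBufferSize: 2 })
// Here, CountQueuingStrategy limits how many mock frames sit in the queue.
let nextFrame = 0;
const frameSource = new ReadableStream(
  {
    pull(controller) {
      // pull() runs only while the queue is below the high-water mark,
      // so a slow consumer naturally throttles "frame" production.
      if (nextFrame < 5) controller.enqueue({ frameId: nextFrame++ });
      else controller.close();
    },
  },
  new CountQueuingStrategy({ highWaterMark: 2 })
);

async function drain(stream) {
  const reader = stream.getReader();
  const ids = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    ids.push(value.frameId);
  }
  return ids;
}

const drained = drain(frameSource);
drained.then((ids) => console.log(ids)); // [0, 1, 2, 3, 4]
```

With real camera frames, a full buffer would translate into dropped or delayed frames rather than an ever-growing queue.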

> * Similarly, how does MediaStreamTrack cloning refer to ReadableStream cloning? Is it guaranteed to get the same content between the two approaches?

Cloning a track in general gives you another track backed by the same source. Cloning a MediaStreamTrackGenerator (which is a track) returns a regular MediaStreamTrack that will carry the same frames written to the MediaStreamTrackGenerator via its writable.
Cloning the readable of a MediaStreamTrackProcessor using the tee() method returns two streams that get the same frames (or, more specifically, VideoFrame objects backed by the same underlying frame).
Does that answer the question?
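
As a rough illustration of the tee() behavior with plain streams (mock frame objects, not real VideoFrames; all names are made up):

```javascript
// Sketch: tee() hands the same chunks to both branches, analogous to how
// teeing a MediaStreamTrackProcessor's readable yields frames backed by
// the same underlying media.
const readable = new ReadableStream({
  start(controller) {
    controller.enqueue({ timestamp: 0 });
    controller.enqueue({ timestamp: 33 });
    controller.close();
  },
});

const [branchA, branchB] = readable.tee();

async function collect(stream) {
  const reader = stream.getReader();
  const chunks = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return chunks;
}

const both = Promise.all([collect(branchA), collect(branchB)]);
both.then(([a, b]) => {
  // With plain streams the branches even share object identity; real
  // VideoFrames would instead be separate handles to the same frame.
  console.log(a.every((chunk, i) => chunk === b[i])); // true
});
```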


> 
> Also, this somehow creates two APIs doing roughly the same thing: MediaStreamTrack and ReadableStream/WritableStream pair. API-wise, we would prefer sticking with MediaStreamTrack producers and consumers. Or do we anticipate other APIs to get raw content directly from streams and replace MediaStreamTrack progressively?

I don't anticipate streams replacing MediaStreamTracks, since streams do not function as tracks. Streams allow applications to define custom MediaStreamTrack producers and consumers (i.e., sources and sinks). These custom sources and sinks defined in JS are not intended to replace platform sources and sinks, let alone tracks.

> Ditto for MediaStreamTrackProcessor.writableControl which seems potentially somehow redundant with existing MediaStreamTrack APIs like enabled/disabled/applyConstraints. It would be good to clarify this, at least for me.
> 

The purpose of writableControl and readableControl is to expose a control plane that already exists behind the scenes between sinks and sources. The idea is to allow control feedback to flow between custom and platform sinks and sources. They are neither a replacement nor an alternative to enabled/disabled/applyConstraints. There might be some overlap with applyConstraints, but the concept is different. For example, a peer connection platform implementation cannot disable a track, but it can (and in some practical implementations does) sometimes ask a source for a new frame. 
API-wise, readableControl and writableControl could be replaced with events and methods. We are proposing the stream approach partly for consistency with the media plane, and partly because streams are easy to transfer to a worker.
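
As a hypothetical sketch of the control plane (the signal names and object shape below are illustrative, not taken verbatim from the spec): a custom source consumes signal objects from a control stream and reacts only to the ones it understands.

```javascript
// Sketch: a custom source reacting to control signals. Signal types such
// as "request-frame" are hypothetical examples for illustration.
let framesProduced = 0;

const controlSink = new WritableStream({
  write(signal) {
    if (signal.signalType === "request-frame") {
      framesProduced++; // a source that honors the hint
    }
    // Unknown signal types are silently ignored: signals are hints,
    // not commands, so a source is free to do nothing.
  },
});

async function sendSignals() {
  const writer = controlSink.getWriter();
  await writer.write({ signalType: "request-frame" });
  await writer.write({ signalType: "set-min-frame-rate", frameRate: 30 });
  await writer.write({ signalType: "request-frame" });
  await writer.close();
  return framesProduced;
}

const result = sendSignals();
result.then((n) => console.log(n)); // 2
```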


> Talking of the signal control API, I am not clear about what happens if the processor does not process the signals or is not prepared to understand some signals (given it is extensible). I could think of some transforms just forgetting about sending the signals up in the chain.

Nothing should happen if a source (custom or platform) does not understand a signal, or if a sink (custom or platform) does not send a signal. Signals are intended to be hints to which sources can (but are not required to) react. If a signal does not make sense for a particular source or sink, they do not need to use it. For example, today some platform sinks can request a new frame from a platform source. Some sources are designed to produce a new frame when that signal is received, while other sources simply ignore it. This happens behind the scenes using a regular MediaStreamTrack with platform sources and sinks, independently of the API proposed here.
We consider it necessary to expose some of these signals to JS since we are allowing the creation of custom sources and sinks in JS.

> So they could potentially pile up in the stream, or would there be some backpressure as well to help the caller understand what is happening?

The spec currently does not describe what to do in this case, which means it's up to the implementation. I agree that the spec should be clearer here. Signals are just hints, so we should not provide delivery guarantees, but we could allow the application to specify buffer sizes as is the case for media.

> Or they could be lost in the middle of the chain. I am not sure how feasible it is, but for those kind of signals, it might be good to have a default behavior and allow the application to override it.
>
An application producing frames using a generator can interpret the signals it receives via readableControl in any way it wants, so overriding is the only possible way to proceed there. Platform sources in practice already handle signals, either by ignoring them or by acting on them if it makes sense. For example, a camera capturer could ignore signals for new frames, while a screen capturer might produce a new frame if requested.
 
> Maybe I should file separate issues but I want to ensure we pick the right design before we spend time on editing work.
> Maybe we should do the exercise of comparing this model with other envisioned models like VideoTrackReader and the benefits will get clearer.

The WebCodecs group compared VideoTrackReader with MediaStreamTrackProcessor and decided to remove VideoTrackReader from the WebCodecs spec (it's gone already). VideoTrackReader was basically the same as MediaStreamTrackProcessor, but it always ran on the main thread and had no control signals; processing in workers required the application to transfer the VideoFrames to the worker.


-- 
GitHub Notification of comment by guidou
Please view or discuss this issue at https://github.com/w3c/mediacapture-insertable-streams/issues/4#issuecomment-777471673 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Thursday, 11 February 2021 13:50:22 UTC