RE: Add "MediaStream with worker" for video processing into the new working items of WebRTC WG

I might have missed the "video processor" use case of an application wanting to receive every single frame from the source without skipping any.
But this use case seems under-specified at the moment. In your proposal, the decision to skip frames seems to be left to the implementation:

"Ideally the MediaStreamTrack should dispatch each video frame through VideoProcessorEvent<http://chiahungtai.github.io/mediacapture-worker/#idl-def-VideoProcessorEvent>. But sometimes the worker thread could not process the frame in time. So the implementation could skip the frame to avoid high memory footprint. In such case, we might not be able to process every frame in a real time MediaStream."

If you want to support apps that are guaranteed delivery of every single frame, you need to make sure the processor implementation queues a frame event for every new frame generated. But with a pure push mechanism, that creates issues for apps that would like to skip frames. The app's JS worker would need some way to drain the queue of frame events and process only the last one. Without the worker having any knowledge of whether there are frames left in the queue, the skipping can become pretty convoluted (one solution would be setTimeout(callback, 0): save the "latest" frame in each frame event handler invocation, then start processing the "latest" frame in the setTimeout callback).
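For illustration, here is a minimal sketch of that workaround (the onvideoframe handler name and the processFrame() helper are hypothetical, not part of the proposal):

// Frame skipping with a pure push mechanism.
var latest = null;
processor.onvideoframe = function (frameEvent) {
  var alreadyScheduled = (latest !== null);
  latest = frameEvent; // each invocation overwrites "latest"
  if (!alreadyScheduled) {
    // Yield to the event loop so any frame events already queued
    // can overwrite "latest" before processing starts.
    setTimeout(function () {
      var frame = latest;
      latest = null;
      processFrame(frame);
    }, 0);
  }
};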
This complexity is another reason why a built-in back-pressure channel is beneficial. A solution here would be to add the ability to pause/resume the processor's generation of frame events. An app that wants to skip frames would pause the processor when it starts processing a frame, and resume it when done.
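With such a channel, the skipping logic collapses to something like this (again a sketch; pause() and resume() are the proposed additions, not existing API):

processor.onvideoframe = function (frameEvent) {
  processor.pause();                    // stop frame event generation
  processFrame(frameEvent).then(function () {
    processor.resume();                 // ready for the next frame
  });
};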

The complexity would be reversed with an async pull mechanism. The frame skipping is obvious in this case:
// Pull one frame, process it, then ask for the next.
function processNext() {
  return processor.requestFrame().then(processFrame).then(processNext);
}
processNext();
In this case, the worker only gets a new frame when it is done with the previous one. If the processor's requestFrame() function is specified to only deliver frames that this processor object has not delivered before, you would always get a "new" frame. If the source is real-time, we would most likely want the processor to internally cache the "latest" frame, along with a flag recording whether it has been delivered yet.
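To make those semantics concrete, here is a rough model of the internal state I have in mind (purely illustrative, not proposed API; it also assumes only one outstanding request at a time):

// Internal "latest frame" cache of the processor.
var latestFrame = null;
var delivered = true;
var pendingResolve = null;

function onNewFrameFromSource(frame) {    // called by the source
  latestFrame = frame;
  delivered = false;
  if (pendingResolve) {                   // a requestFrame() is waiting
    pendingResolve(latestFrame);
    pendingResolve = null;
    delivered = true;
  }
}

function requestFrame() {
  if (!delivered) {                       // cached frame not seen yet
    delivered = true;
    return Promise.resolve(latestFrame);
  }
  return new Promise(function (resolve) { pendingResolve = resolve; });
}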
Receiving all frames of a real-time source would be more convoluted with a pure async pull mechanism:
var queue = [];
var processing = false;

// Keep requesting frames and queue each one as it arrives.
function queueFrame(frameEvent) {
  if (frameEvent) queue.push(frameEvent);
  processor.requestFrame().then(queueFrame); // immediately ask for the next frame
  if (!processing) processNext();
}

// Process queued frames one at a time, oldest first.
function processNext() {
  if (processing || queue.length == 0) return;
  processing = true;
  processFrame(queue.shift()).then(function () { // take the oldest queued frame
    processing = false;
    processNext();
  });
}
queueFrame();

As I said, convoluted, but if the processFrame function is asynchronous and yields back to the event loop frequently enough, this code should get and queue every frame generated by the real-time source.
To avoid this complexity, the processor could perhaps be constructed with a "caching strategy", telling it either to cache all frames of real-time sources or to skip frames that are not needed.
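For example (the constructor shape and option name are purely hypothetical):

// Hypothetical constructor option, shown only to illustrate the idea.
var editing  = new VideoProcessor(track, { caching: 'all' });    // queue every frame
var realtime = new VideoProcessor(track, { caching: 'latest' }); // keep only the newest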

Now, from what I understand, an API based on a ReadableStream [1] might not solve all these use cases either:
- Screen sharing, as a pull source or a push source with back-pressure support, would be trivial to support.
- A push source with no back-pressure support (a real-time webcam) could simply enqueue frames in the readable stream. If the app wants to consume every single frame, there is no problem (see the read loop sketched after this list). If the app wants to skip frames, draining the stream until the "latest" frame might be an issue, since the app has no way to know whether a read from the stream will pull from the queue or wait for the source.
- An alternative would be to require the app to implement a writable stream sink and pass a WritableStream to the processor?
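For reference, consuming every frame from such a ReadableStream is straightforward with the standard reader API (frameStream being the hypothetical stream exposed by the processor):

// Read and process every queued frame, in order.
var reader = frameStream.getReader();
function readNext() {
  return reader.read().then(function (result) {
    if (result.done) return;            // stream is closed
    return processFrame(result.value).then(readNext);
  });
}
readNext();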

To sum up, I agree that none of the originally suggested solutions seems to solve all the use cases.
I think there are three approaches from here:
- Open issues against the Streams spec to improve support for "skippable streams", and use an API based on Streams
- Use an async pull mechanism and add a "skipFrame" boolean option to the processor's constructor
- Use a push mechanism and add pause()/resume() operations to the processor for back-pressure support

Did I miss anything?

Mathieu

[1] https://streams.spec.whatwg.org/#rs

________________________________
From: Chia-Hung Tai [ctai@mozilla.com]
Sent: Wednesday, July 29, 2015 6:45 PM
To: Mathieu Hofman
Cc: robert@ocallahan.org; public-webrtc@w3.org; public-media-capture@w3.org
Subject: Re: Add "MediaStream with worker" for video processing into the new working items of WebRTC WG

Hi, Mathieu,
I would like to support all use cases if I could. But the problem is I can't imagine how to design an async pull API, with elegant sample code, that guarantees processing every frame. It would be great if you could show me some concrete sample code. I have already tried to figure it out for a while.
Guaranteeing every frame is the most important reason why we chose a push mechanism. The key use case is video editing.
I think we need to take care of at least the use cases below:
1. Real-time camera processing for WebRTC or camera recording => video processor case
2. Real-time video analysis, where we only analyze frames and don't modify the stream => video monitor case
3. Video editing, where we need to guarantee frame-by-frame processing => video processor case
4. Screen sharing. I think that is what you want, but I am not sure exactly what it would look like.

I am not sure how to provide a solution for all those cases with async pull. I would be happy to learn from you.

BR,
CTai

Received on Thursday, 30 July 2015 04:27:38 UTC