Re: Add "MediaStream with worker" for video processing into the new working items of WebRTC WG

Hi, Mathieu,
A quick reply: the use case that seems under-specified at the moment will
be covered by another piece, OfflineMediaContext[1], which we plan to address
in the next step. Basically, this specification is one piece of the puzzle in
a project called FoxEye[1]. You might want to check it out first.

[1]: https://wiki.mozilla.org/Project_FoxEye#OfflineMediaContext:

BR,
CTai

2015-07-30 12:27 GMT+08:00 Mathieu Hofman <Mathieu.Hofman@citrix.com>:

>  I might have missed the "video processor" use case of an application
> wanting to receive every single frame from the source without skipping any.
> But this use case seems under-specified currently. In your proposal, the
> decision to skip frames seems to be left to the implementation:
>
>  "Ideally the MediaStreamTrack should dispatch each video frame through
> VideoProcessorEvent
> <http://chiahungtai.github.io/mediacapture-worker/#idl-def-VideoProcessorEvent>.
> But sometimes the worker thread could not process the frame in time. So the
> implementation could skip the frame to avoid high memory footprint. In such
> case, we might not be able to process every frame in a real time
> MediaStream."
>
>  If you want to support apps that are guaranteed delivery of every single
> frame, you need to make sure the processor implementation queues a frame
> event for every new frame generated. But with a pure push mechanism, that
> creates issues for apps that would like to skip frames. The app's JS worker
> would need to somehow be able to drain the queue of frame events and
> process the last one. Without the worker having any knowledge of whether
> there are any frames left in the queue, the skipping can become pretty
> convoluted (one solution would be setTimeout(callback, 0): save the "latest"
> frame in each frame event handler invocation, then start processing the
> "latest" frame in the setTimeout callback; see the sketch below).
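> A minimal sketch of that workaround (the onvideoprocess handler name is
> illustrative, not taken from the proposal):
> var latest = null;
> var scheduled = false;
> processor.onvideoprocess = function (event) {
>   latest = event;             // always keep only the newest frame
>   if (scheduled) return;      // a drain pass is already queued
>   scheduled = true;
>   setTimeout(function () {    // fires after the pending frame events drain
>     scheduled = false;
>     processFrame(latest);
>   }, 0);
> };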
> This complexity is another reason why a built-in back-pressure channel is
> beneficial. A solution here would be to add the ability to pause/resume the
> processor's generation of frame events. An app that wants to skip frames
> would pause the processor when starting to process a frame, and resume it
> when done.
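> For example (a sketch; pause() and resume() are the proposed additions, not
> existing API, and the handler name is again illustrative):
> processor.onvideoprocess = function (event) {
>   processor.pause();          // no new frame events while we are busy
>   processFrame(event).then(function () {
>     processor.resume();       // ready for the next frame
>   });
> };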
>
>  The complexity would be reversed with an async pull mechanism. The frame
> skipping is obvious in this case:
> function processNext() {
>   return processor.requestFrame().then(processFrame).then(processNext);
> }
> processNext();
> In this case, the worker only gets a new frame when it's done with the
> previous one. If the requestFrame() function of the processor is
> specified to only deliver frames that haven't been delivered by this
> processor object before, you would always get a "new" frame. If the source
> is real-time, we would most likely want the processor to internally cache
> the "latest" frame and keep a flag to know if it has been delivered yet or
> not.
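> To make that concrete, here is a minimal sketch of those requestFrame()
> semantics (illustrative only, not from the proposal):
> function Processor() {
>   this.latest = null;      // most recent frame from the source
>   this.delivered = true;   // has `latest` been handed out already?
>   this.waiters = [];       // pending requestFrame() resolvers
> }
> Processor.prototype.onSourceFrame = function (frame) {
>   this.latest = frame;
>   this.delivered = false;
>   while (this.waiters.length) this.waiters.shift()(this.take());
> };
> Processor.prototype.take = function () {
>   this.delivered = true;
>   return this.latest;
> };
> Processor.prototype.requestFrame = function () {
>   if (!this.delivered) return Promise.resolve(this.take());
>   var self = this;
>   return new Promise(function (resolve) { self.waiters.push(resolve); });
> };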
> Receiving all frames of a real-time source would be more convoluted with a
> pure async pull mechanism:
> var queue = [];
> var processing = false;
> function queueFrame(frameEvent) {
>   if (frameEvent) queue.push(frameEvent);
>   processor.requestFrame().then(queueFrame); // immediately ask for the next frame
>   if (!processing) processNext();
> }
> function processNext() {
>   if (processing || queue.length == 0) return;
>   processing = true;
>   processFrame(queue.shift()).then(function() { // shift(), not unshift(): take the oldest queued frame
>     processing = false;
>     processNext();
>   });
> }
> queueFrame();
>
>  As I said, convoluted, but if the processFrame function is asynchronous
> and yields back to the event loop frequently enough, this code should get
> and queue every frame generated by the real-time source.
> To address this complexity, maybe the processor could be constructed with a
> "caching strategy", telling it either to cache all frames of real-time
> sources or to skip unneeded frames.
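> A sketch of what that could look like (the constructor signature and option
> values here are hypothetical):
> var skipping = new VideoProcessor(track, { caching: "latest" }); // drop stale frames
> var complete = new VideoProcessor(track, { caching: "all" });    // queue every frame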
>
>  Now, from what I understand, an API based on a ReadableStream [1] might
> not solve all these use cases either:
> - Screen sharing as a pull source, or as a push source with back-pressure
> support, would be trivial to support.
> - A push source with no back-pressure support (a real-time webcam) could
> simply enqueue frames in the readable stream. If the app wants to be able
> to consume every single frame, there is no problem. If the app wants to
> skip frames, draining the stream until the "latest" frame might be an issue,
> since the app has no way to know whether a read from the stream will pull
> from the queue or wait for the source.
> - Could an alternative be to require the app to implement a writable stream
> sink and pass a writable stream to the processor?
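> For comparison, a minimal sketch of consuming every frame through a readable
> stream (processor.readable is assumed here, not part of any proposal; the
> reader API is from the Streams spec [1]):
> var reader = processor.readable.getReader();
> function pump() {
>   return reader.read().then(function (result) {
>     if (result.done) return;    // the stream has ended
>     return processFrame(result.value).then(pump);
>   });
> }
> pump();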
>
>  To sum up, I agree that none of the originally suggested solutions seems
> to solve all the use cases.
> I think there are 3 approaches from here:
> - Open issues against the Streams spec to improve support for "skippable
> streams" and use an API based on Streams
> - Use an async pull mechanism and add a "skipFrame" boolean option to the
> processor's constructor
> - Use a push mechanism and add pause()/resume() operations to the
> processor for back-pressure support
>
>  Did I miss anything?
>
>  Mathieu
>
>  [1] https://streams.spec.whatwg.org/#rs
>
>  ------------------------------
> *From:* Chia-Hung Tai [ctai@mozilla.com]
> *Sent:* Wednesday, July 29, 2015 6:45 PM
> *To:* Mathieu Hofman
> *Cc:* robert@ocallahan.org; public-webrtc@w3.org;
> public-media-capture@w3.org
> *Subject:* Re: Add "MediaStream with worker" for video processing into
> the new working items of WebRTC WG
>
>   Hi, Mathieu,
> I would like to support all the use cases if I could, but the problem is
> that I can't imagine how to design an async pull API, with elegant sample
> code, that guarantees processing every frame. It would be great if you
> could show me some concrete sample code; I have already tried to figure it
> out for a while.
> Guaranteeing every frame is the most important reason why we chose a push
> mechanism. The key use case is video editing.
> I think we need to take care of at least the use cases below:
> 1. Real-time camera processing for WebRTC or camera recording => video
> processor case
> 2. Real-time video analysis, where we only analyze frames and don't modify
> the stream => video monitor case
> 3. Video editing, where we need to guarantee frame-by-frame processing =>
> video processor case
> 4. Screen sharing. I think that is what you want, but I am not sure exactly
> what it would look like.
>
>  I am not sure how to provide a solution for all those cases with async
> pull. I would be happy to learn from you.
>
>  BR,
> CTai
>
>

Received on Thursday, 30 July 2015 05:12:52 UTC