The Image Stream Processing pipeline(s)

Hi all,

I have a question about the relationship between the MediaStream Image 
Capture API[1] and the MediaStream Recording API[2][3]...so I guess this 
is to Travis, Jim and Giri but all feedback welcome 8)

More specifically, I'm interested in the two different types of Image 
Stream Processing pipelines these two APIs create - this is related to 
"3.3 Find the ball assignment" in the MediaStream Capture Scenarios 
doc[4]...(but more scenarios related to this are coming from the 
Augmented Web CG soon too).

From tracking the evolution of the Image Capture and Recording APIs, I 
think it's fair to paraphrase them as follows in the context of Image 
Stream Processing pipelines:

A - The MediaStream Image Capture API provides a single-shot getFrame() 
method that is designed to be called within a setTimeout() or 
requestAnimationFrame() style event loop, and it returns an ImageData 
object (see the sketch below).
NOTE: See the feedback from the WG members on the list about this event 
loop decision[5].
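
To make the comparison concrete, here's a minimal sketch of pipeline A 
(stream is an existing getUserMedia() MediaStream, process() is a 
placeholder for the compute step, and the exact constructor/event shape 
is my assumption based on the paraphrase above):

   // Pipeline A: poll for frames inside a rAF-style event loop.
   var track = stream.getVideoTracks()[0];
   var capture = new ImageCapture(track);

   // ASSUMPTION: getFrame() delivers the grabbed frame's ImageData
   // via a frame event, per the paraphrase above.
   capture.onframe = function (event) {
     var imageData = event.imageData;   // width, height, data (pixels)
     process(imageData.data);           // e.g. feature detection
   };

   (function tick() {
     capture.getFrame();                // single-shot grab
     requestAnimationFrame(tick);       // schedule the next grab
   })();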

B - The MediaStream Recording API provides a way to set up an 
event-based callback that is called whenever the "ondataavailable" event 
fires, and it returns a Blob object.  The amount of data in each Blob 
can be controlled via the timeslice, so only one or a few frames could 
be extracted per slice (see the sketch below).
NOTE: Also notice the secondary Blob->Typed Array pipeline that's needed 
in this case as described in point 1 here[6].

   "Blob is returned to the ondataavailable callback.  ArrayBuffer is
    created from Blob using FileReader.  Typed array A is created from
    ArrayBuffer."


So my question is: for web apps that want to do compute-intensive Image 
Stream Processing (e.g. feature detection, object recognition, gesture 
recognition, general computer vision, etc.), which pipeline approach is 
recommended - A or B?

If it's A then a few other questions come up, e.g. is this "event loop" 
model really the most efficient approach?  And how do we deal with 
timestamps across the audio and image streams within video, etc. so we 
can deliver real synchronisation[7]?  Plus there's a range of related 
questions I've already raised - see the NOTE: in the Technical Issues 
section here[8].

If it's B then the question is "Is getFrame() (and its related 
plumbing) really required within the Image Capture API?"  Also, since I 
don't believe Blobs are Transferable[9], choosing B has performance 
implications for apps that want to shift work off into a Web Worker too 
(see the comment above about B's secondary pipeline, and the sketch 
below).
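
To illustrate that last point, a rough sketch of the Worker hand-off 
under each model (pipeline-worker.js is hypothetical):

   var worker = new Worker('pipeline-worker.js');

   // Pipeline A: the pixel buffer can go in the transfer list
   // (zero-copy; the buffer is neutered on the sending side).
   worker.postMessage({ pixels: imageData.data.buffer },
                      [imageData.data.buffer]);

   // Pipeline B: a Blob can't go in the transfer list[9], so it's
   // structured-cloned - or you pay the FileReader step first and
   // then transfer the resulting ArrayBuffer.
   worker.postMessage({ blob: blob });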


I'm currently working on a js lib to make setup and manipulation of 
these types of pipelines as simple and standardised/optimised as 
possible.  But I'd really like to make sure I'm implementing the best 
option.

And I'd also like to be sure that we've really discussed the longer term 
impacts of choosing one method over another in terms of performance, 
synchronisation, etc.

Thoughts?

roBman


[1] http://www.w3.org/TR/image-capture/
[2] http://www.w3.org/TR/recording/ (soon)
[3] https://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/MediaRecorder.html
[4] https://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html#find-the-ball-assignment-media-processing-and-recording
[5] http://lists.w3.org/Archives/Public/public-media-capture/2013May/0144.html
[6] http://lists.w3.org/Archives/Public/public-media-capture/2012Nov/0102.html
[7] https://groups.google.com/d/msg/discuss-webrtc/VhuPHRCrFAM/GfrocO6tDtsJ
[8] http://lists.w3.org/Archives/Public/public-media-capture/2013Jul/0101.html
[9] https://www.w3.org/Bugs/Public/show_bug.cgi?id=18611
