Re: Advice on extending CanvasRenderingContext2D to support Depth Streams

On Tue, Nov 4, 2014 at 3:04 PM, Rob Manson <roBman@buildar.com> wrote:

> Hi Rick,
>
> First, I'd say it's definitely worth checking out the presentation[1] and
> video of the demos[2] that were presented at TPAC.
>
>
>  It's also not clear to me how these APIs will be used.
>> Can you write some very high-level pseudo-code that shows how a typical author
>> would get/set the data and present it to the user?
>>
>
> Here are some pseudo-code examples[1] in our initial Use Cases document.
>
> Here's the one for a 2d canvas context (although it should probably use
> rAF() instead of setInterval() - a rAF variant is sketched after the
> snippet).
> NOTE: This obviously assumes the 2d drawing context is already set up.
>
>   setInterval(function() {
>     context.drawImage(video, 0, 0, w, h);
>
>     var depthData = context.getDepthData(0, 0, w, h);
>     // depthData.data is a Uint16Array for the depth stream.
>     console.log(depthData.data instanceof Uint16Array); // prints "true"
>     var depthMap = depthData.data;
>
>     // the depthMap is an array of 16-bit unsigned integers
>     for (var i = 0; i < depthMap.length; i++) {
>       var depth = depthMap[i];
>       // process the 16-bit unsigned depth value
>     }
>   }, 1000 / fps);
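>
> For reference, the same loop driven by requestAnimationFrame() - just a
> rough sketch, assuming the same context and getDepthData() shape as above -
> would look roughly like this:
>
>   function processFrame() {
>     context.drawImage(video, 0, 0, w, h);
>     var depthData = context.getDepthData(0, 0, w, h);
>     // ... process depthData.data as above ...
>     requestAnimationFrame(processFrame);
>   }
>   requestAnimationFrame(processFrame);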
>
> So you can see this is designed around the existing canvas 2d context
> drawImage(video, ...)/getImageData(...) model that is already widely used
> (see discussion below).


It's odd that drawImage does more than copy pixels.
Will this happen only with this particular video element, or also with other
contexts that have depthData?


>
>      Can you explain why you would like to see this integrated with canvas?
>>
>
> Well, canvas objects deliver the only pipeline available for Computer
> Vision-related media stream processing on the Web Platform today.
>
> <video>  ->  <canvas(2d/3d)>  ->  Typed Array  ->  js  ->  *
>              (using .drawImage)   (using .getImageData)
>
> So this pattern seemed like an obvious starting point for us.
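>
> As a concrete (RGB) sketch of that pipeline, using only what browsers
> already ship:
>
>   // <video>  ->  <canvas 2d>  ->  Typed Array  ->  js
>   context.drawImage(video, 0, 0, w, h);             // decode a frame into the canvas
>   var imageData = context.getImageData(0, 0, w, h);
>   var pixels = imageData.data;                      // Uint8ClampedArray of RGBA values
>   // ... hand pixels off to the CV processing code ...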

> BTW: On a related note, I'm just about to circulate the results of a test
> framework we've been running across all the main browser/OS combinations
> that looks at the overall performance of these types of media stream
> processing pipelines - especially contrasting the 2d canvas and the WebGL
> canvas pipelines. I'll post info to this list soon and would appreciate any
> feedback.
>
>
>      It seems that the extension to Canvas 2D could be implemented as a
>>     new standalone object that works in conjunction with a media element
>>     (that is possibly drawn to a canvas)
>>
>
> I'm not quite clear on exactly how you see this working. Could you please
> sketch this idea out a little more?
>

It seems that you're only interested in canvas because it defines a flat
set of pixels that you can write into and read from.
At the same time, 2D canvas has a lot of methods that you are not
interested in. It will also be difficult to describe how all the methods
and properties will interact with the depth data.

Why don't you create a new canvas context type that just defines the
methods that you need and add it to the HTML spec [1]?

This will be much simpler to specify, and you won't have to battle browser
vendors who will almost certainly object.
When implementing this new context, you can likely store a hidden 2D
context that you draw the actual pixels into, so you get HW acceleration for
free.
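
For illustration, usage might look something like this (the context name
"depth" and the exact method set here are placeholders, not a concrete
proposal):

  var canvas = document.createElement('canvas');
  var ctx = canvas.getContext('depth');      // hypothetical new context type
  ctx.drawImage(depthVideo, 0, 0, w, h);     // expose only the methods you actually need
  var frame = ctx.getDepthData(0, 0, w, h);  // data backed by a Uint16Array

The spec work is then limited to those few methods, rather than to how depth
data interacts with every existing 2D canvas drawing primitive.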


> BTW: Initially, I thought we needed a way to decode the video frames
> without touching other DOM elements like canvas at all. But after running
> our test framework it seems that some of the interactions between <video>
> and <canvas> have already been really well optimised.
>
> Of course, the other authors of this depth extension and I are very open
> to suggestions for better solutions.
>
>
>      The model states that this should taint the canvas [1].
>>     Is it possible today to draw a local video stream from your camera
>>     to a 2d context and read the pixels?
>>
>
> Yes. The model we have applied so far is that we pack the Uint16Array data
> into as few channels as possible (in WebGL we're using 3 channels in a
> 5_6_5 structure) so the values can be extracted and recombined in as few
> steps as possible.
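>
> As a purely illustrative sketch of one possible 5_6_5 split (the exact
> ordering of the bits across the channels here is just an assumption):
>
>   var r = (depth >> 11) & 0x1F;   // top 5 bits
>   var g = (depth >> 5)  & 0x3F;   // middle 6 bits
>   var b =  depth        & 0x1F;   // low 5 bits
>   // ...and recombining on the way back out:
>   var recombined = (r << 11) | (g << 5) | b;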
>
> There is also some existing work on how to pack these uint16s into
> existing codecs - see reference 8[2] on our Use Cases page.
>
> Our thought is that the getDepthData() method would take the data from the
> RGBA channels of the Uint8Array in the depth video frame, recombine it in
> an agreed way into a Uint16Array, and return that.
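>
> Purely as an illustration (which channel carries which byte is part of what
> would need to be agreed), that recombination could be as simple as:
>
>   // rgba: the raw RGBA channel data for a decoded depth frame
>   var depth16 = new Uint16Array(w * h);
>   for (var i = 0; i < depth16.length; i++) {
>     var lo = rgba[i * 4];       // e.g. low byte in the R channel
>     var hi = rgba[i * 4 + 1];   // e.g. high byte in the G channel
>     depth16[i] = (hi << 8) | lo;
>   }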
>
> I think the behaviour would be exactly parallel to how writing <video>
> frames to a 2d canvas drawing context works today.
>

Yes, it seems that reading pixels from a canvas with camera data works:
http://www.andismith.com/getUserMedia-examples/photobooth/index.html
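
For reference, the basic pattern is roughly the following (details such as
vendor prefixes and how the stream gets attached to the <video> element have
varied across browsers):

  navigator.mediaDevices.getUserMedia({ video: true }).then(function (stream) {
    video.srcObject = stream;   // wire the camera into a <video> element
  });
  // ...then, once the video is playing:
  context.drawImage(video, 0, 0, w, h);
  var pixels = context.getImageData(0, 0, w, h).data;   // read the pixels back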

1: http://www.w3.org/TR/html/scripting-1.html#canvas-context-2d

Received on Thursday, 6 November 2014 05:15:44 UTC