Depth image capture workflow proposal for Web

A depth image, or depth map (http://en.wikipedia.org/wiki/Depth_map), comes from a depth camera or sensor. A depth image is usually used together with a color image for 3D vision applications.

I am working on the media capture depth stream extensions (http://w3c.github.io/mediacapture-depth/), which allow web applications to request a depth stream from a depth camera. I'd like to get feedback about the depth image capture workflow on the web. Some previous discussion can be found at https://github.com/w3c/mediacapture-depth/issues/76. This proposal extends Mediastream Image Capture (http://www.w3.org/TR/image-capture/) to support depth image capture.

In native camera SDKs, such as the Google Project Tango SDK, Microsoft Kinect SDK and Intel RealSense SDK, the common capture workflow is:

1. configure the capture mode (color, depth or color+depth), resolution, frame rate, etc.
2. start the capture pipeline.
3. request a capture sample via an API call (synchronous) or via a callback (asynchronous).
4. data access: the returned capture sample carries a color image, a depth image or both, per the configuration.
5. map depth/color images: as the depth and color images come from different cameras, native SDKs usually provide an API to map coordinates between the two cameras. Some native SDKs provide pre-mapped (aligned) color and depth images.

For example, the capture sample might be defined like this in a native SDK:

struct sample {
  RgbImage* rgb;    // color image; null unless color capture was configured
  DepthMap* depth;  // depth image; null unless depth capture was configured
};

Back to the web API, this workflow is proposed to be:

1. configuration: by getUserMedia constraints. Besides {'video': true} for color capture, we add {'depth': true} to specify depth capture, and {'video': true, 'depth': true} for the aligned color+depth use case.
2. capture pipeline: by new ImageCapture(stream) in the image capture pipeline. I propose to extend the ImageCapture constructor to take a MediaStream object instead of a MediaStreamTrack object for the aligned color+depth use case.
3. request a capture sample: by grabFrame().
4. data access: by extending FrameGrabEvent with a depthMap attribute. For {'video': true}, only imageData is populated. For {'depth': true}, only depthMap is populated. For {'video': true, 'depth': true}, both imageData and depthMap are populated from an aligned capture.
interface FrameGrabEvent : Event {
    readonly attribute ImageData imageData;
    readonly attribute DepthMap depthMap;
};
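Putting steps 1 through 4 together, a usage sketch could look like the following. Note this is a sketch of the proposal, not a shipped API: the 'depth' constraint, the MediaStream-taking ImageCapture constructor, and the depthMap attribute on FrameGrabEvent are all extensions proposed above, and the promise-based getUserMedia and event-based grabFrame() follow the drafts of the respective specs.

```javascript
// Sketch of the proposed workflow (steps 1-4). All depth-related
// names here are from the proposal above, not from a shipped API.
function captureColorAndDepth() {
  // 1. configuration: request aligned color+depth via constraints.
  navigator.mediaDevices.getUserMedia({ video: true, depth: true })
    .then(function (stream) {
      // 2. capture pipeline: pass the whole MediaStream (proposed
      //    extension), not a single MediaStreamTrack.
      var capture = new ImageCapture(stream);
      // 4. data access: the FrameGrabEvent carries both images.
      capture.onframe = function (event) {
        var color = event.imageData; // ImageData
        var depth = event.depthMap;  // proposed DepthMap
        console.log(depth.width + 'x' + depth.height, depth.units);
      };
      // 3. request a capture sample.
      capture.grabFrame();
    });
}
```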

The proposed DepthMap interface is:
interface DepthMap {
    readonly attribute unsigned long width;
    readonly attribute unsigned long height;
    readonly attribute DOMString type;
    readonly attribute DOMString format;
    readonly attribute DOMString units;
    readonly attribute float near;
    readonly attribute float far;
    readonly attribute Uint16Array data;
    readonly attribute Uint16Array? confidence;
};
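To illustrate how a web developer might consume these attributes, here is a hypothetical helper. It assumes one plausible reading of the format/near/far attributes, namely that the 16-bit samples in data are linearly normalized over [near, far]; the actual semantics of format and units are exactly what this interface would need to pin down.

```javascript
// Hypothetical: convert a raw 16-bit sample from DepthMap.data to
// meters, ASSUMING values are linearly normalized over [near, far]
// across the full uint16 range. Not spec-mandated behavior.
function depthToMeters(raw, near, far) {
  return near + (raw / 65535) * (far - near);
}

// Read the depth, in meters, at pixel (x, y) of a DepthMap-like object.
function depthAt(map, x, y) {
  var raw = map.data[y * map.width + x];
  return depthToMeters(raw, map.near, map.far);
}
```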

5. map depth/color images: it is proposed to do the depth/color mapping in native code, so web developers can expect aligned and mapped color and depth images when requesting {'video': true, 'depth': true}.
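For intuition about what step 5 hides from web developers, here is a minimal sketch of the underlying mapping with a pinhole camera model. The intrinsics (fx, fy, cx, cy) and the baseline tx are hypothetical calibration values, and the rotation between the two cameras is assumed to be identity; real SDKs use full calibrated extrinsics and handle lens distortion.

```javascript
// Sketch: map a depth pixel (u, v) with depth z (meters) into
// color-image coordinates. Identity rotation between cameras is
// assumed; intrinsics and baseline tx are hypothetical values.
function depthPixelToColorPixel(u, v, z, depthIntr, colorIntr, tx) {
  // Back-project the depth pixel to a 3D point in the depth camera frame.
  var X = (u - depthIntr.cx) * z / depthIntr.fx;
  var Y = (v - depthIntr.cy) * z / depthIntr.fy;
  // Translate into the color camera frame (identity rotation assumed).
  var Xc = X + tx;
  // Project into the color image.
  return {
    u: colorIntr.fx * Xc / z + colorIntr.cx,
    v: colorIntr.fy * Y / z + colorIntr.cy
  };
}
```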

Your thoughts on this?

Thanks,
-ningxin

Received on Wednesday, 18 March 2015 03:15:29 UTC