RE: Media Capture Depth Stream Extension - call for review

> > Cullen wrote:
> > For cameras that do this based on triangulation (either laser or stereo), you
> > end up with really bad fidelity in some cases when you use integers, because
> > the values tend to be linearly related to the inverse of the distance. At the
> > browser API level, I think this would work best if the values were floating
> > point numbers - preferably representing range in meters. If integers are used,
> > it gets complicated how to map them, and it depends on the actual distance. So
> > I strongly support the proposal in the Issues section to use FLOAT in meters.

> Anssi wrote:
> Ningxin - perhaps you can clarify whether there are known issues in getting the
> actual distance from the depth cameras on the market today, or if there are any
> other issues related to this that the group should be aware of?
> 

Thanks to Cullen for pointing this out. AFAIK, the depth cameras on the market are based on triangulation to measure distance, and the distance is linearly related to the inverse of the raw sensor data.

As for the depth value representation and unit: from the native camera SDK perspective, both short in millimeters and float in meters are used. The Kinect SDK is well known for providing the depth value in millimeters; as you mentioned, the resolution of the depth value decreases as the distance increases, e.g. 1 mm at 50 cm but 5 cm at 5 m. The DepthSense SDK provides the depth value both as short in mm and as float in m, with the conversion handled internally. From the algorithm perspective, as far as I know, most existing Kinect algorithms work on short in mm, while point cloud processing works on float in m.
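
To make the two conventions concrete, here is a minimal sketch of the conversion between them, assuming Kinect-style raw data (one unsigned 16-bit value per pixel in millimeters, with 0 as the "no depth" sentinel); the names are illustrative, not from any SDK:

  const NO_DEPTH = 0; // common "invalid pixel" sentinel in mm-based SDKs

  // 16-bit millimeter depth map -> float meters (NaN marks invalid pixels).
  function mmToMeters(raw: Uint16Array): Float32Array {
    const meters = new Float32Array(raw.length);
    for (let i = 0; i < raw.length; i++) {
      meters[i] = raw[i] === NO_DEPTH ? NaN : raw[i] / 1000;
    }
    return meters;
  }

  // Float meters -> 16-bit millimeters; note this quantizes to 1 mm steps.
  function metersToMm(depth: Float32Array): Uint16Array {
    const mm = new Uint16Array(depth.length);
    for (let i = 0; i < depth.length; i++) {
      mm[i] = Number.isNaN(depth[i])
        ? NO_DEPTH
        : Math.min(0xffff, Math.round(depth[i] * 1000));
    }
    return mm;
  }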

> Re floats, for interacting with the ImageData and CanvasRenderingContext2D,
> the most straight-forward way would perhaps be to use one of the
> ArrayBufferView types:
> 
>   https://www.khronos.org/registry/typedarray/specs/latest/#7
> 
> Cullen - I guess you'd prefer Float64Array, which corresponds to the
> unrestricted double WebIDL type?
> 

I suppose Float32Array would be used here. Single-precision floats already give sub-millimeter resolution across the working range of these cameras, so Float64Array seems unnecessary.
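
For illustration, here is a rough sketch of how a Float32Array depth map in meters could be drawn via ImageData on a 2D canvas; the depth buffer and the near/far clamp range are just placeholders of mine:

  // Map a float-meters depth buffer to grayscale pixels (nearer = brighter).
  function depthToImageData(
    ctx: CanvasRenderingContext2D,
    depth: Float32Array, // one value per pixel, in meters
    width: number,
    height: number,
    near = 0.5,          // meters rendered as white
    far = 5.0            // meters rendered as black
  ): ImageData {
    const image = ctx.createImageData(width, height);
    for (let i = 0; i < depth.length; i++) {
      const t = Math.min(Math.max((depth[i] - near) / (far - near), 0), 1);
      const gray = Math.round((1 - t) * 255);
      image.data[4 * i + 0] = gray;
      image.data[4 * i + 1] = gray;
      image.data[4 * i + 2] = gray;
      image.data[4 * i + 3] = 255; // fully opaque
    }
    return image;
  }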

Thanks,
-ningxin

> How to wire this into the WebGLRenderingContext is work in progress. There are
> basically two alternatives as noted in the last bullet of the Issues section:
> 
>   https://www.w3.org/wiki/Media_Capture_Depth_Stream_Extension#Issues
> 
> > Imagine a stereo camera that gives me a left and right camera plus a range
> > image. So a call to getUserMedia might return two video tracks plus a range
> > track. I think we need to say something about whether the range image is
> > registered to the left or right camera. Probably just saying that the first
> > range track is registered to whatever the first video track is would do. Am I
> > making any sense here or do I need to explain this better?
> 
> I think I get what you mean. This requirement indeed needs to be addressed, if
> not in v1, then in later iterations of the spec.
> 
> One general solution that scales to N streams of any type would probably be (in
> the abstract) something like what is being proposed as MediaDeviceInfo.groupId.
> Alternatively, we could provide a method that takes a MediaStream as an argument
> and returns the other associated MediaStreams, if any.
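
As an aside, here is a quick sketch of the groupId alternative, written against the proposed enumerateDevices()/MediaDeviceInfo.groupId shape; the assumption that a depth device would share its groupId with the color camera it is registered to is mine:

  // Find another capture device in the same physical group as a given
  // video device - e.g. the depth sensor paired with a color camera.
  async function findAssociatedDevice(
    videoDeviceId: string
  ): Promise<MediaDeviceInfo | undefined> {
    const devices = await navigator.mediaDevices.enumerateDevices();
    const video = devices.find((d) => d.deviceId === videoDeviceId);
    if (!video) return undefined;
    // Assumption: devices originating from the same physical camera
    // (color + depth) are exposed with an identical groupId.
    return devices.find((d) => d !== video && d.groupId === video.groupId);
  }
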
> 
> Thanks,
> 
> -Anssi
> 
> >> [1] http://lists.w3.org/Archives/Public/public-media-capture/2014Jan/0039.html
> >> [2] https://www.w3.org/wiki/Media_Capture_Depth_Stream_Extension#Use_Cases
> >> [3] https://www.w3.org/wiki/Media_Capture_Depth_Stream_Extension#Spec_Extensions
> >> [4] https://www.w3.org/wiki/Media_Capture_Depth_Stream_Extension#Examples
> >> [5] http://www.w3.org/html/wg/wiki/ExtensionSpecifications
