Re: [depth] depth value encoding proposal

I think exposing this metadata is definitely necessary, but surfacing it
as properties of the capturer seems inconvenient.  The developer needs the
metadata whenever they are working with depth data, so if the API does not
provide data and metadata in a unified structure, every developer will have
to create their own JavaScript structure to keep them together.

Similar considerations would apply if, for example, a camera could produce
frames in various colorspaces or pixel formats.  The metadata should stay
with the data.
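
To make that concrete, here is a purely illustrative sketch (all names are
hypothetical, not taken from any proposal) of the kind of ad-hoc wrapper
each developer would otherwise end up writing:

  // Hypothetical ad-hoc wrapper: keeps the raw depth samples and the
  // metadata needed to interpret them in a single object.  None of these
  // names are spec'd anywhere.
  function makeDepthFrame(data, width, height, meta) {
    meta = meta || {};
    return {
      width: width,        // in pixels
      height: height,      // in pixels
      data: data,          // e.g. a Uint16Array of raw depth samples
      unitInMillimeters: meta.unitInMillimeters,  // undefined if unknown
      horizontalFovDegrees: meta.horizontalFovDegrees,
      focalLengthMillimeters: meta.focalLengthMillimeters
    };
  }

If the browser handed back a structure like this directly, applications
would never have to re-associate samples with capabilities queried from a
separate object.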

Currently, the entire Canvas API contains a hardcoded assumption that
ImageData is 8-bit RGBA.  Overloading ImageData to contain other pixel
types, even though it has no attribute to indicate its pixel type, seems
like a recipe for confusion.
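
To make the confusion concrete, here is a sketch (assuming the proposed
overload where ImageData.data may hold something other than a
Uint8ClampedArray; the function itself is hypothetical, not from the draft)
of what consuming code would be reduced to:

  // Sketch only: assumes ImageData.data may be either the usual
  // Uint8ClampedArray of RGBA bytes or, say, a Uint16Array of depth
  // samples.  With no pixel-type attribute, a consumer can only guess from
  // the array type, and still learns nothing about units or field of view.
  function describePixels(imageData) {
    if (imageData.data instanceof Uint16Array) {
      return 'probably depth, ' + imageData.width + 'x' + imageData.height +
             ', units unknown';
    }
    return '8-bit RGBA, ' + imageData.width + 'x' + imageData.height;
  }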


On Wed, Aug 20, 2014 at 11:09 AM, Hu, Ningxin <ningxin.hu@intel.com> wrote:

>  Hi Benjamin,
>
>
>
> On native platforms, a depth image is usually represented in a Gray16 or
> Z16 format, which is also seen as a kind of image. As for the metadata,
> such as depth units, FOV and focal length, Rob proposed in another thread
> to expose them as capabilities [1].
>
> What do you think?
>
>
>
> Thanks,
>
> -ningxin
>
>
>
> [1]
> http://lists.w3.org/Archives/Public/public-media-capture/2014Jun/0152.html
>
>
>
> *From:* Benjamin Schwartz [mailto:bemasc@google.com]
> *Sent:* Tuesday, August 19, 2014 1:18 PM
> *To:* Harald Alvestrand
> *Cc:* public-media-capture@w3.org
>
> *Subject:* Re: [depth] depth value encoding proposal
>
>
>
> I misunderstood some of the preceding discussion, not realizing that all
> mention of depth-to-color mappings had already been removed.  That's good.
>  However, the proposed solution (ImageData with special types) has several
> remaining problems, as noted in the draft's "Issue 1".
>
>
>
> I think depth and color are sufficiently different that it might be better
> to create a new DepthData type than to reuse the ImageData type.  A
> DepthData object could also indicate whether the units are known (e.g.
> millimeters) or unknown, and specify the physical field of view, along with
> other valuable metadata.
>
>
>
> On Tue, Aug 19, 2014 at 3:31 PM, Harald Alvestrand <harald@alvestrand.no>
> wrote:
>
> On 08/19/2014 06:01 PM, Benjamin Schwartz wrote:
>
>   On Tue, Aug 19, 2014 at 10:58 AM, Kostiainen, Anssi <
> anssi.kostiainen@intel.com> wrote:
>
> Hi Benjamin,
>
> Thanks for your comments, and sorry for the late reply due to the vacation
> period.
>
> I noticed the following comments (1) and (2) you’ve made, and would like
> to check on their status and ask for your help in filling any gaps:
>
> (1) "I do not think this kind of arbitrary mapping is appropriate in a W3C
> standard.  We should arrange for a natural representation instead, with
> modifications to Canvas and WebGL if necessary.” [1]
>
> To make sure I understood this correctly:
>
> Firstly, you're proposing we patch the CanvasRenderingContext2D and make
> ImageData.data of type ArrayBufferView instead of Uint8ClampedArray to
> allow the Uint16Array type, correct? Better suggestions?
>
>
>
> I would suggest going all the way to Float32 as well.
>
>
>
> Secondly, extend the WebGLRenderingContext along the lines of the
> LUMINANCE16 extension proposal [2] -- or would you prefer to use the depth
> component of a GL texture as you previously suggested? Known issues? I’d
> like to hear your latest thoughts on this and, if possible, a concrete
> proposal for how you’d prefer this to be spec’d so that it is practical
> for developers and logical for implementers.
>
>
>
> I'd strongly prefer to use a depth component (and also have an option to
> use 32-bit float).  This would raise the GLES version requirement for
> WebGL, but I think this is not an unreasonable requirement for a feature
> that also requires a depth camera!
>
>
>
> (2) "I don't think getDepthTracks() should return color-video.  If you
> want to prototype using depth-to-color mappings, the logical way is to
> treat the depth channel as a distinct color video camera, accessible by the
> usual device selection mechanisms.” [3]
>
> This is what the spec currently says re getDepthTracks():
>
> [[
>
> The getDepthTracks() method, when invoked, must return a sequence of
> MediaStreamTrack objects representing the depth tracks in this stream.
>
> The getDepthTracks() method must return a sequence that represents a
> snapshot of all the MediaStreamTrack objects in this stream's track set
> whose kind is equal to "depth". The conversion from the track set to the
> sequence is user agent defined and the order does not have to be stable
> between calls.
>
> ]]
>
> Do you have a concrete proposal for how you’d tighten the prose to clear
> up the confusion?
>
>
>
> I would not include |getDepthTracks| until we have a depth datatype.
>  Instead, I would add each depth camera as an input device of kind: 'video'
> in the list returned by enumerateDevices(), with its label containing a
> human-readable indication that its color video data is computed from depth
> via a mapping defined by the user agent.
>
>
>
> But once this spec is implemented, we have a depth datatype.... or did I
> misunderstand the proposal?
>
> I'm afraid that if we spec out a path where depth is treated as a weird
> kind of video, we'll never escape from it - so I'm happy to see such a
> fully-worked proposal that includes both the depth datatype and the depth
> track definition.
>
>
>

Received on Wednesday, 20 August 2014 18:24:23 UTC