
Re: Why does media capture require a browser?

From: Johannes Odland <johannes.odland@gmail.com>
Date: Fri, 15 Feb 2013 21:20:14 +0100
Message-ID: <4538859868098384806@unknownmsgid>
To: Jim Barnett <Jim.Barnett@genesyslab.com>
Cc: Travis Leithead <travis.leithead@microsoft.com>, Martin Thomson <martin.thomson@gmail.com>, Jonathan Chetwynd <jay@peepo.com>, "public-media-capture@w3.org" <public-media-capture@w3.org>
To perform Computer Vision (CV) processing on the media you would normally
require an array of pixel data such as the Uint8ClampedArray on the
ImageData you get from CanvasRenderingContext2D.
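
To make the pixel-array requirement concrete, here is a minimal sketch of a typical first CV step: reducing RGBA pixel data (the layout used by ImageData.data) to one luminance byte per pixel. The function name toGrayscale and the choice of BT.601 luma weights are assumptions for illustration, not something proposed in this thread.

```javascript
// Illustrative helper (hypothetical name): convert RGBA bytes, laid out as in
// ImageData.data, into one grayscale byte per pixel.
function toGrayscale(rgba, width, height) {
  const gray = new Uint8ClampedArray(width * height);
  for (let i = 0, p = 0; p < gray.length; i += 4, p += 1) {
    // ITU-R BT.601 luma weights (an assumed choice); the alpha byte is ignored.
    gray[p] = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
  }
  return gray;
}
```

This kind of loop needs random access to uncompressed pixels, which is why a compressed blob is of little use without a decoding step first.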

Can you get a frame from the Media Recording API without recording video?

Can you get an ImageData object and not a compressed JPEG blob?

Can you request data once you have finished processing the previous frame?

I like the API. It's simple and easy to use, but I would use it for
recording video and snapping photos, not for feature detection. This can
still be done with HTMLVideoElement and HTMLCanvasElement, but with the
hassle of using the DOM :)
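
For reference, the DOM route mentioned above can be sketched as follows. It assumes a video element that is already playing a MediaStream and a scratch canvas; the function name grabFrame is hypothetical.

```javascript
// Illustrative helper (hypothetical name): copy the current video frame into
// a canvas and read it back as uncompressed ImageData.
function grabFrame(video, canvas) {
  // Size the canvas to the video's intrinsic dimensions.
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const context = canvas.getContext('2d');
  context.drawImage(video, 0, 0, canvas.width, canvas.height);
  // The returned ImageData exposes the pixels as a Uint8ClampedArray in .data.
  return context.getImageData(0, 0, canvas.width, canvas.height);
}
```

In a browser this would be called as grabFrame(document.querySelector('video'), document.createElement('canvas')). Neither element has to be shown to the user, but both are still DOM objects, so this only runs where a document exists.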

Johannes Odland

On 15 Feb 2013 at 21:03, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:

  But if your purpose is to do processing on the data, couldn’t you take
Blobs of data (i.e. by calling recording, rather than takePhoto) and
process them? This API is intended to support media processing, so if
we’re not making the right video data available, I’d like to know.

- Jim

*From:* Johannes Odland

*Sent:* Friday, February 15, 2013 2:39 PM
*To:* Jim Barnett
*Cc:* Travis Leithead; Martin Thomson; Jonathan Chetwynd;
*Subject:* Re: Why does media capture require a browser?

The Media Recording API would allow you to snap a photo from the stream,
but the resulting photo would be a JPEG/PNG blob and not ImageData.

If Ian Hickson's proposal for DOM-free CanvasRenderingContext2D became a
standard you could use the ImageBitmap object to render that blob into the
2d context. Seems a bit complicated though:

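var lms = navigator.getUserMedia({video:true});

// `recorder` is assumed to be a Media Recording API recorder created for `lms`.
recorder.onPhoto = function(blob) {
    // Hypothetical DOM-free 2D context from the proposal mentioned above.
    var context = new CanvasRenderingContext2D();
    context.drawImage(new ImageBitmap(blob),...);
};
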
Pardon my bad JS, the power cord to my laptop broke today :-/

Johannes Odland

On 15 Feb 2013 at 20:11, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:

 Does the Media Recording API help? It gives you access to the encoded data
without any intermediate HTML.

- Jim

*From:* Johannes Odland

*Sent:* Friday, February 15, 2013 2:07 PM
*To:* Travis Leithead
*Cc:* Martin Thomson; Jonathan Chetwynd; public-media-capture@w3.org
*Subject:* Re: Why does media capture require a browser?

The navigator object itself is not so problematic. The navigator interface
is modular, being composed of several interfaces such as NavigatorID,
NavigatorLanguage and so forth.

Platforms implementing getUserMedia would not have to implement the whole
Navigator interface, nor call the object navigator.

The big problem, as I see it, is relying on the HTMLVideoElement and
HTMLCanvasElement for capturing a frame from a MediaStream.

Often you will not display the video nor the captured frame to the user,
and many times you would be interested in processing the captured frames
off the main thread.

I'll try to illustrate this with an example:

"Peter pulls up his todo and task-management app on the large hallway
screen. He is notified that he has previously approved access to his
webcam. The app immediately recognizes Peter and shows him an overview of
his most pressing tasks. Peter gestures with his hand to make the app flip to
the next task board, which shows him a list of items to buy at the
supermarket. He flips back to the first page, checks the task 'bring out
the garbage' and leaves the apartment."

This is not so far-fetched. People are already using gUM and feature
processing to implement gesture controls and face recognition in the
browser. The webcam swiper is only one example:

What these apps all have in common is that they show neither the video nor the
captured frames to the user. Preferably they would capture from the LMS
directly to a 2d context for processing in a web worker or using the
parallel JavaScript API destined for ECMAScript 8.
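
As a rough sketch of the off-main-thread half of this, assuming the pixels have already been obtained as ImageData (today still via a canvas), a frame can be handed to a web worker without copying by listing its buffer as a transferable. The function name sendFrameToWorker and the message shape are assumptions for illustration.

```javascript
// Illustrative helper (hypothetical name): hand one frame's pixels to a worker.
function sendFrameToWorker(worker, imageData) {
  const buffer = imageData.data.buffer;
  worker.postMessage(
    // Message shape is an assumption; the worker must agree on it.
    { width: imageData.width, height: imageData.height, pixels: buffer },
    // Transfer list: ownership of the buffer moves to the worker, no copy.
    [buffer]
  );
}
```

On the receiving side the worker can wrap the buffer again with new Uint8ClampedArray(event.data.pixels). With a real Worker, the buffer is detached on the sending side once it has been transferred, so the main thread must grab a fresh frame for each message.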

I feel this scenario is missing from the MediaStream Capture APIs.

Johannes Odland

On 15 Feb 2013 at 19:18, Travis Leithead <

 Also, the choice of the "navigator" object should not carry too much
implied notion that this is a browser-only feature. Navigator already
existed (and is the home of geolocation too), so we are using it; there is no
other particular dependency on this object. For example, nodejs could
choose to host this functionality from global or from another object (say

 -----Original Message-----

From: Martin Thomson [mailto:martin.thomson@gmail.com]

 Sent: Friday, February 15, 2013 10:04 AM

 To: Johannes Odland

 Cc: Jonathan Chetwynd; public-media-capture@w3.org

 Subject: Re: Why does media capture require a browser?

 node.js has a very different security model, so it is possible that a
 completely different API would be appropriate in that context.

 That's not to say that node.js couldn't copy aspects of the API, but
 they wouldn't want to be constrained by the necessarily byzantine
 selection API we have adopted, at a bare minimum.

 On 15 February 2013 07:53, Johannes Odland <johannes.odland@gmail.com>


 I've been asking the same question.

  Why can't I use the same API to set up a webcam using Raspberry Pi and



  Having an API that does not depend on the DOM/Browser makes it possible


 implement that API on multiple platforms such as in the browser and in


  Frameworks written for such an API could be used on all platforms.

  As it is right now we have different APIs for capturing and processing


  ( https://github.com/wearefractal/camera library for media capture on


  Johannes Odland

  On 15 Feb 2013 at 16:18, Jonathan Chetwynd <jay@peepo.com> wrote:

  Why does media capture require a browser?

  rather than solely a javascript engine**.

  eg why navigator.getUserMedia?*


  Jonathan Chetwynd

  **embedded devices may only be capable of running a JS engine, with say


  but not a browser as well.

  There are of course a very large range of data capture devices beyond


  *for example nodeJS using V8 has no navigator object.


  Jonathan Chetwynd


  Eyetracking in HTML5
Received on Friday, 15 February 2013 20:20:43 UTC
