RE: Why does media capture require a browser? from Travis Leithead on 2013-02-18 (public-media-capture@w3.org from February 2013)

From: Travis Leithead <travis.leithead@microsoft.com>
Date: Mon, 18 Feb 2013 18:56:06 +0000
To: Johannes Odland <johannes.odland@gmail.com>, Jim Barnett <Jim.Barnett@genesyslab.com>
CC: Martin Thomson <martin.thomson@gmail.com>, Jonathan Chetwynd <jay@peepo.com>, "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <9768D477C67135458BF978A45BCF9B3853BFB5B4@TK5EX14MBXW602.wingroup.windeploy.ntde>
You want a "raw" frame (akin to HTML Canvas' ImageData). I definitely see the use cases for both "takePhoto" (give me a PNG/JPEG-encoded image file - small file size, suitable for upload to my favorite photo sharing site), and "takeFrame" (give me a raw array of RGBA byte values, probably a Uint8ClampedArray).
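To make the "raw frame" option concrete: an RGBA buffer such as the Uint8ClampedArray inside a canvas ImageData stores pixels row by row, four bytes per pixel, so pixel (x, y) starts at index (y * width + x) * 4. The sketch below is a plain-JavaScript illustration with hypothetical helper names (`pixelAt`, `toGrayscale`). It reads one pixel and converts a frame to 8-bit luma using BT.601 weights, the kind of first step a CV routine could apply directly to such a buffer without decoding a PNG/JPEG file first.

```javascript
// Hypothetical helpers illustrating the RGBA layout of ImageData-style frames:
// 4 bytes per pixel (R, G, B, A), rows stored top to bottom.

// Return [r, g, b, a] for the pixel at column x, row y.
function pixelAt(data, width, x, y) {
  const i = (y * width + x) * 4;
  return [data[i], data[i + 1], data[i + 2], data[i + 3]];
}

// Convert an RGBA frame to one luma byte per pixel (BT.601 weights).
function toGrayscale(data, width, height) {
  const gray = new Uint8ClampedArray(width * height);
  for (let p = 0; p < width * height; p++) {
    const i = p * 4;
    // Assigning into a Uint8ClampedArray rounds and clamps to 0..255.
    gray[p] = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
  }
  return gray;
}

// Usage: a 2x1 frame with one pure red and one pure white pixel.
const frame = new Uint8ClampedArray([255, 0, 0, 255, 255, 255, 255, 255]);
console.log(pixelAt(frame, 2, 1, 0)); // [ 255, 255, 255, 255 ]
console.log(Array.from(toGrayscale(frame, 2, 1))); // [ 76, 255 ]
```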


From: Johannes Odland [mailto:johannes.odland@gmail.com]
Sent: Saturday, February 16, 2013 12:54 AM
To: Jim Barnett
Cc: Travis Leithead; Martin Thomson; Jonathan Chetwynd; public-media-capture@w3.org
Subject: Re: Why does media capture require a browser?

The Image Capture API proposed here is a better fit: http://gmandyam.github.com/image-capture/

It allows you to capture frames from the MediaStream without recording video.

However, the takePhoto() method might pause the streaming to take a full-quality photo, which is useful if you're taking a photo, but not so useful for our use case (i.e. post-processing frames using CV to check for barcodes).

To better support Computer Vision (CV) operations such as feature detection, face recognition and gestures we need a way to grab frames from the stream.

We need a grabFrame() method that
- Captures still frames from the MediaStream
- Does not pause the streaming
- Does not require DOM operations
- Returns an array of pixel values (for example an ImageData object) that can be processed directly with CV operations
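These requirements can be illustrated without any DOM: if a hypothetical `grabFrame()` returned an ImageData-shaped object (`{width, height, data}`), a CV step becomes a plain function over the pixel array. In this sketch `makeFrame` is a stand-in frame source and `binarize` a deliberately simple global-mean threshold of the kind a barcode reader might start with. Both names are illustrative and are not part of the Image Capture proposal.

```javascript
// Hypothetical stand-in for what a DOM-free grabFrame() might return. The
// object only mirrors ImageData's shape ({width, height, data}); no canvas.
function makeFrame(width, height, fill) {
  const data = new Uint8ClampedArray(width * height * 4);
  for (let p = 0; p < width * height; p++) {
    const v = fill(p % width, Math.floor(p / width)); // gray value for (x, y)
    data.set([v, v, v, 255], p * 4);
  }
  return { width, height, data };
}

// Binarize a frame: pixels brighter than the frame's mean luma become 1,
// all others 0. A global mean threshold is simple rather than robust; it is
// used here only to show processing directly on the pixel array.
function binarize(frame) {
  const { width, height, data } = frame;
  const n = width * height;
  const luma = new Float64Array(n);
  let sum = 0;
  for (let p = 0; p < n; p++) {
    const i = p * 4;
    luma[p] = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
    sum += luma[p];
  }
  const mean = sum / n;
  const bits = new Uint8Array(n);
  for (let p = 0; p < n; p++) bits[p] = luma[p] > mean ? 1 : 0;
  return bits;
}

// Usage: a 4x1 "barcode" of alternating dark and light bars.
const bars = makeFrame(4, 1, (x) => (x % 2 === 0 ? 20 : 230));
console.log(Array.from(binarize(bars))); // [ 0, 1, 0, 1 ]
```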


The use cases are many:

- Facilitate interaction with the application using gestures
- Automatically snap a photo when everybody is smiling
- Facilitate Augmented Reality
- Read barcodes and (I'm sorry to bring this up) QR codes
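For the gesture use case, one simple technique is frame differencing: compare two consecutive frames and measure which pixels changed. The sketch assumes frames shaped like ImageData (`{width, height, data}` with RGBA bytes). The names `diffMotion` and `frameWithBrightColumn` are hypothetical, and the threshold of 30 is an arbitrary choice. Tracking `centroidX` across successive frame pairs could indicate the direction of a swipe.

```javascript
// Hypothetical frame-differencing sketch. Returns the share of pixels whose
// summed absolute RGB change exceeds `threshold`, and the mean column of
// those changed pixels (null when nothing changed).
function diffMotion(prev, curr, threshold = 30) {
  const { width, height } = curr;
  let changed = 0;
  let sumX = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = (y * width + x) * 4;
      const d =
        Math.abs(curr.data[i] - prev.data[i]) +
        Math.abs(curr.data[i + 1] - prev.data[i + 1]) +
        Math.abs(curr.data[i + 2] - prev.data[i + 2]);
      if (d > threshold) {
        changed++;
        sumX += x;
      }
    }
  }
  return {
    fraction: changed / (width * height),
    centroidX: changed ? sumX / changed : null,
  };
}

// Test helper: a black frame with one white column (a crude "hand").
function frameWithBrightColumn(width, height, brightCol) {
  const data = new Uint8ClampedArray(width * height * 4);
  for (let p = 0; p < width * height; p++) {
    const v = p % width === brightCol ? 255 : 0;
    data.set([v, v, v, 255], p * 4);
  }
  return { width, height, data };
}

// Usage: the bright column moves from x = 0 to x = 2 in a 4x1 frame.
const m = diffMotion(frameWithBrightColumn(4, 1, 0), frameWithBrightColumn(4, 1, 2));
console.log(m); // { fraction: 0.5, centroidX: 1 }
```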


Johannes Odland

On 15 Feb 2013, at 21:03, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:
But if your purpose is to do processing on the data, couldn't you take Blobs of data (i.e. by recording, rather than calling takePhoto) and process them? This API is intended to support media processing, so if we're not making the right video data available, I'd like to know.


- Jim

From: Johannes Odland [mailto:johannes.odland@gmail.com]
Sent: Friday, February 15, 2013 2:39 PM
To: Jim Barnett
Cc: Travis Leithead; Martin Thomson; Jonathan Chetwynd; public-media-capture@w3.org
Subject: Re: Why does media capture require a browser?

The Media Recording API would allow you to snap a photo from the stream, but the resulting photo would be a JPEG/PNG blob and not ImageData.

If Ian Hickson's proposal for a DOM-free CanvasRenderingContext2D became a standard, you could use the ImageBitmap object to render that blob into the 2d context. Seems a bit complicated though:


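// Sketch against proposed, not shipped, APIs: the Media Recording proposal
// and a DOM-free CanvasRenderingContext2D. The MediaRecorder constructor
// taking the local media stream is assumed from the Recording proposal.
navigator.getUserMedia({video: true}, function (lms) {
    var recorder = new MediaRecorder(lms);
    recorder.onPhoto = function (blob) {
        var context = new CanvasRenderingContext2D();  // DOM-free 2d context
        context.drawImage(new ImageBitmap(blob),...);   // decode the photo blob
        ....                                            // read pixels, run CV
    };
    recorder.record();
    recorder.takePhoto(...);
    recorder.stopRecording();
}, function (err) { /* capture failed or was denied */ });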

Pardon my bad JS, the power cord to my laptop broke today :-/

Johannes Odland

On 15 Feb 2013, at 20:11, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:
Does the Media Recording API help? It gives you access to the encoded data without any intermediate HTML: http://lists.w3.org/Archives/Public/public-media-capture/2012Dec/att-0159/RecordingProposal.html


- Jim

From: Johannes Odland [mailto:johannes.odland@gmail.com]
Sent: Friday, February 15, 2013 2:07 PM
To: Travis Leithead
Cc: Martin Thomson; Jonathan Chetwynd; public-media-capture@w3.org
Subject: Re: Why does media capture require a browser?

The navigator object itself is not so problematic. The navigator interface is modular, being composed of several interfaces such as NavigatorID, NavigatorLanguage and so forth.

Platforms implementing getUserMedia would not have to implement the whole Navigator interface, nor call the object navigator.

The big problem, as I see it, is relying on the HTMLVideoElement and HTMLCanvasElement for capturing a frame from a MediaStream.

Often you will display neither the video nor the captured frame to the user, and many times you will want to process the captured frames off the main thread.

I'll try to illustrate this with an example:

"Peter pulls up his todo and task-management app on the large hallway screen. He is notified that he has previously approved access to his webcam. The app immediately recognizes Peter and shows him an overview of his most pressing tasks. Peter uses his hand to gesture the app to flip to the next task board that shows him a list of items to shop at the supermarket. He flips back to the first page, checks the task 'bring out the garbage' and leaves the apartment. "


This is not so far-fetched. People are already using gUM (getUserMedia) and feature processing to implement gesture controls and face recognition in the browser. The webcam swiper is only one example: http://iambrandonn.github.com/WebcamSwiper/

What these apps have in common is that they show neither the video nor the captured frames to the user. Preferably they would capture from the LMS (LocalMediaStream) directly to a 2d context for processing in a web worker or using the parallel JavaScript API destined for ECMAScript 8.
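Handing raw frames to a web worker need not copy them: `worker.postMessage(frame, [frame.data.buffer])` transfers ownership of the underlying ArrayBuffer. The sketch below uses `structuredClone` with the `transfer` option instead, which follows the same structured-clone-with-transfer rules but runs outside a browser (Node.js 17 or later is assumed). The 640x480 frame size is arbitrary, and the frame object merely mirrors ImageData's shape.

```javascript
// Illustration of transferring (not copying) a frame's pixel buffer, as a
// postMessage to a worker would. structuredClone stands in for the worker.
const width = 640, height = 480;
const frame = { width, height, data: new Uint8ClampedArray(width * height * 4) };

// Move ownership of the ArrayBuffer to the "receiving" side.
const received = structuredClone(frame, { transfer: [frame.data.buffer] });

console.log(received.data.length); // 1228800: the receiver now owns the pixels
console.log(frame.data.length);    // 0: the sender's view is detached
```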

I feel this scenario is missing from the MediaStream Capture APIs.

Johannes Odland

On 15 Feb 2013, at 19:18, Travis Leithead <travis.leithead@microsoft.com> wrote:
Also, the choice of the "navigator" object should not carry too much implied notion that this is a browser-only feature. Navigator already existed (and is the home of geolocation too), so we are using it; there's no other particular dependency on this object. For example, Node.js could choose to host this functionality from global or from another object (say "media").
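The point about not depending on `navigator` can be illustrated with a small host-agnostic lookup: library code asks for whichever object provides `getUserMedia` instead of assuming `navigator`. `findCaptureHost` and the `media` host object are hypothetical (the name is taken from the example in the message above); no such Node.js API is implied, and the environments in the usage lines are fakes.

```javascript
// Hypothetical: return whichever host object exposes getUserMedia, so the
// same library code can run under a browser or another JS runtime.
function findCaptureHost(globalObj) {
  const candidates = [globalObj.navigator, globalObj.media];
  for (const host of candidates) {
    if (host && typeof host.getUserMedia === "function") return host;
  }
  return null; // no capture support in this environment
}

// Usage with stand-in environments (the getUserMedia stubs do nothing).
const browserLike = { navigator: { getUserMedia() {} } };
const nodeLike = { media: { getUserMedia() {} } };
console.log(findCaptureHost(browserLike) === browserLike.navigator); // true
console.log(findCaptureHost(nodeLike) === nodeLike.media);           // true
console.log(findCaptureHost({}));                                    // null
```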
-----Original Message-----
From: Martin Thomson [mailto:martin.thomson@gmail.com]
Sent: Friday, February 15, 2013 10:04 AM
To: Johannes Odland
Cc: Jonathan Chetwynd; public-media-capture@w3.org
Subject: Re: Why does media capture require a browser?

node.js has a very different security model, so it is possible that a
completely different API would be appropriate in that context.

That's not to say that node.js couldn't copy aspects of the API, but
they wouldn't want to be constrained by the necessarily byzantine
selection API we have adopted, at a bare minimum.

On 15 February 2013 07:53, Johannes Odland <johannes.odland@gmail.com> wrote:
I've been asking the same question.

Why can't I use the same API to set up a webcam using Raspberry Pi and Node.js?

Having an API that does not depend on the DOM/browser makes it possible to implement that API on multiple platforms, such as in the browser and in Node.js.

Frameworks written for such an API could be used on all platforms.

As it is right now, we have different APIs for capturing and processing media.

(https://github.com/wearefractal/camera, a library for media capture on Node.js)

Johannes Odland

On 15 Feb 2013, at 16:18, Jonathan Chetwynd <jay@peepo.com> wrote:

Why does media capture require a browser, rather than solely a JavaScript engine**?

E.g. why navigator.getUserMedia?*

regards

Jonathan Chetwynd

**Embedded devices may only be capable of running a JS engine (with, say, a camera), but not a browser as well.
There is, of course, a very large range of data capture devices beyond A/V.

*For example, Node.js (using V8) has no navigator object.


--
Jonathan Chetwynd
http://www.gnote.org
Eyetracking in HTML5
Received on Monday, 18 February 2013 18:57:09 UTC