Re: Why does media capture require a browser? from Johannes Odland on 2013-02-18 (public-media-capture@w3.org from February 2013)

From: Johannes Odland <johannes.odland@gmail.com>
Date: Mon, 18 Feb 2013 20:32:23 +0100
To: Jim Barnett <Jim.Barnett@genesyslab.com>
Cc: Travis Leithead <travis.leithead@microsoft.com>, Martin Thomson <martin.thomson@gmail.com>, Jonathan Chetwynd <jay@peepo.com>, "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <-7138469188344350174@unknownmsgid>
Right now the performance is not good enough to capture all frames.

It will all depend on the operations you want to perform and the device you
run on.

I think it would be better to pull frames with "takeFrame" whenever the code is ready for the next one.

The River Trail parallel JavaScript strawman proposal has a plugin for
Firefox and examples of video processing here:
https://github.com/RiverTrail/RiverTrail

Granted, these are only visual effects such as blur and sepia, but they are
good examples of how computationally expensive post-processing will be.

Simple operations can run with few dropped frames with parallel JavaScript
on my dual-core MacBook Pro, but expensive operations such as Gaussian
blur drop a lot of frames.
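
To make the trade-off concrete, here is a rough back-of-the-envelope sketch of how many frames a pull-based consumer would get through. The function name `pullStats`, the 25 fps camera rate and the per-frame costs are illustrative assumptions, not measurements from River Trail, and the model ignores capture latency.

```javascript
// Simplified model (an assumption, not a measurement): the consumer pulls one
// frame, processes it, then pulls the next. It can never pull faster than the
// camera produces frames.
function pullStats(cameraFps, processingMs, durationMs) {
  const frameIntervalMs = 1000 / cameraFps;
  const delivered = Math.floor(durationMs / frameIntervalMs);
  const perPullMs = Math.max(frameIntervalMs, processingMs);
  const processed = Math.floor(durationMs / perPullMs);
  return { delivered, processed, skipped: delivered - processed };
}

// A cheap filter (10 ms per frame) keeps up with a 25 fps camera.
const cheap = pullStats(25, 10, 1000);
// A costly filter (100 ms per frame) only sees 10 of the 25 frames.
const costly = pullStats(25, 100, 1000);
console.log(cheap, costly);
```

Under this model the skipped frames are never held in memory, which is the point of pulling rather than having every frame pushed to the script.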

Johannes Odland

On 18 Feb 2013, at 20:22, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:

  In a case like this, can you iterate steps 3-5 quickly enough to capture
enough of the video stream?  In principle, we could tweak the recording API
to return frames rather than encoded Blobs, but I gather that would use up
so much space that the system might choke to death in a hurry.  Relying on
takePhoto lets the JS code decide how much memory it is using at any one
time, but I want to make sure that the calls can be done quickly enough.



-          Jim



*From:* Johannes Odland [mailto:johannes.odland@gmail.com]

*Sent:* Monday, February 18, 2013 2:17 PM
*To:* Travis Leithead
*Cc:* Jim Barnett; Martin Thomson; Jonathan Chetwynd;
public-media-capture@w3.org
*Subject:* Re: Why does media capture require a browser?



Exactly :)



If the raw frame from "takeFrame" was compatible with ImageData then we
would be even better off.



Other frameworks could take ImageData as input and be indifferent to whether
a canvas or an image capture object was the origin.



As an example:



The "River Trail" parallel JavaScript proposal currently supports creating
a parallel array from a canvas. If we can convince them to support an
ImageData object as input then analyzing a video stream should be fairly
easy:



1. Open stream using gUM.

2. Create ImageCapture object

3. "takeFrame"

4. Create ParallelArray for processing (Hopefully from ECMAScript 8)

5. Repeat from 3.
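
Steps 3 to 5 could be sketched roughly as follows. Both `takeFrame` and `processFrame` are hypothetical callback-style stand-ins passed in by the caller (no such method is specified yet), and `pullFrames` is just an illustrative name.

```javascript
// Pull loop: a new frame is requested only after the previous one has been
// processed, so at most one frame is held in memory at a time.
function pullFrames(takeFrame, processFrame, maxFrames, onFinished) {
  let processed = 0;
  function next() {
    if (processed >= maxFrames) {
      onFinished(processed);
      return;
    }
    takeFrame(function (frame) {
      processFrame(frame, function () {
        processed += 1;
        next();
      });
    });
  }
  next();
}

// Stub usage: a fake source that hands out numbered 2x2 RGBA frames.
let frameCounter = 0;
const seen = [];
let finishedWith = -1;
pullFrames(
  function fakeTakeFrame(callback) {
    callback({ id: frameCounter++, width: 2, height: 2, data: new Uint8ClampedArray(16) });
  },
  function fakeProcess(frame, done) {
    seen.push(frame.id);
    done();
  },
  3,
  function (count) { finishedWith = count; }
);
```

With a real camera both callbacks would fire asynchronously; the stubs here call back immediately only to keep the sketch self-contained.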



Frames could also be passed to a web worker for processing.
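
Handing a frame to a worker could look roughly like this. `postMessage` on a worker accepts a transfer list as its second argument, which moves an ArrayBuffer instead of copying it. The frame shape (`width`, `height`, `data`) and the helper name `frameToMessage` are assumptions for illustration.

```javascript
// Build the message and transfer list for one frame. Assumes `frame.data` is a
// typed array (e.g. Uint8ClampedArray) that spans its whole underlying buffer.
function frameToMessage(frame) {
  const buffer = frame.data.buffer;
  return {
    message: { width: frame.width, height: frame.height, buffer: buffer },
    transfer: [buffer],
  };
}

const frame = { width: 2, height: 1, data: new Uint8ClampedArray(8) };
const packed = frameToMessage(frame);
// In a browser: worker.postMessage(packed.message, packed.transfer);
// In the worker: new Uint8ClampedArray(event.data.buffer) rebuilds the pixels.
```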



Johannes Odland


On 18 Feb 2013, at 19:56, Travis Leithead <
travis.leithead@microsoft.com> wrote:

 You want a "raw" frame (akin to HTML Canvas' ImageData). I definitely see
the use cases for both "takePhoto" (give me a PNG/JPEG-encoded image file –
small file size, suitable for upload to my favorite photo sharing site),
and "takeFrame" (give me a raw array of RGBA byte values, probably a
Uint8ClampedArray).
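
For illustration, a raw frame of this kind stores four bytes per pixel in row-major order, so pixel (x, y) starts at index (y * width + x) * 4. A minimal sketch of consuming such a frame, assuming an ImageData-like shape `{width, height, data}` (what "takeFrame" would actually return is not decided) and the common Rec. 601 luma weights:

```javascript
// Convert an RGBA frame to one grayscale byte per pixel. Alpha is ignored.
function toGrayscale(frame) {
  const gray = new Uint8ClampedArray(frame.width * frame.height);
  for (let p = 0; p < gray.length; p += 1) {
    const i = p * 4; // index of the red byte of pixel p
    gray[p] =
      0.299 * frame.data[i] +
      0.587 * frame.data[i + 1] +
      0.114 * frame.data[i + 2];
  }
  return gray;
}

// A 2x1 frame: one white pixel followed by one black pixel.
const sample = {
  width: 2,
  height: 1,
  data: new Uint8ClampedArray([255, 255, 255, 255, 0, 0, 0, 255]),
};
const grayPixels = toGrayscale(sample);
```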





*From:* Johannes Odland [mailto:johannes.odland@gmail.com]

*Sent:* Saturday, February 16, 2013 12:54 AM
*To:* Jim Barnett
*Cc:* Travis Leithead; Martin Thomson; Jonathan Chetwynd;
public-media-capture@w3.org
*Subject:* Re: Why does media capture require a browser?



The Image Capture API proposed here is a better fit:
http://gmandyam.github.com/image-capture/

It allows you to capture frames from the MediaStream without recording
video.



However, the takePhoto() method might pause the streaming to take a
full-quality photo, which is useful if you're taking a photo, but not so
useful for our use case (i.e. post-processing frames using CV to check for
barcodes).

To better support Computer Vision (CV) operations such as feature
detection, face recognition and gestures we need a way to grab frames from
the stream.

We need a grabFrame() method that:
- Captures still frames from the MediaStream
- Does not pause the streaming
- Does not require DOM operations
- Returns an array of pixel values (for example an ImageData object) that
  can be processed directly with CV operations





The use cases are many:



- Facilitate interaction with the application using gestures

- Automatically snap a photo when everybody is smiling

- Facilitate Augmented Reality

- Read barcodes and (I'm sorry to bring this up) QR codes



Johannes Odland


On 15 Feb 2013, at 21:03, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:

 But if your purpose is to do processing on the data, couldn’t you take
Blobs of data (i.e. by calling recording, rather than takePhoto) and
process them?   This API is intended to support media processing, so if
we’re not making the right video data available, I’d like to know.



-          Jim



*From:* Johannes Odland [mailto:johannes.odland@gmail.com]

*Sent:* Friday, February 15, 2013 2:39 PM
*To:* Jim Barnett
*Cc:* Travis Leithead; Martin Thomson; Jonathan Chetwynd;
public-media-capture@w3.org
*Subject:* Re: Why does media capture require a browser?



The Media Recording API would allow you to snap a photo from the stream,
but the resulting photo would be a JPEG/PNG blob and not ImageData.



If Ian Hickson's proposal for a DOM-free CanvasRenderingContext2D became a
standard, you could use the ImageBitmap object to render that blob into the
2d context. Seems a bit complicated though:





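// Pseudocode against two proposals: the Media Recording API and the
// DOM-free CanvasRenderingContext2D constructor.
navigator.getUserMedia({video: true}, function (lms) {
    // Assumption: the recorder is constructed from the stream.
    var recorder = new MediaRecorder(lms);

    recorder.onPhoto = function (blob) {
        var context = new CanvasRenderingContext2D();
        context.drawImage(new ImageBitmap(blob),...);
        ....
    };

    recorder.record();
    recorder.takePhoto(...);
    recorder.stopRecording();
}, function (error) {
    console.error(error);
});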



Pardon my bad JS, the power cord to my laptop broke today :-/

Johannes Odland


On 15 Feb 2013, at 20:11, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:

 Does the Media Recording API help? It gives you access to the encoded data
without any intermediate HTML.
http://lists.w3.org/Archives/Public/public-media-capture/2012Dec/att-0159/RecordingProposal.html




-          Jim



*From:* Johannes Odland [mailto:johannes.odland@gmail.com]

*Sent:* Friday, February 15, 2013 2:07 PM
*To:* Travis Leithead
*Cc:* Martin Thomson; Jonathan Chetwynd; public-media-capture@w3.org
*Subject:* Re: Why does media capture require a browser?



The navigator object itself is not so problematic. The navigator interface
is modular, being composed of several interfaces such as NavigatorID,
NavigatorLanguage and so forth.



Platforms implementing getUserMedia would not have to implement the whole
Navigator interface, nor call the object navigator.



The big problem, as I see it, is relying on the HTMLVideoElement and
HTMLCanvasElement for capturing a frame from a MediaStream.



Often you will not display the video nor the captured frame to the user,
and many times you would be interested in processing the captured frames
off the main thread.



I'll try to illustrate this with an example:



"Peter pulls up his todo and task-management app on the large hallway
screen. He is notified that he has previously approved access to his
webcam. The app immediately recognizes Peter and shows him an overview of
his most pressing tasks. Peter gestures with his hand for the app to flip to
the next task board, which shows him a list of items to buy at the
supermarket. He flips back to the first page, checks off the task 'take out
the garbage' and leaves the apartment."





This is not so far-fetched. People are already using gUM and feature
processing to implement gesture controls and face recognition in the
browser. The webcam swiper is only one example:
http://iambrandonn.github.com/WebcamSwiper/
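
A common building block for this kind of gesture detection is frame differencing: compare two consecutive frames and measure how much changed. The sketch below is illustrative only (it is not taken from WebcamSwiper), and the function name and threshold value are assumptions.

```javascript
// Fraction of pixels whose summed R+G+B difference exceeds `threshold`.
// Both arguments are flat RGBA byte arrays of the same size.
function changedPixelRatio(prev, curr, threshold) {
  if (prev.length !== curr.length) {
    throw new Error("frames must have the same size");
  }
  let changed = 0;
  for (let i = 0; i < prev.length; i += 4) {
    const diff =
      Math.abs(prev[i] - curr[i]) +
      Math.abs(prev[i + 1] - curr[i + 1]) +
      Math.abs(prev[i + 2] - curr[i + 2]);
    if (diff > threshold) changed += 1;
  }
  return changed / (prev.length / 4);
}

// Two-pixel frames: the first pixel is unchanged, the second gets brighter.
const before = new Uint8ClampedArray([10, 10, 10, 255, 10, 10, 10, 255]);
const after = new Uint8ClampedArray([10, 10, 10, 255, 200, 200, 200, 255]);
const ratio = changedPixelRatio(before, after, 60);
```

None of this needs a video element or a canvas; it only needs the pixel bytes.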


What these apps have in common is that they show neither the video nor the
captured frames to the user. Preferably they would capture from the
LocalMediaStream (LMS) directly to a 2d context for processing in a web
worker or using the parallel JavaScript API destined for ECMAScript 8.



I feel this scenario is missing from the MediaStream Capture APIs.



Johannes Odland


On 15 Feb 2013, at 19:18, Travis Leithead <
travis.leithead@microsoft.com> wrote:

 Also, the choice of the "navigator" object should not carry too much
implied notion that this is a browser-only feature. Navigator already
existed (and is the home of geolocation too), so we are using it--there's no
other particular dependency on this object. For example, nodejs could
choose to host this functionality from global or from another object (say
"media").
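
Code that wants to run on more than one host could look the entry point up rather than hard-coding `navigator`. A minimal sketch, where `resolveGetUserMedia` is a made-up helper and the `media` object is only the hypothetical host mentioned above:

```javascript
// Return a bound getUserMedia from whichever object the host exposes it on,
// or null if the host has no capture API at all.
function resolveGetUserMedia(globalObject) {
  const hosts = [globalObject.navigator, globalObject.media, globalObject];
  for (const host of hosts) {
    if (host && typeof host.getUserMedia === "function") {
      return host.getUserMedia.bind(host);
    }
  }
  return null;
}

// Fake hosts for illustration; real hosts would supply real implementations.
const browserLike = { navigator: { getUserMedia: function () { return "navigator"; } } };
const nodeLike = { media: { getUserMedia: function () { return "media"; } } };
const fromBrowser = resolveGetUserMedia(browserLike)();
const fromNode = resolveGetUserMedia(nodeLike)();
const missing = resolveGetUserMedia({});
```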

-----Original Message-----

From: Martin Thomson [mailto:martin.thomson@gmail.com]

 Sent: Friday, February 15, 2013 10:04 AM

 To: Johannes Odland

 Cc: Jonathan Chetwynd; public-media-capture@w3.org

 Subject: Re: Why does media capture require a browser?



 node.js has a very different security model, so it is possible that a
 completely different API would be appropriate in that context.

 That's not to say that node.js couldn't copy aspects of the API, but
 they wouldn't want to be constrained by the necessarily byzantine
 selection API we have adopted, at a bare minimum.



 On 15 February 2013 07:53, Johannes Odland <johannes.odland@gmail.com>
 wrote:

 I've been asking the same question.



  Why can't I use the same API to set up a webcam using a Raspberry Pi and
  Node.js?



  Having an API that does not depend on the DOM/Browser makes it possible to
  implement that API on multiple platforms such as in the browser and in
  Node.js.



  Frameworks written for such an API could be used on all platforms.



  As it is right now we have different APIs for capturing and processing
  media.

  ( https://github.com/wearefractal/camera library for media capture on
  node.js)



  Johannes Odland



  On 15 Feb 2013, at 16:18, Jonathan Chetwynd <jay@peepo.com> wrote:



  Why does media capture require a browser?

  rather than solely a javascript engine**.



  eg why navigator.getUserMedia?*



  regards



  Jonathan Chetwynd



  **embedded devices may only be capable of running a JS engine, with say
  a camera, but not a browser as well.

  There is of course a very large range of data capture devices beyond A/V.



  *for example nodeJS using V8 has no navigator object.





  --

  Jonathan Chetwynd

  http://www.gnote.org

  Eyetracking in HTML5
Received on Monday, 18 February 2013 19:32:52 UTC