- From: Rob Manson <roBman@mob-labs.com>
- Date: Fri, 06 Sep 2013 01:13:17 +1000
- To: "public-media-capture@w3.org" <public-media-capture@w3.org>
- CC: "public-ar@w3.org" <public-ar@w3.org>
Hi,

here's some feedback on, and questions about, the MediaStream Capture Scenarios[1] from an Augmented Web[2] perspective. So I guess this is for Travis, but as always all answers and comments are welcome 8)

3.3 Find the ball assignment (media processing and recording)
-------------------------------------------------------------
"Alice is finishing up a college on-line course on image processing..."

I think it's definitely important to include image processing scenarios in this document, however I don't think this scenario captures how critical image processing will be for the Augmented Web.

A more pragmatic example that people might more closely relate to would be "QR code scanning". So instead of "detecting a blue ball", it could be "detecting a QR code". There are existing libraries that can be used for this[4].

3.n
---
I would like to propose the addition of a number of other stream processing based scenarios to flesh out this area further. Here's a list:

- QR/barcode scanning
- pitch detection
- voice commands
- head/gesture tracking
- facial recognition
- fiducial marker tracking
- natural feature tracking

8.5 Pre-processing vs 8.6 Post-processing
-----------------------------------------
The pre/post distinction seems to be based on two types as described here[5].

a. realtime
   pre is before the stream is connected to a sink (e.g. <video> element) and post is after.

b. recorded
   pre is before the stream is captured "to a known MIME format" and post is after.

However, I'm not sure this distinction has strictly been applied to the content in those sections. Or am I misunderstanding this distinction?

e.g. 8.5.1 example 3 is "Face-recognition and gesture detection". Surely face and gesture detection and face recognition could only be done in post for realtime and both pre and post for recorded.
Based on the 6 item list in "8.6.1 Web platform post-processing toolbox" it's hard to see how "face-recognition" could be done without connecting the video stream to a sink <video> element. So for realtime (i.e. not recorded) this would really be post-processing, wouldn't it? (i.e. realtime, after the stream is connected to a sink).

Perhaps the goals of using this distinction here could be met in a simpler way?

Media Capture vs Recording
--------------------------
In "2. Concepts and Definitions", "Media Capture" is defined as "obtaining a stream of data from a device" and "Recording" is defined as "capture of media under application control and in a specific, known, format".

It's a little confusing that the second definition ("Recording") uses the word "capture", which is also in the name of the first term ("Media Capture"). Plus I'm not sure this distinction is completely clear either.

a. With the current image stream processing pipeline you connect a stream to a <video> element, then connect that to a <canvas>, and then extract the ImageData from there using an event loop like requestAnimationFrame() or setTimeout().

b. With the MediaStream Image Capture API you extract a track from a stream and then use that to create an Image Capture object that you call getFrame() on to extract the ImageData, using an event loop like requestAnimationFrame() or setTimeout().

c. With the MediaStream Recording API you connect a stream through a MediaRecorder object and call start() to extract a Blob of data at regular timeslices.

But for all 3 of these pipelines, including the "Recording" example, the frame data can be accessed before the "capture" is completed. So even "Recording" can behave like "realtime" from a data processing perspective.

8.6.2 Time sensitivity and performance
--------------------------------------
"Some post-processing scenarios are time-sensitive—especially those scenarios that involve processing large amounts of data while the user waits."
I think real-time applications are the most time sensitive. For example, face recognition or gesture tracking need to be fast and responsive with little or no lag, otherwise at best it can feel like the user interface is swimming.

Numbering?
----------
I think that items 4, 5 and 6 should really be moved in one level so they are 3.4, 3.5, 3.6 and all their children should move in as well.

I hope this feedback is clear and useful. I know it's a little long, so if you'd like me to break any of this out into separate email messages just let me know.

roBman

[1] https://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html
[2] http://www.w3.org/community/ar/
[3] https://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html#find-the-ball-assignment-media-processing-and-recording
[4] https://github.com/LazarSoft/jsqrcode
[5] https://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html#post-processing
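
As a rough illustration of the point made under "Media Capture vs Recording", here is a minimal sketch of what pipelines (a), (b) and (c) have in common from a data processing perspective: an event loop that pulls whatever frame is currently available and processes it while the stream is still live. Every name below (Frame, GrabFrame, Schedule, runFrameLoop, meanBrightness) is hypothetical and chosen only for illustration; none of them comes from the scenarios document or from any of the APIs mentioned. The assumption is that, in a browser, grab would wrap something like a canvas drawImage()/getImageData() round trip for pipeline (a), and schedule would be requestAnimationFrame() or setTimeout().

```typescript
// Hypothetical sketch: the shared "pull a frame on each tick" shape of the pipelines.
interface Frame {
  width: number;
  height: number;
  data: Uint8ClampedArray; // RGBA bytes, laid out like ImageData.data
}

// Returns the most recent frame, or null if none is ready yet (assumed behaviour).
type GrabFrame = () => Frame | null;

// Schedules the next tick; in a browser this could be requestAnimationFrame or setTimeout.
type Schedule = (tick: () => void) => void;

function runFrameLoop(
  grab: GrabFrame,
  schedule: Schedule,
  onFrame: (frame: Frame) => void,
  maxTicks: number // stop condition, only so the sketch terminates
): void {
  let ticks = 0;
  const tick = (): void => {
    if (ticks >= maxTicks) return;
    ticks++;
    const frame = grab();
    // The frame is processed immediately, before any "capture" has completed.
    if (frame !== null) onFrame(frame);
    schedule(tick);
  };
  schedule(tick);
}

// Example per-frame processing step: mean brightness over all RGBA pixels.
function meanBrightness(frame: Frame): number {
  const pixels = frame.data.length / 4;
  if (pixels === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.data.length; i += 4) {
    sum += (frame.data[i] + frame.data[i + 1] + frame.data[i + 2]) / 3;
  }
  return sum / pixels;
}
```

Passing the scheduler in as a parameter is only a convenience for the sketch: it keeps the loop independent of any particular pipeline and lets it run outside a browser. A real QR code or face detection step would take the place of meanBrightness.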
Received on Thursday, 5 September 2013 15:13:40 UTC