Feedback on the MediaStream Capture Post-Processing scenarios

Hi,

Here's some feedback on, and questions about, the MediaStream Capture 
Scenarios[1] from an Augmented Web[2] perspective.  So I guess this is 
for Travis, but as always all answers and comments are welcome 8)


3.3 Find the ball assignment (media processing and recording)[3]
-----------------------------------------------------------------
   "Alice is finishing up a college on-line course on image
    processing..."

I think it's definitely important to include image processing scenarios 
in this document; however, I don't think this scenario captures how 
critical image processing will be for the Augmented Web.  A more 
pragmatic example that people might relate to more closely would be 
"QR code scanning".  So instead of "detecting a blue ball", it could be 
"detecting a QR code".  There are existing libraries that can be used 
for this[4].
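
As a rough, hedged sketch only: a library like jsqrcode[4] could slot 
straight into the canvas pipeline described further down.  The 
qrcode.callback / qrcode.decode() usage below is just my reading of 
that library's API, so treat the names as assumptions.

   // Sketch only: assumes a <canvas id="qr-canvas"> that already holds
   // the current camera frame (see the pipeline sketches further down).
   qrcode.callback = function (result) {
     console.log('Decoded QR payload: ' + result);
   };
   qrcode.decode();  // as I read it, this decodes the "qr-canvas" canvas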


3.n
---
I would like to propose the addition of a number of other 
stream-processing-based scenarios to flesh out this area further.
Here's a list:
- QR/barcode scanning
- pitch detection
- voice commands
- head/gesture tracking
- facial recognition
- fiducial marker tracking
- natural feature tracking


8.5 Pre-processing vs 8.6 Post-processing
-----------------------------------------
The pre/post distinction seems to be based on two types as described 
here[5].

   a. realtime
      pre is before the stream is connected to a sink (e.g. <video>
      element) and post is after.

   b. recorded
      pre is before the stream is captured "to a known MIME format" and
      post is after.

However, I'm not sure this distinction has strictly been applied to the 
content in those sections.  Or am I misunderstanding this distinction?

For example, 8.5.1 example 3 is "Face-recognition and gesture 
detection".  Surely face/gesture detection and face recognition could 
only be done in post for realtime, and in both pre and post for 
recorded.  Based on the six-item list in "8.6.1 Web platform 
post-processing toolbox" it's hard to see how "face-recognition" could 
be done without connecting the video stream to a sink <video> element. 
So for realtime (i.e. not recorded) this would really be 
post-processing, wouldn't it? (i.e. realtime after being connected to 
a sink).

Perhaps the goals of using this distinction here could be met in a 
simpler way?


Media Capture vs Recording
--------------------------
In 2. Concepts and Definitions "Media Capture" is defined as "obtaining 
a stream of data from a device" and "Recording" is defined as "capture 
of media under application control and in a specific, known, format".

It's a little confusing that the second part of this ("Recording") uses 
the word "capture", which also appears in the name of the first part 
("Media Capture").

Plus, I'm not sure this distinction is completely clear either.

a. With the current image stream processing pipeline you connect a 
stream to a <video> element, draw that into a <canvas>, and then 
extract the ImageData from there in a loop driven by 
requestAnimationFrame() or setTimeout().
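
A minimal sketch of (a), assuming the variable "stream" comes from 
getUserMedia() and that processFrame() is a hypothetical function that 
consumes the ImageData:

   // MediaStream -> <video> -> <canvas> -> ImageData, polled per frame.
   var video = document.createElement('video');
   video.src = URL.createObjectURL(stream);
   video.play();

   var canvas = document.createElement('canvas');
   var ctx = canvas.getContext('2d');

   function tick() {
     if (video.videoWidth > 0) {
       canvas.width = video.videoWidth;
       canvas.height = video.videoHeight;
       ctx.drawImage(video, 0, 0);
       processFrame(ctx.getImageData(0, 0, canvas.width, canvas.height));
     }
     requestAnimationFrame(tick);
   }
   requestAnimationFrame(tick);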

b. With the Mediastream Image Capture API you extract a video track 
from a stream, use that track to create an ImageCapture object, and 
then call getFrame() on it to extract the ImageData, again in a loop 
driven by requestAnimationFrame() or setTimeout().
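
A rough sketch of (b), based on my reading of the Image Capture draft. 
The onframegrab handler and event.imageData attribute names are my 
assumptions about the draft and may not be exact; processFrame() is the 
same hypothetical consumer as above:

   // Extract a video track, wrap it in an ImageCapture object and poll
   // getFrame() from a requestAnimationFrame() loop.
   var track = stream.getVideoTracks()[0];
   var imageCapture = new ImageCapture(track);

   imageCapture.onframegrab = function (event) {  // name is an assumption
     processFrame(event.imageData);               // hypothetical consumer
   };

   function grabLoop() {
     imageCapture.getFrame();
     requestAnimationFrame(grabLoop);
   }
   requestAnimationFrame(grabLoop);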

c. With the MediaStream Recording API you connect a stream to a 
MediaRecorder object and call start() with a timeslice to extract a 
Blob of data at regular intervals.
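
And a sketch of (c), again assuming "stream" comes from getUserMedia(); 
processBlob() is a hypothetical consumer and the 1000 ms timeslice is 
arbitrary:

   // MediaRecorder delivers a Blob per timeslice via dataavailable
   // events, so the data can be consumed while recording continues.
   var recorder = new MediaRecorder(stream);

   recorder.ondataavailable = function (event) {
     processBlob(event.data);  // event.data is a Blob
   };

   recorder.start(1000);       // ask for a Blob roughly every 1000 ms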

But for all three of these pipelines, including the "Recording" 
example, the frame data can be accessed before the "capture" is 
complete.  So even "Recording" can behave like "realtime" from a 
data-processing perspective.


8.6.2 Time sensitivity and performance
--------------------------------------

   "Some post-processing scenarios are time-sensitive—especially those
    scenarios that involve processing large amounts of data while the
    user waits."

I think real-time applications are the most time-sensitive.  For 
example, face recognition or gesture tracking needs to be fast and 
responsive with little or no lag; otherwise, at best, it can feel like 
the user interface is swimming.


Numbering?
----------
I think that sections 4, 5 and 6 should really be moved in one level so 
they become 3.4, 3.5 and 3.6, and all their children should move in as 
well.



I hope this feedback is clear and useful.  I know it's a little long, 
so if you'd like me to break any of it out into separate email messages 
just let me know.

roBman



[1] https://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html
[2] http://www.w3.org/community/ar/
[3] https://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html#find-the-ball-assignment-media-processing-and-recording
[4] https://github.com/LazarSoft/jsqrcode
[5] https://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html#post-processing
