RE: revised recording proposal

Giri,

  My comments in-line.

-----Original Message-----
From: Mandyam, Giridhar [mailto:mandyam@quicinc.com] 
Sent: Friday, November 30, 2012 12:19 PM
To: Jim Barnett; Timothy B. Terriberry; public-media-capture@w3.org
Subject: RE: revised recording proposal

Hi Jim,
This works.  A couple of comments/questions (and I know you have sent out a revision of the spec, which I have yet to read):

a. What changes should be made to the recording requirements (see http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html#rm2)?  Maybe something like "The UA must be able to return timesliced encoded data to the application during recording."
>> Yes, it would make sense to add that. 

b. Would the UA be in compliance if it returns timesliced data in the form of a File object?  I don't believe the spec or requirements have to change as written, because the File object inherits from Blob (and the Blob may be backed by disk data regardless).
>> Hmm, good question.  If a File is a Blob, I suppose it might be.  What are the GC properties of Files?  I think of a "file" as something that persists on disk even if the app isn't referencing it.  You wouldn't want the file system to fill up when you were doing buffer-at-a-time processing.  On the other hand, I don't know whether a File behaves the way I expect a "file" to work.
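
>> For concreteness, a rough sketch of timesliced delivery (the MediaRecorder name, the dataavailable event, and start(timeslice) are assumptions based on the current proposal and may change):

```javascript
// Collects the Blobs (or Files) delivered by each dataavailable event.
// Pure bookkeeping, independent of the browser wiring below.
function makeChunkCollector() {
  const chunks = [];
  return {
    push(data) { chunks.push(data); },
    count() { return chunks.length; },
    all() { return chunks.slice(); },
  };
}

// Browser wiring; only runs where getUserMedia/MediaRecorder exist.
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    const collector = makeChunkCollector();
    const recorder = new MediaRecorder(stream);
    recorder.ondataavailable = (e) => collector.push(e.data); // e.data is a Blob
    recorder.start(1000); // ask for a dataavailable event roughly every 1000 ms
    setTimeout(() => recorder.stop(), 30000);
  });
}
```

>> Whether the app releases each chunk after processing (buffer-at-a-time) or accumulates them, as here, is exactly where the Blob-vs-File GC question bites.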

c. Should we provide a code example for the first use case below in the spec?  I'm still having trouble seeing how the necessary media processing could be achieved in real time using the Recording API.
>> I don't think code samples belong in a use case doc, because the use case doc doesn't define the API.  We can see what sort of sample code people would like to add to the spec itself, though most examples that I have seen tend to be pretty simple, just illustrating the basic concepts.  

d. I think we should just be explicit and add an ASR scenario to the doc.  Although I personally think the right way to do this is on a "reliable" PeerConnection, there is no point in rehashing that debate.  Maybe something along the lines of:

Near-real time Speech Recognition

So-and-so is interacting with a turn-by-turn navigation website while in the car and requires "hands free" interaction with the website.  Before beginning to drive, he browses to the website and allows the website to capture data from his handset microphone.  He then speaks his destination, and his voice data is sent to a server which processes the captured voice and sends back map image tiles and associated metadata for rendering in the browser.

>> I'd be happy to add something like this.  Does anyone else have any comments?  
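
>> A sketch of how that scenario might be wired up with timesliced recording (the endpoint URL is a placeholder, and MediaRecorder/dataavailable are names from the current proposal, not settled API):

```javascript
// Decide whether a timeslice is worth uploading (skip empty slices).
// Pure predicate, testable without any browser APIs.
function shouldUpload(chunk) {
  return chunk != null && chunk.size > 0;
}

// Browser wiring; only runs where getUserMedia/MediaRecorder exist.
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    const recorder = new MediaRecorder(stream);
    recorder.ondataavailable = (e) => {
      if (!shouldUpload(e.data)) return;
      // POST each slice; the (hypothetical) server accumulates and runs ASR.
      fetch("https://example.com/asr", { method: "POST", body: e.data });
    };
    recorder.start(250); // a small timeslice keeps recognition latency low
  });
}
```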

-Giri

-----Original Message-----
From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com] 
Sent: Friday, November 30, 2012 6:43 AM
To: Mandyam, Giridhar; Timothy B. Terriberry; public-media-capture@w3.org
Subject: RE: revised recording proposal

I think that the best thing to do would be to make 3.3 more explicit, and possibly to split it into two.  First, I would change the title slightly to:

3.3 Find the ball assignment (media processing and recording)

Then I would change the two sentences I cited below to make it explicit that the processing is going on in real-ish time:

Alice is now ready; she enables the webcam, a video preview (to see herself and the ball with the box around it), changes the camera's resolution down to 320x200, starts a video capture along with her media processing code, and holds up the blue ball, moving it around.  As she moves the ball, her code processes each video frame, drawing the box around the ball. The video preview shows the output of her code (namely herself with the box tracking the ball) so that she sees that it is working correctly.  After recording the output of her processing code for 30 seconds, Alice stops the recording and immediately uploads the recorded video to the assignment upload page using her class account.
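
A rough sketch of the real-time path (findBall stands in for Alice's own tracking code, and canvas.captureStream is an assumption; turning processed frames back into a recordable stream is exactly the part the API needs to nail down):

```javascript
// Given the ball's center and radius, compute the box to draw around it.
// Pure math, independent of the capture pipeline.
function boundingBox(cx, cy, r) {
  return { x: cx - r, y: cy - r, w: 2 * r, h: 2 * r };
}

// Browser wiring; only runs where the capture APIs exist.
if (typeof document !== "undefined" &&
    typeof navigator !== "undefined" && navigator.mediaDevices) {
  navigator.mediaDevices
    .getUserMedia({ video: { width: 320, height: 200 } })
    .then((stream) => {
      const video = document.createElement("video");
      video.srcObject = stream;
      video.play();

      const canvas = document.createElement("canvas");
      canvas.width = 320;
      canvas.height = 200;
      const ctx = canvas.getContext("2d");

      function drawFrame() {
        ctx.drawImage(video, 0, 0, 320, 200);
        // findBall() is Alice's own image-tracking code (hypothetical here).
        const { cx, cy, r } = findBall(ctx.getImageData(0, 0, 320, 200));
        const box = boundingBox(cx, cy, r);
        ctx.strokeRect(box.x, box.y, box.w, box.h);
        requestAnimationFrame(drawFrame);
      }
      requestAnimationFrame(drawFrame);

      // Record the processed frames (canvas.captureStream is assumed).
      const recorder = new MediaRecorder(canvas.captureStream());
      recorder.start();
      setTimeout(() => recorder.stop(), 30000);
    });
}
```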

Finally if we want to make it clear that 'simple' recording (i.e. without media processing) is supported, I suggest that we add a variation called "Recording with post-processing":

Alice decides to run her image-tracking code as a post-processing step.  She enables the webcam, a video preview (to see herself and the ball), changes the camera's resolution down to 320x200, starts a video recording, and holds up the blue ball, moving it around.  As she does this, the UA records the video stream of her and the ball.  After 30 seconds, she terminates the recording and saves the result to a file.  She then runs her image-processing software on the saved file, producing a new file that shows the box drawn around the moving ball.  She then previews the processed file to make sure it's correct, and uploads it to the assignment page using her class account.   
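
This variation needs no per-frame hooks at all; a sketch (MediaRecorder is the proposal's name, the file name and mime type are placeholders):

```javascript
// Join recorded chunks into a name/type/parts description for saving.
// Pure helper; the .webm name and mime type are assumptions.
function makeRecordingFile(chunks, mimeType) {
  return { name: "assignment-raw.webm", type: mimeType, parts: chunks };
}

// Browser wiring; only runs where getUserMedia/MediaRecorder exist.
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  navigator.mediaDevices
    .getUserMedia({ video: { width: 320, height: 200 } })
    .then((stream) => {
      const chunks = [];
      const recorder = new MediaRecorder(stream);
      recorder.ondataavailable = (e) => chunks.push(e.data);
      recorder.onstop = () => {
        const f = makeRecordingFile(chunks, "video/webm");
        const blob = new Blob(f.parts, { type: f.type });
        // Offer the unprocessed recording as a download, ready for
        // Alice's separate post-processing step.
        const a = document.createElement("a");
        a.href = URL.createObjectURL(blob);
        a.download = f.name;
        a.click();
      };
      recorder.start();
      setTimeout(() => recorder.stop(), 30000); // stop after 30 s
    });
}
```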

- Jim




-----Original Message-----
From: Mandyam, Giridhar [mailto:mandyam@quicinc.com] 
Sent: Thursday, November 29, 2012 5:20 PM
To: Jim Barnett; Timothy B. Terriberry; public-media-capture@w3.org
Subject: RE: revised recording proposal

Please do that - add the use case and appropriate requirements and then the group can have a chance to review.  As I mentioned before, the text as written is unclear as to whether Alice uploads the video immediately after recording.

If we are not able to rely on a stable use cases and requirements document, then we are aiming at a moving target.

-----Original Message-----
From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com] 
Sent: Thursday, November 29, 2012 1:42 PM
To: Mandyam, Giridhar; Timothy B. Terriberry; public-media-capture@w3.org
Subject: RE: revised recording proposal

Use case 3.3 specifies real-time video processing during capture, as the following sentences make clear (notice that Alice uploads the video as soon as recording is over):

 "Alice is now ready; she enables the webcam, a video preview (to see herself), changes the camera's resolution down to 320x200, starts a video capture, and holds up the blue ball, moving it around to show that the image-tracking code is working. After recording for 30 seconds, Alice uploads the video to the assignment upload page using her class account."

And in any case, if that's not clear enough, we can always add another use case.  I've never heard anyone say that the use cases doc was finished.

- Jim


-----Original Message-----
From: Mandyam, Giridhar [mailto:mandyam@quicinc.com] 
Sent: Thursday, November 29, 2012 4:05 PM
To: Timothy B. Terriberry; public-media-capture@w3.org
Subject: RE: revised recording proposal

Please point out the requirements in http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html that state that media processing be built into the recording function.

-----Original Message-----
From: Timothy B. Terriberry [mailto:tterriberry@mozilla.com] 
Sent: Thursday, November 29, 2012 1:03 PM
To: public-media-capture@w3.org
Subject: Re: revised recording proposal

Mandyam, Giridhar wrote:
> I am sorry - I don't believe a recording API should be used to enable
> real-time processing.  I certainly do not think it should be used for any

Well, this is the use case that Jim, Milan, and probably others are actually interested in (myself included), so I believe you may be in the minority here. The current proposal suggests that this use case and the file-at-once use case have a lot in common, and we'd be foolish not to take advantage of that.

> audio stream processing for ASR.  This is what WebAudio is for, and we should
> work with the Audio WG if their current specification is unsuitable for what
> you believe is required for speech recognition.  But we have a call next week
> - maybe we can discuss this further during that time.

Encoding/decoding of audio belongs at the endpoints of any processing graph, i.e., in MediaStreams, which are the domain of _this_ Task Force.
To say nothing of the fact that a solution that only works for audio is pretty poor. But you can go venue shopping if you want. Let us know how that works out for you.

Received on Friday, 30 November 2012 18:37:52 UTC