- From: Jim Barnett <Jim.Barnett@genesyslab.com>
- Date: Mon, 8 Oct 2012 06:17:10 -0700
- To: "Harald Alvestrand" <harald@alvestrand.no>
- Cc: <public-media-capture@w3.org>
- Message-ID: <E17CAD772E76C742B645BD4DC602CD8106CDFDED@NAHALD.us.int.genesyslab.com>
Harald,

The main use case for media processing is 3.3 (drawing a box around the ball for a video-processing class), but there are several other mentions of media processing in section 5, specifically face recognition in 5.5.1 and speech recognition in 5.5.2, 5.6.3 and 5.10. I think that speech recognition will be one of the main use cases, at least in the call center industry, and it requires getting audio separate from video.

When the media capture group discussed the issue before (in the context of the use cases document), we decided that recording and media processing were the same use case, since they were both based on "capturing the media in known format". We decided to use the term 'recording' to cover both of these, so I went back and updated the doc to use it everywhere. (The difference between simple recording and media processing is what you do with the media after you capture it.)

Even in the case of traditional recording, separate capture is not an unusual use case, at least in the call center industry. Today we record voice only, but the products I'm familiar with record the two participants in the call separately and merge them at playback time. They do this to allow off-line speech recognition/audio mining (where you want to model the participants' voices separately).

That said, Travis and I understand that we need to handle the case of recording into a single file - we just haven't figured out a good way to do it yet. We had been discussing some sort of 'locking' API that would tie Tracks together so they would be recorded as a unit, but it didn't seem like it would be easy to understand.

The following also occurs to me. The existing proposal really provides two kinds of recording:

1) record till you're done and then give me everything in one big Blob;
2) give me buffers of data as they're ready.
We could move 'record-till-you're-done' up onto MediaStream and leave 'give-me-buffers' on Track (we'd probably call it something other than 'record' to avoid confusion). The MediaStream API would handle the simple case, while the Track API would handle more sophisticated cases and real-time media processing.

If we follow your suggestion and let the container format decide what it will accept, the MediaStream record() function will be easy to define, since we won't have to handle all the edge cases that were bothering Travis and me. (For example, what do you do if Tracks are added and removed while recording is going on? We could say it's up to the container format to decide if it can handle it, or to raise an error otherwise.) On the other hand, unless we specify MTI container formats, this approach doesn't provide much interoperability. If we want to avoid another round of the MTI wars, maybe we could get away with saying that the UA must support a container format that can merge/synchronize a single video and single audio stream. This would then give a simple API for the simple use case (Stream contains one audio and one video Track; app calls record(), calls stopRecord(), gets a Blob). Anything involving multiple video or audio streams would have to be done with the Track-level API.

- Jim

From: Harald Alvestrand [mailto:harald@alvestrand.no]
Sent: Sunday, October 07, 2012 5:30 PM
To: Jim Barnett
Cc: public-media-capture@w3.org
Subject: Re: recording proposal

On 10/07/2012 09:45 PM, Jim Barnett wrote:

Harald, Travis and I started with record() attached to MediaStream, but we ran into problems:

1. Nothing I see requires that a MediaStream have only a single video track. Should recording blend all the video tracks into one incomprehensible mess? (The <video> element has a concept of the 'active' or 'primary' video track, but MediaStream doesn't.)

I'm not sure this is a valid issue, or it may be container format dependent.
If one tries to record something with multiple tracks onto a container format that does not support them, failure is to be expected. But I think some container formats can do this just fine (witness DVDs with alternate camera angles). Subject matter expertise is needed here. (I don't know the formats well enough... I know it's possible to write codecs in Javascript, but is writing a Matroska or DVI container producer in Javascript something we expect people to do and get right?)

2. Any form of media processing (e.g., inserting still images into the video stream is one of the use cases, talking to an ASR system will be another) requires access to the individual media streams.

Yes, but that's not recording. (Which use case from the use cases document were you thinking of?) I can certainly argue for an API for individual stream access, but a) I would not call it recording, and b) I would not call that satisfactory for our recording use cases.

As far as I can tell, you need both combined recording and access to the individual tracks. If that's the case, it's better to start off with a Track-level API and figure out how to form a combined recording on top of it than to start off with a Stream-level API and try to extract the individual tracks from it.

I think the common case is full-stream recording, and access to individual-track data is the advanced case. I'd want to do the common case first. YMMV.
- Jim

From: Harald Alvestrand [mailto:harald@alvestrand.no]
Sent: Sunday, October 07, 2012 1:40 PM
To: public-media-capture@w3.org
Subject: Re: recording proposal

On 10/05/2012 03:55 PM, Jim Barnett wrote:

partial interface MediaStreamTrack : EventTarget {
    void record (optional timeSliceType timeSlice);
    void stopRecording ();
};

Oops... I got lost here. A MediaStreamTrack contains either audio or video. Recording, for any practical purpose, requires that one records audio and video together - synchronized, and in some kind of container format.

This also means that the format produced by record() cannot possibly be compatible with the MediaSource API, since that's a combined format. I don't think this is what people expect of us. (I see this is listed under "open issues", but I don't think we should even start down this path with this fundamental limitation in place.)

Harald
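To make the shape of the quoted track-level proposal concrete, here is a mock sketch in plain JavaScript. `MockTrackRecorder`, `feed`, and the `ondata` callback are invented names, and the semantics assumed (a buffer per slice when `timeSlice` is given, one combined buffer at `stopRecording()` otherwise) are not spelled out in the quoted IDL; this is a reading of the proposal, not its definition.

```javascript
// Mock of "void record(optional timeSliceType timeSlice)" and
// "void stopRecording()" on a single track. All names and the
// timeSlice semantics are assumptions for illustration only.
class MockTrackRecorder {
  constructor(ondata) {
    this.ondata = ondata;   // receives recorded buffers
    this.timeSlice = null;
    this.pending = [];
  }
  record(timeSlice) {
    this.timeSlice = timeSlice ?? null;
  }
  // Stand-in for media chunks arriving from the track's source.
  feed(chunk) {
    if (this.timeSlice !== null) {
      this.ondata(chunk);   // sliced mode: hand over each buffer as it fills
    } else {
      this.pending.push(chunk);
    }
  }
  stopRecording() {
    if (this.timeSlice === null) {
      this.ondata(Buffer.concat(this.pending)); // one combined buffer
    }
  }
}

// Without a timeSlice: a single buffer arrives only at stopRecording().
const once = [];
const r1 = new MockTrackRecorder((b) => once.push(b.toString()));
r1.record();
r1.feed(Buffer.from('aa'));
r1.feed(Buffer.from('bb'));
r1.stopRecording();
console.log(once);   // ['aabb']

// With a timeSlice: a buffer per slice while recording runs.
const slices = [];
const r2 = new MockTrackRecorder((b) => slices.push(b.toString()));
r2.record(100);
r2.feed(Buffer.from('aa'));
r2.feed(Buffer.from('bb'));
r2.stopRecording();
console.log(slices); // ['aa', 'bb']
```

Note that this also illustrates Harald's objection: each recorder emits audio-only or video-only data with no container, so synchronizing two tracks would be left entirely to the application.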
Received on Monday, 8 October 2012 13:18:23 UTC