- From: Jim Barnett <Jim.Barnett@genesyslab.com>
- Date: Mon, 8 Oct 2012 06:17:10 -0700
- To: "Harald Alvestrand" <harald@alvestrand.no>
- Cc: <public-media-capture@w3.org>
- Message-ID: <E17CAD772E76C742B645BD4DC602CD8106CDFDED@NAHALD.us.int.genesyslab.com>
Harald,

The main use case for media processing is 3.3 (drawing a box around the ball for a video-processing class), but there are several other mentions of media processing in section 5, specifically face recognition in 5.5.1 and speech recognition in 5.5.2, 5.6.3 and 5.10. I think that speech recognition will be one of the main use cases, at least in the call center industry, and it requires getting audio separate from video.

When the media capture group discussed the issue before (in the context of the use cases document), we decided that recording and media processing were the same use case, since they were both based on "capturing the media in known format". We decided to use the term 'recording' to cover both of these, so I went back and updated the doc to use it everywhere. (The difference between simple recording and media processing is what you do with the media after you capture it.)

Even in the case of traditional recording, separate capture is not an unusual use case, at least in the call center industry. Today we record voice only, but the products I'm familiar with record the two participants in the call separately and merge them at playback time. They do this to allow off-line speech recognition/audio mining (where you want to model the participants' voices separately).

That said, Travis and I understand that we need to handle the case of recording into a single file - we just haven't figured out a good way to do it yet. We had been discussing some sort of 'locking' API that would tie Tracks together so they would be recorded as a unit, but it didn't seem like it would be easy to understand.

The following also occurs to me. The existing proposal really provides two kinds of recording:

1) record till you're done and then give me everything in one big Blob;
2) give me buffers of data as they're ready.
We could move 'record-till-you're-done' up onto MediaStream and leave 'give-me-buffers' on Track (we'd probably call it something other than 'record' to avoid confusion). The MediaStream API would handle the simple case, while the Track API would handle more sophisticated cases and real-time media processing.

If we follow your suggestion and let the container format decide what it will accept, the MediaStream record() function will be easy to define, since we won't have to handle all the edge cases that were bothering Travis and me. (For example, what do you do if Tracks are added and removed while recording is going on? We could say it's up to the container format to decide if it can handle it, or to raise an error otherwise.) On the other hand, unless we specify MTI container formats, this approach doesn't provide much interoperability. If we want to avoid another round of the MTI wars, maybe we could get away with saying that the UA must support a container format that can merge/synchronize a single video and single audio stream. This would then give a simple API for the simple use case (Stream contains one audio and one video Track; app calls record(), calls stopRecord(), gets a Blob). Anything involving multiple video or audio streams would have to be done with the Track-level API.

- Jim

From: Harald Alvestrand [mailto:harald@alvestrand.no]
Sent: Sunday, October 07, 2012 5:30 PM
To: Jim Barnett
Cc: public-media-capture@w3.org
Subject: Re: recording proposal

On 10/07/2012 09:45 PM, Jim Barnett wrote:

Harald, Travis and I started with record() attached to MediaStream, but we ran into problems:

1. Nothing I see requires that a MediaStream have only a single video track. Should recording blend all the video tracks into one incomprehensible mess? (The <video> element has a concept of the 'active' or 'primary' video track, but MediaStream doesn't.)

I'm not sure this is a valid issue, or it may be container format dependent.
If one tries to record something with multiple tracks onto a container format that does not support them, failure is to be expected. But I think some container formats can do this just fine (witness DVDs with alternate camera angles). Subject matter expertise is needed here. (I don't know the formats well enough... I know it's possible to write codecs in Javascript, but is writing a Matroska or DVI container producer in Javascript something we expect people to do and get right?)

2. Any form of media processing (e.g., inserting still images into the video stream is one of the use cases, talking to an ASR system will be another) requires access to the individual media streams.

Yes, but that's not recording. (Which use case from the use cases document were you thinking of?) I can certainly argue for an API for individual stream access, but a) I would not call it recording, and b) I would not call that satisfactory for our recording use cases.

As far as I can tell, you need both combined recording and access to the individual tracks. If that's the case, it's better to start off with a Track-level API and figure out how to form a combined recording on top of it than to start off with a Stream-level API and try to extract the individual tracks from it.

I think the common case is full-stream recording, and access to individual-track data is the advanced case. I'd want to do the common case first. YMMV.
- Jim

From: Harald Alvestrand [mailto:harald@alvestrand.no]
Sent: Sunday, October 07, 2012 1:40 PM
To: public-media-capture@w3.org
Subject: Re: recording proposal

On 10/05/2012 03:55 PM, Jim Barnett wrote:

partial interface MediaStreamTrack : EventTarget {
    void record (optional timeSliceType timeSlice);
    void stopRecording ();
};

Oops... I got lost here. A MediaStreamTrack contains either audio or video. Recording, for any practical purpose, requires that one records audio and video together - synchronized, and in some kind of container format.

This also means that the format produced by record() cannot possibly be compatible with the MediaSource API, since that's a combined format. I don't think this is what people expect of us. (I see this is listed under "open issues", but I don't think we should even start down this path with this fundamental limitation in place.)

Harald
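To make the shape of the quoted track-level proposal concrete, here is a mock sketch in plain JavaScript. `MockTrackRecorder`, `feed`, and the `ondata` callback are invented names, and the semantics assumed (a buffer per slice when `timeSlice` is given, one combined buffer at `stopRecording()` otherwise) are not spelled out in the quoted IDL; this is a reading of the proposal, not its definition.

```javascript
// Mock of "void record(optional timeSliceType timeSlice)" and
// "void stopRecording()" on a single track. All names and the
// timeSlice semantics are assumptions for illustration only.
class MockTrackRecorder {
  constructor(ondata) {
    this.ondata = ondata;   // receives recorded buffers
    this.timeSlice = null;
    this.pending = [];
  }
  record(timeSlice) {
    this.timeSlice = timeSlice ?? null;
  }
  // Stand-in for media chunks arriving from the track's source.
  feed(chunk) {
    if (this.timeSlice !== null) {
      this.ondata(chunk);   // sliced mode: hand over each buffer as it fills
    } else {
      this.pending.push(chunk);
    }
  }
  stopRecording() {
    if (this.timeSlice === null) {
      this.ondata(Buffer.concat(this.pending)); // one combined buffer
    }
  }
}

// Without a timeSlice: a single buffer arrives only at stopRecording().
const once = [];
const r1 = new MockTrackRecorder((b) => once.push(b.toString()));
r1.record();
r1.feed(Buffer.from('aa'));
r1.feed(Buffer.from('bb'));
r1.stopRecording();
console.log(once);   // ['aabb']

// With a timeSlice: a buffer per slice while recording runs.
const slices = [];
const r2 = new MockTrackRecorder((b) => slices.push(b.toString()));
r2.record(100);
r2.feed(Buffer.from('aa'));
r2.feed(Buffer.from('bb'));
r2.stopRecording();
console.log(slices); // ['aa', 'bb']
```

Note that this also illustrates Harald's objection: each recorder emits audio-only or video-only data with no container, so synchronizing two tracks would be left entirely to the application.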
Received on Monday, 8 October 2012 13:18:23 UTC