- From: Randell Jesup <randell-ietf@jesup.org>
- Date: Sat, 06 Apr 2013 23:45:32 -0400
- To: public-media-capture@w3.org
- Message-ID: <5160EBDC.7070404@jesup.org>
We've been around this drain before... see
http://www.ietf.org/mail-archive/web/rtcweb/current/msg04624.html and
followups. Perhaps we can make it to the exit this time.... :-)

On 4/4/2013 7:01 AM, Stefan Håkansson LK wrote:
> On 4/3/13 5:29 PM, Randell Jesup wrote:
>> On 3/25/2013 5:55 PM, Martin Thomson wrote:
>>> I think that it's fair to say that the current state of the
>>> MediaStream[Track] states is pretty dire. At least from a usability
>>> perspective.
>>
>> I generally agree with the approach here. I also agree that the
>> MediaStream should be an explicit rollup of the states of the tracks
>> (just as I feel we need a stream.stop() in addition to track.stop(),
>> even though you can build one in JS yourself).
>>
>> One thing I really want to see described (it doesn't have to be in the
>> spec) is how an application can provide a "Hold" operation where live
>> video is replaced with a video slate and/or pre-recorded
>> animation/audio, and do it without an offer/answer exchange. The
>> MediaStream Processing API would have made this fairly simple, but
>> since we don't have that, we need to define something. WebAudio
>> (if/when implemented) may help (with some pain) for the audio side,
>> but doesn't help us with video.
>>
>> The real blocker here is giving a way for a MediaStreamTrack (in a
>> MediaStream that's already been AddStream()ed to a PeerConnection) to
>> get a different source. Currently, the only easy way I can see to do
>> it is very kludgy and probably higher overhead/delay than we'd like:
>>
>>    video_hold.src = "my_hold_animation.ogg";
>>    elevator_music = video_hold.captureStreamUntilEnded();
>
> What is this? An undocumented feature of the media element?

It's a (very useful) API originally part of the MediaStream Processing
API (you can look at the last draft of that). It takes the output
(decoded audio and video) of a media element and uses it to source a
MediaStream. We absolutely need it if we want any way to feed an
encoded/saved stream into a PeerConnection. (We can record messages, but
we can't play them back except maybe through a canvas (ugh).) We can't
even have a "Sorry, I'm not here right now, please leave a message"
greeting without something like this. So we need it (or the equivalent)
for all sorts of reasons, not just "Hold"/Mute/etc. (a sketch follows
below). Firefox has had this since our MediaStream code landed most of a
year ago.

>> The only alternative I see in the current spec might be to have two
>> tracks always, and disable the live track and enable the Hold/slate
>> track - but I believe that would still cause a negotiationneeded event
>> before it took effect.
>
> I will not argue that being able to switch source for a
> MediaStreamTrack is useless, because I think it could be useful. But
> switching source could very well also lead to a renegotiation being
> needed (I take this from what Cullen said in Boston: if the current
> source encodes with codec A but the other with codec B you'd have to
> renegotiate anyway).

True, but not relevant. If you need to negotiate, you do so. And I'll
note that MediaStreams holding already-encoded data is, I believe,
under- (or un-)specified, other than people waving their hands and
saying "we'd like to have a camera that encodes and not have to decode
and re-encode it". The spec explicitly doesn't define a canonical
representation - but it also doesn't specify anything related to that,
or how things hooked up to MediaStreams can deal with incoming data,
which causes confusion in the current question.
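(For concreteness, a sketch of that answering-machine case, assuming the
captureStreamUntilEnded() API described above and an already-created
PeerConnection "pc"; the file name is a made-up placeholder, and Firefox
currently ships the call with a moz prefix:)

    // Play a pre-recorded greeting into an existing PeerConnection.
    var greeting = document.createElement("video");
    greeting.src = "not_here_greeting.webm";  // hypothetical file

    greeting.onloadedmetadata = function () {
      // Capture the element's decoded output as a MediaStream...
      var stream = greeting.captureStreamUntilEnded();
      // ...and hand it to the PeerConnection like any other stream
      // (the usual offer/answer still applies when adding streams).
      pc.addStream(stream);
      greeting.play();
    };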
The closest the spec comes to defining this behavior is (in defining
"source"):

   "A source can be a physical webcam, microphone, local video or audio
   file from the user's hard drive, network resource, or static image."

Also:

   "When a MediaStream
   <http://dev.w3.org/2011/webrtc/editor/getusermedia.html#idl-def-MediaStream>
   object is being generated from a local file (as opposed to a live
   audio/video source), the user agent SHOULD stream the data from the
   file in real time, not all at once."

and:

   "User agents MAY allow users to use any media source, including
   pre-recorded media files"

and, in discussion of implementation:

   "in addition to the ability to substitute a video or audio source
   with local files and other media. A file picker may be used to
   provide this functionality to the user."

That's about all that's said about sources other than cameras. We really
need to flesh this out - and decide what actually should be required,
and what is optional. video_element.captureStreamUntilEnded() has the
advantage of making it possible to turn anything that can be a video
source into a source for a MediaStream (Media Source API, streaming
video, etc.).

> If I understand the use-case you describe here correctly, you'd like
> to switch to Muzak being played out at the remote end when you mute
> locally.
>
> The straightforward way of meeting this use-case with the current set
> of APIs would be:
>
> 1. As the application loads, download the Muzak source file
>
> 2. Use one video element to render the live a/v stream from the remote
> party, and a separate (muted) audio element that has Muzak as source -
> set up in a "loop" fashion
>
> 3. When the user at the remote end presses "mute me" in the
> application, have the app a) disable the audio track and b) send a
> "play Muzak" signal to the peer
>
> 4. When "change to Muzak" is received, unmute the Muzak audio element
> (no need to mute the video element as silence is being played)
>
> 5. Same goes for unmute -> signal "stop play Muzak" -> mute the Muzak
> audio element.
>
> There are many other options as well.
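(A minimal sketch of those five steps; the sendSignal()/onSignal()
helpers and the element id are hypothetical stand-ins for the app's own
signaling channel and markup:)

    // Sender side: on "mute me", disable the outgoing audio track and
    // tell the peer to start its local Muzak (steps 3a and 3b).
    function muteMe(localStream) {
      localStream.getAudioTracks()[0].enabled = false;
      sendSignal({ type: "play-muzak" });
    }

    // Receiver side: "muzak" is a looping, initially-muted <audio>
    // element loaded with the Muzak file at application start (steps 1
    // and 2); toggling .muted implements steps 4 and 5.
    var muzak = document.getElementById("muzak");

    onSignal(function (msg) {
      if (msg.type === "play-muzak") muzak.muted = false;
      if (msg.type === "stop-muzak") muzak.muted = true;
    });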
Ok, an application could do that, but that moves the Muzak from the
source to the target - which doesn't work if the target is a different
app, or the target is behind a gateway, or if what you want to play is
local to the sender. I'll note I've done exactly this in the past to
indicate Video Mute (show a local slate to say the other side Muted -
that way users don't go "it's black - something's broken with the
video!" (and they do)). However, I had to move to a source-side mute
once we had to deal with any type of non-homogeneous network.

And some apps will do it this way. But there are plenty of reasons to
believe people will want to change the source of a MediaStream (or
MediaStreamTrack(s)). Another random/silly-but-real example: if you want
to do Reindeer Antlers on someone's video image, you'll need to change
from a direct getUserMedia() -> PeerConnection hookup to one with a
canvas in between (or some such), and that means re-routing the data on
the fly - unless you set up the entire pipeline from the start "just in
case" it was needed (and, BTW, doing so would blow any attempt to keep
the data encoded - see above). Otherwise you'd have to re-negotiate on
each add and remove, which gets more painful: you'd need to drop the old
stream and add a new one on each transition (building up m-lines), or
perhaps disable the old one and add/enable the new one.

Another example: in my app I want to be able to smoothly and quickly
switch between front and back cameras, and I don't want an offer/answer
exchange to do this. Or to switch mics.

>> p.s. For Hold and Mute in a video context, I rarely if ever want
>> "black" as an application. I may want to send information about
>> Hold/Mute states to the receiver so the receiver can do something
>> smart with the UI, but at a minimum I want to be able to provide a
>> slate/audio/music/Muzak-version-of-Devo's-Whip-It (yes, I heard that
>> in a Jack In The Box...)
>
> There is also the "poster" possibility with the video element (i.e. an
> image to be played in absence of video).

Right; that's what I refer to as a 'slate'. Sorry - video/cable business
dialect. :-)

--
Randell Jesup
randell-ietf@jesup.org
Received on Sunday, 7 April 2013 03:47:52 UTC