- From: Ian Hickson <ian@hixie.ch>
- Date: Sat, 3 Dec 2011 00:00:50 +0000 (UTC)
I include below, for posterity, some feedback to which I will not be replying, as it relates to the PeerConnection and media streams section of the specification which has since been moved to the WebRTC working group at the W3C. I encourage anyone who is interested in that particular topic to follow the aforementioned group.

On Tue, 26 Jul 2011, Mark Callow wrote:
> On 26/07/2011 14:30, Ian Hickson wrote:
> > On Thu, 14 Jul 2011 04:09:40 +0530, Ian Hickson <ian at hixie.ch> wrote:
> > > > > > Another question is flash. As far as I have seen, there seems to be no option to specify whether the camera needs to use flash or not. Is this decision left up to the device? (If someone is making an app which is just clicking a picture of the person, then it would be nice to have the camera use flash in low light conditions).
> > > > > getUserMedia() returns a video stream, so it wouldn't use a flash.
> > > Wouldn't it make sense to have a provision for flash separately then? I think a lot of apps would like just a picture instead of video, and in those cases, flash would be required. Maybe a separate provision in the spec which defines whether to use flash, and if so, for how many milliseconds. Is that doable?
> There is a lot more that could be done than simply triggering the flash. See /The Frankencamera: An Experimental Platform for Computational Photography/ <http://graphics.stanford.edu/papers/fcam/> and The FCAM API <http://fcam.garage.maemo.org/>.

On Tue, 26 Jul 2011, Tommy Widenflycht wrote:
> On Tue, Jul 26, 2011 at 07:30, Ian Hickson <ian at hixie.ch> wrote:
> > > If you send two MediaStream objects constructed from the same LocalMediaStream over a PeerConnection there needs to be a way to separate them on the receiving side.
> > What's the use case for sending the same feed twice?
> There's no proper use case as such but the spec allows this.
>
> > > I also think it is a bit unfortunate that we now have a 'label' property on the track objects that means something else than the 'label' property on MediaStream, perhaps 'description' would be a more suitable name for the former.
> > In what sense do they mean different things? I don't understand the problem here. Can you elaborate?
> label on a MediaStream is a unique identifier, while the label on a MediaStreamTrack is just a description like "Logitech Vision Pro", "Line In" or "Built-in Mic". I too find this a bit odd.
>
> [...]
>
> If I may make an analogy to the real world: plumbing.
>
> Each fork of a MediaStream is a new joint in the pipe, my suggestion introduces a tap at each joint. No matter how you open and close the tap at the end (or middle); if any previous tap is closed there's nothing coming through. The spec currently removes and adds the entire pipe after the changed joint.
>
> > > Also some follow-up questions regarding the new TrackLists:
> > > What should happen when a track fails? Should the entire stream fail, the MSTrack silently be removed or the MSTrack disassociated with the track (and thus becoming a do-nothing object)?
> > What do you mean by "fails"?
> Yanking the USB cable to the camera for example. This should imho stop the MS, not just silently send black video.
>
> > > What should happen when a stream with two or more video tracks is associated to a <video> tag? Just render the first enabled one?
> > Same as if you had a regular video file with multiple tracks.
> And that is? Sorry, this might be written down somewhere and I have missed it.
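For reference, here is a minimal sketch of how the two questions at the end of this exchange (a failing device, and a stream carrying more than one video track) surface in the MediaStream API that eventually shipped, which is not the 2011 draft under discussion. The function name is illustrative only.

    // Sketch against the shipped MediaStream API, not the 2011 draft discussed above.
    async function watchCamera(videoElement: HTMLVideoElement): Promise<void> {
      const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });

      // A <video> element renders one video track from the stream; it is not an
      // error to hand it a stream with several.
      videoElement.srcObject = stream;
      await videoElement.play();

      for (const track of stream.getVideoTracks()) {
        // "Yanking the USB cable" surfaces as an 'ended' event on the affected track
        // rather than the stream silently going black.
        track.addEventListener('ended', () => {
          console.log(`video track "${track.label}" ended (device lost or permission revoked)`);
        });
      }
    }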
On Thu, 28 Jul 2011, Stefan Håkansson LK wrote:
> > On Tue, Jul 26, 2011 at 07:30, Ian Hickson <ian at hixie.ch> wrote:
> > > > If you send two MediaStream objects constructed from the same LocalMediaStream over a PeerConnection there needs to be a way to separate them on the receiving side.
> > > What's the use case for sending the same feed twice?
> > There's no proper use case as such but the spec allows this.
> The question is how serious a problem this is. If you want to fork, and make both (all) versions available at the peer, would you not transmit the full stream and fork at the receiving end for efficiency reasons? And if you really want to fork at the sender, one way to separate them is to use one PeerConnection per fork.

On Tue, 2 Aug 2011, Per-Erik Brodin wrote:
> On 2011-07-26 07:30, Ian Hickson wrote:
> > On Tue, 19 Jul 2011, Per-Erik Brodin wrote:
> > > Perhaps now that there is no longer any relation to tracks on the media elements we could also change Track to something else, maybe Component. I have had people complaining to me that Track is not really a good name here.
> > I'm happy to change the name if there's a better one. I'm not sure Component is any better than Track though.
> OK, let's keep Track until someone comes up with a better name then.
>
> > > Good. Could we still keep audio and video in separate lists though? It makes it easier to check the number of audio or video components and you can avoid loops that have to check the kind for each iteration if you only want to operate on one media type.
> > Well in most (almost all?) cases, there'll be at most one audio track and at most one video track, which is why I didn't put them in separate lists. What use cases did you have in mind where there would be enough tracks that it would be better for them to be separate lists?
> Yes, you're right, but even with zero or one track it's more convenient to have them separate because that way you can more easily check if the stream contains any audio and/or video tracks and check the number of tracks of each kind. I also think it will be problematic if we would like to add another kind at a later stage if all tracks are in the same list since people will make assumptions that audio and video are the only kinds.
>
> > > I also think that it would be easier to construct new MediaStream objects from individual components rather than temporarily disabling the ones you do not want to copy to the new MediaStream object and then re-enabling them again afterwards.
> > Re-enabling them afterwards would re-include them in the copies, too.
> Why is this needed? If a new MediaStream object is constructed from another MediaStream I think it would be simpler to just let that be a clone of the stream with all tracks present (with the enabled/disabled states independently set).
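For comparison, the track-composition model argued for here is roughly what later shipped. A minimal sketch, assuming the modern MediaStream constructor and track accessors; the function names are illustrative only.

    // Audio and video tracks are exposed as separate lists, and a new MediaStream
    // can be composed from individual tracks instead of toggling 'enabled' flags
    // on the original stream.
    function audioOnlyCopy(source: MediaStream): MediaStream {
      console.log(`audio tracks: ${source.getAudioTracks().length}, ` +
                  `video tracks: ${source.getVideoTracks().length}`);

      // The original stream and its enabled/disabled state are left untouched.
      return new MediaStream(source.getAudioTracks());
    }

    // A full copy whose tracks no longer share enabled/disabled state with the source.
    function independentCopy(source: MediaStream): MediaStream {
      return source.clone();
    }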
> > The main use case here is temporarily disabling a video or audio track in a video conference. I don't understand how your proposal would work for that. Can you elaborate?
> A new MediaStream object is created from the video track of a LocalMediaStream to be used as self-view. The LocalMediaStream can then be sent over PeerConnection and the video track disabled without affecting the MediaStream being played back locally in the self-view. In addition, my proposal opens up for additional use cases that require combining tracks from different streams, such as recording a conversation (a number of audio tracks from various streams, local and remote combined to a single stream).
>
> > > It is also unclear to me what happens to a LocalMediaStream object that is currently being consumed in that case.
> > Not sure what you mean. Can you elaborate?
> I was under the impression that, if a stream of audio and video is being sent to one peer and then another peer joins but only audio should be sent, then video would have to be temporarily disabled in the first stream in order to construct a new MediaStream object containing only the audio track. Again, it would be simpler to construct a new MediaStream object from just the audio track and send that.
>
> > > Why should the label be the same as the parent on the newly constructed MediaStream object?
> > The label identifies the source of the media. It's the same source, so, same label.
> I agree, but usually you have more than one source in a MediaStream and if you construct a new MediaStream from it which doesn't contain all of the sources from the parent I don't think the label should be the same. By the way, what happens if you call getUserMedia() twice and get the same set of sources both times, do you get the same label then? What if the user selects different sources the second time?
>
> > > If you send two MediaStream objects constructed from the same LocalMediaStream over a PeerConnection there needs to be a way to separate them on the receiving side.
> > What's the use case for sending the same feed twice?
> If the labels are the same then that should indicate that it's essentially the same stream and there should be no need to send it twice. If the streams are not composed of the same underlying sources then you may want to send them both and the labels should differ.
>
> > > I also think it is a bit unfortunate that we now have a 'label' property on the track objects that means something else than the 'label' property on MediaStream, perhaps 'description' would be a more suitable name for the former.
> > In what sense do they mean different things? I don't understand the problem here. Can you elaborate?
> As Tommy pointed out, label on MediaStream is an identifier for the stream whereas label on MediaStreamTrack is a description of the source.
>
> > > > The current design is just the result of needing to define what happens when you call getRecordedData() twice in a row. Could you elaborate on what API you think we should have?
> > > What I am thinking of is something similar to what was proposed in http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-March/030921.html
> > That doesn't answer the question of what happens if you call stop() twice.
> Nothing will happen the second time since recording has already stopped.
>
> > (Also, having to call a method and hook an event so that you can read an attribute seems like a rather round-about way of getting data. Is calling a method with a callback not simpler?)
> When the event has been fired you can read the attribute whenever you want to get the blob, how many times you want. I prefer that over having stop() take a callback argument.
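For reference, the recording question was eventually answered by a separate, event-driven MediaRecorder interface rather than a getRecordedData() method on the stream: stop() takes no callback and the data arrives via events, much as proposed above. A minimal sketch; the helper name and the fixed-duration timer are illustrative only.

    // Record a stream for a fixed duration and resolve with the resulting Blob.
    function recordClip(stream: MediaStream, durationMs: number): Promise<Blob> {
      return new Promise((resolve) => {
        const recorder = new MediaRecorder(stream);
        const chunks: Blob[] = [];

        // Recorded data is delivered in one or more 'dataavailable' events.
        recorder.ondataavailable = (e) => chunks.push(e.data);

        // Once 'stop' has fired, the chunks can be assembled (and re-read) at any time.
        recorder.onstop = () => resolve(new Blob(chunks, { type: recorder.mimeType }));

        recorder.start();
        setTimeout(() => recorder.stop(), durationMs);
      });
    }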
> > Quota doesn't seem particularly important here. It's not like you can really do lasting damage. It would just be a DOS attack, like creating a Web page with an infinite number of 10000x10000 canvases. We can just let the "hardware limitation" clause handle it.
> In a video blog recording application it would be nice to be able to present to the user how much more can be recorded and not just handle it as a hardware limitation, since that could mean dropping the entire recording.
>
> > > I was not saying that it would not be possible to keep track of which blob: URLs that point to blobs and which point to streams just that we want to avoid doing that in the early stage of the media engine selection. In my opinion a stream is quite the opposite of a blob (unknown, perhaps infinite length vs. fixed length) so when printing the URLs for debugging purposes it would also be much nicer to have two different protocol schemes. If I remember correctly the discussions leading up to the renaming of createBlobURL to createObjectURL assumed that there would be stream: URLs.
> > You wouldn't be able to remove that logic, since http: URLs would still have the same needs. You can have finite and infinite http: resources, just like you can have finite and infinite blob: resources. I don't really see the problem here. Indeed, with blob:, it's trivial to find out if the resource is finite or not; with http: you might not know until the whole finite resource is downloaded.
> > If there is something I'm missing here please do let me know.
> The differentiation is not between finite and infinite resources but rather between playback media resources and conversational media resources. blob: and http: are both handled by the playback media engine whereas stream: is handled by the conversational media engine. We would like to be able to determine which engine to use by simply looking at the URL.
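As it turned out, neither a stream: scheme nor blob: URLs for live streams survived: object URLs stayed with static Blobs, and streams are attached to media elements directly, so the two "engines" are selected by API shape rather than by URL scheme. A minimal sketch of the shipped behaviour; the function names are illustrative only.

    // Finite, playback-style resource: mint a blob: URL for it.
    function attachRecording(video: HTMLVideoElement, recording: Blob): void {
      const url = URL.createObjectURL(recording);
      video.src = url;
      // Release the URL-to-object mapping once the element has the data.
      video.addEventListener('loadeddata', () => URL.revokeObjectURL(url), { once: true });
    }

    // Live, conversational resource: no URL at all.
    // (createObjectURL(MediaStream) existed in browsers for a while but was removed.)
    function attachLiveStream(video: HTMLVideoElement, stream: MediaStream): void {
      video.srcObject = stream;
    }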
> > > > > PeerConnection is an EventTarget but it still uses a callback for the signaling messages and this mixture of events and callbacks is a bit awkward in my opinion. If you would like to change the function that handles signaling messages after calling the constructor you would have to wrap a function call inside the callback to the actual signal handling function, instead of just (re-)setting an onsignal (or whatever) attribute listener (the event could reuse the MessageEvent interface).
> > > > When would you change the callback?
> > > If you would like to send the signaling messages peer-to-peer over the data channel, once it is established.
> > That seems like a disaster waiting to happen. The UDP data channel is unreliable, the signaling channel has to be reliable. Worse, the UDP data channel might go down at any second, and then the user agent would try to re-establish it using the signaling channel.
> You can provide a reliable channel on top of the unreliable channel and monitor the PeerConnection state so that you know when to fall back to server-relayed signaling. One reason to do this would be to improve the signaling latency which can be of importance in applications that, for example, trigger format renegotiation due to change in video display size.
>
> > > > - It's easy to not register a callback, which makes no sense. There's literally never a use for creating a PeerConnection without a signaling channel, as far as I can tell, so making it easier to create one without a callback than with seems like a bad design.
> > > For example, creating an EventSource without registering any listener for incoming events equally does not make sense.
> > Actually, it does. One operation mode for EventSource is to have events with different names, each triggering a different event listener.
> An EventSource without any event listener seems rather useless to me. Even if you can assign multiple handlers for events with different names, all those handlers could still be provided as arguments to the constructor, right? That would ensure that nobody can create an EventSource without registering at least one event listener.
>
> > > > > There is a potential problem in the exchange of SDPs in that glare conditions can occur if both peers add streams simultaneously, in which case there will be two different outstanding offers that none of the peers are allowed to respond to according to the SDP offer-answer model. Instead of using one SDP session for all media as the specification suggests, we are handling the offer-answer for each stream separately to avoid such conditions.
> > > > Why isn't this handled by the ICE role conflict processing rules? It seems like simultaneous ICE restarts would be trivially resolvable by just following the rules in the ICE spec. Am I missing something?
> > > This problem is not related to ICE but rather to the SDP offer-answer model which is separate from the ICE processing. The problem is that SDP offer-answer does not allow you to respond to an offer when you have an outstanding offer for the same set of streams.
> > As far as I can tell, your interpretation is incorrect. This is entirely related to ICE, and ICE, as far as I can tell, defines this exact case in its role conflict resolution.
> > The only time this can happen is if you have both ends do an ICE restart at exactly the same time. The offer from each ICE agent will be received by the other as if it was the response, and thus there will be a role conflict and the ICE role conflict resolution process will kick in. No?
> No, an ICE role conflict is not the same thing as a glare condition in SDP offer-answer.
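For reference, the shipped RTCPeerConnection API settled this by leaving signaling entirely to the application: the object raises events and the page supplies its own reliable channel. A minimal sketch, assuming a hypothetical WebSocket signaling endpoint and an ad-hoc JSON message format (both illustrative, not part of any specification).

    const signaling = new WebSocket('wss://example.com/signaling'); // hypothetical endpoint
    const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.example.org' }] });

    // Renegotiation is event-driven rather than tied to a constructor callback.
    pc.onnegotiationneeded = async () => {
      await pc.setLocalDescription(await pc.createOffer());
      signaling.send(JSON.stringify({ description: pc.localDescription }));
    };

    pc.onicecandidate = (e) => {
      if (e.candidate) signaling.send(JSON.stringify({ candidate: e.candidate }));
    };

    signaling.onmessage = async (msg) => {
      const { description, candidate } = JSON.parse(msg.data);
      if (description) {
        // Offer/answer glare is the application's problem (or is avoided by assigning
        // offerer/answerer roles up front); here we naively answer any incoming offer.
        await pc.setRemoteDescription(description);
        if (description.type === 'offer') {
          await pc.setLocalDescription(await pc.createAnswer());
          signaling.send(JSON.stringify({ description: pc.localDescription }));
        }
      } else if (candidate) {
        await pc.addIceCandidate(candidate);
      }
    };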
On Wed, 27 Jul 2011, Rob Manson wrote:
> This is definitely not intended as criticism of any of the work going on. It's intended as constructive feedback that hopefully provides clarification on a key use case and its supporting requirements.
>
>         "Access to live/raw audio and video stream data from both local and remote sources in a consistent way"
>
> I've spent quite a bit of time trying to follow a clear thread of requirements/solutions that provide API access to raw stream data (e.g. audio, video, etc.). But I'm a bit concerned this is falling in the gap between the DAP and RTC WGs. If this is not the case then please point me to the relevant docs and I'll happily get back in my box 8)
>
> Here's how the thread seems to flow at the moment based on public documents.
>
> On the DAP page [1] the mission states:
>         "the Device APIs and Policy Working Group is to create client-side APIs that enable the development of Web Applications and Web Widgets that interact with device services such as Calendar, Contacts, Camera, etc"
>
> So it seems clear that this is the place to start. Further down that page the "HTML Media Capture" and "Media Capture" APIs are listed.
>
> HTML Media Capture (camera/microphone interactions through HTML forms) initially seems like a good candidate, however the intro in the latest PWD [2] clearly states:
>         "Providing streaming access to these capabilities is outside of the scope of this specification."
> Followed by a NOTE that states:
>         "The Working Group is investigating the opportunity to specify streaming access via the proposed <device> element."
> The link on the "proposed <device> element" [3] links to a "no longer maintained" document that then redirects to the top level of the whatwg "current work" page [4]. On that page the most relevant link is the video conferencing and peer-to-peer communication section [5]. More about that further below.
>
> So back to the DAP page to explore the other Media Capture API (programmatic access to camera/microphone) [1] and its latest PWD [6].
>
> The abstract states:
>         "This specification defines an Application Programming Interface (API) that provides access to the audio, image and video capture capabilities of the device."
> And the introduction states:
>         "The Capture API defines a high-level interface for accessing the microphone and camera of a hosting device. It completes the HTML Form Based Media Capturing specification [HTMLMEDIACAPTURE] with a programmatic access to start a parametrized capture process."
> So it seems clear that this is not related to streams in any way either.
>
> The Notes column for this API on the DAP page [1] also states:
>         "Programmatic API that completes the form based approach
>         Need to check if still interest in this
>         How does it relate with the Web RTC Working Group?"
> Is there an updated position on this?
>
> So if you then head over to the WebRTC WG's charter [7] it states:
>         "...to define client-side APIs to enable Real-Time Communications in Web browsers.
>         These APIs should enable building applications that can be run inside a browser, requiring no extra downloads or plugins, that allow communication between parties using audio, video and supplementary real-time communication, without having to use intervening servers..."
> So this is clearly focused upon peer-to-peer communication "between" systems and the stream related access is naturally just treated as an ancillary requirement. The scope section then states:
>         "Enabling real-time communications between Web browsers require the following client-side technologies to be available:
>         - API functions to explore device capabilities, e.g. camera, microphone, speakers (currently in scope for the Device APIs & Policy Working Group)
>         - API functions to capture media from local devices (camera and microphone) (currently in scope for the Device APIs & Policy Working Group)
>         - API functions for encoding and other processing of those media streams,
>         - API functions for establishing direct peer-to-peer connections, including firewall/NAT traversal
>         - API functions for decoding and processing (including echo cancelling, stream synchronization and a number of other functions) of those streams at the incoming end,
>         - Delivery to the user of those media streams via local screens and audio output devices (partially covered with HTML5)"
> So this is where I really start to feel the gap growing. The DAP is pointing to the RTC, saying it is not sure whether its Camera/Microphone APIs are being superseded by the work in the RTC...and the RTC then points back to say it will be relying on work in the DAP. However the RTC's Recommended Track Deliverables list does include:
>         "Media Stream Functions, Audio Stream Functions and Video Stream Functions"
>
> So then it's back to the whatwg MediaStream and LocalMediaStream current work [8]. Following this through you end up back at the <audio> and <video> media element with some brief discussion about media data [9].
>
> Currently the only API that I'm aware of that allows live access to the audio data through the <audio> tag is the relatively proprietary Mozilla Audio Data API [10].
>
> And while the video stream data can be accessed by rendering each frame into a canvas 2d graphics context and then using getImageData to extract and manipulate it from there [11], this seems more like a workaround than an elegantly designed solution.
>
> As I said above, this is not intended as a criticism of the work that the DAP WG, WebRTC WG or WHATWG are doing. It's intended as constructive feedback to highlight that the important use case of "Access to live/raw audio and video stream data from both local and remote sources" appears to be falling in the gaps between the groups.
>
> From my perspective this is a critical use case for many advanced web apps that will help bring them in line with what's possible in the native single vendor stack based apps at the moment (e.g. iPhone & Android). And it's also critical for the advancement of web standards based AR applications and other computer vision, hearing and signal processing applications.
>
> I understand that a lot of these specifications I've covered are in very formative stages and that requirements and PWDs are just being drafted as I write. And that's exactly why I'm raising this as a single and consolidated perspective that spans all these groups. I hope this goes some way towards "Access to live/raw audio and video stream data from both local and remote sources" being treated as an essential and core use case that binds together the work of all these groups. With a clear vision for this and a little consolidated work I think this will then also open up a wide range of other app opportunities that we haven't even thought of yet. But at the moment it really feels like this is being treated as an assumed requirement and could end up as a poorly formed second class bundle of semi-related API hooks.
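For reference, the two access paths described above look roughly like this today: per-frame pixel access via the canvas workaround of [11], and time-domain audio samples via what became the Web Audio API, the standardised successor to the Audio Data API of [10]. A minimal sketch; the function names are illustrative only.

    // Copy the current video frame into a canvas and read back its RGBA bytes.
    function samplePixels(video: HTMLVideoElement): ImageData {
      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      const ctx = canvas.getContext('2d')!;
      ctx.drawImage(video, 0, 0);
      return ctx.getImageData(0, 0, canvas.width, canvas.height);
    }

    // Tap a MediaStream's audio and return a function that yields the latest samples.
    function sampleAudio(stream: MediaStream): () => Float32Array {
      const audioCtx = new AudioContext();
      const source = audioCtx.createMediaStreamSource(stream);
      const analyser = audioCtx.createAnalyser();
      source.connect(analyser);
      const buffer = new Float32Array(analyser.fftSize);
      return () => {
        analyser.getFloatTimeDomainData(buffer);
        return buffer;
      };
    }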
> For this use case I'd really like these clear requirements to be supported:
> - access the raw stream data for both audio and video in similar ways
> - access the raw stream data from both remote and local streams in similar ways
> - ability to inject new data or the transformed original data back into streams and presented audio/video tags in a consistent way
> - all of this be optimised for performance to meet the demands of live signal processing
>
> PS: I've also cc'd in the mozilla dev list as I think this directly relates to the current "booting to the web" thread [12]
>
> [1] http://www.w3.org/2009/dap/
> [2] http://www.w3.org/TR/2011/WD-html-media-capture-20110414/#introduction
> [3] http://dev.w3.org/html5/html-device/
> [4] http://www.whatwg.org/specs/web-apps/current-work/complete/#devices
> [5] http://www.whatwg.org/specs/web-apps/current-work/complete/#auto-toc-9
> [6] http://www.w3.org/TR/2010/WD-media-capture-api-20100928/
> [7] http://www.w3.org/2011/04/webrtc-charter.html
> [8] http://www.whatwg.org/specs/web-apps/current-work/complete/video-conferencing-and-peer-to-peer-communication.html#mediastream
> [9] http://www.whatwg.org/specs/web-apps/current-work/complete/the-iframe-element.html#media-data
> [10] https://wiki.mozilla.org/Audio_Data_API
> [11] https://developer.mozilla.org/En/Manipulating_video_using_canvas
> [12] http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/7668a9d46a43e482#

On Fri, 12 Aug 2011, Darin Fisher wrote:
> Putting implementation details aside, I agree that it is a bit unfortunate to refer to a stream as a blob. So far, blobs have always referred to static, fixed-size things.
>
> This function was originally named createBlobURL, but it was renamed createObjectURL precisely because we imagined it being useful to pass things that were not blobs to it. It seems reasonable that passing a Foo object to createObjectURL might mint a different URL type than what we would mint for a Bar object.
>
> It could also be the case that using blob: for referring to Blobs was unfortunate. Maybe we do not really need separate URL schemes for static, fixed size things and streams.
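As a footnote to the createObjectURL point above: the function did end up minting blob: URLs for more than one kind of object, static Blobs and, later, MediaSource objects, and only GET-like dereferencing is defined for the result. A minimal sketch; the fetch of the text blob is purely illustrative.

    const textBlob = new Blob(['hello'], { type: 'text/plain' });
    const blobUrl = URL.createObjectURL(textBlob);   // e.g. "blob:https://example.org/..."

    const mediaSource = new MediaSource();           // open-ended, stream-like object
    const msUrl = URL.createObjectURL(mediaSource);  // same minting function, also a blob: URL
    console.log(msUrl);

    // Dereferencing a blob: URL is GET-like and nothing else.
    fetch(blobUrl)
      .then((r) => r.text())
      .then((text) => {
        console.log(text);             // "hello"
        URL.revokeObjectURL(blobUrl);  // the mapping is explicitly revocable
      });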
On Mon, 15 Aug 2011, Harald Alvestrand wrote:
> Back in ancient history (late 90s, I think), when I wrote the first version of stuff that eventually merged into RFC 4395, "New URI schemes", I thought the set of operations an URI supported was pretty important.
>
> In fact the text of RFC 4395 says:
>
>    2.4. Definition of Operations
>
>    As part of the definition of how a URI identifies a resource, a URI scheme definition SHOULD define the applicable set of operations that may be performed on a resource using the URI as its identifier. A model for this is HTTP; an HTTP resource can be operated on by GET, POST, PUT, and a number of other operations available through the HTTP protocol. The URI scheme definition should describe all well-defined operations on the URI identifier, and what they are supposed to do.
>
>    Some URI schemes don't fit into the "information access" paradigm of URIs. For example, "telnet" provides location information for initiating a bi-directional data stream to a remote host; the only operation defined is to initiate the connection. In any case, the operations appropriate for a URI scheme should be documented.
>
>    Note: It is perfectly valid to say that "no operation apart from GET is defined for this URI". It is also valid to say that "there's only one operation defined for this URI, and it's not very GET-like". The important point is that what is defined on this scheme is described.
>
> So if that consideration is still of concern, the next question is of course "are there operations that make sense for a stream that don't make sense for (current uses of) blob:, or vice versa"?
>
> If "blob:" was intended to mean "reference to internal object, hand it to APIs, the APIs will tell you if they don't like them", that consideration may not be that important.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'