- From: Rob Manson <roBman@mob-labs.com>
- Date: Wed, 02 Oct 2013 15:48:23 +1000
- To: public-media-capture@w3.org
Hi Martin, thanks for the feedback.

> I don't think that this is the right way to present this information.

Fair enough. Is it just the representation/diagram or the whole concept? Personally I think it would be hard to defend an argument that this overall growth of binary data streams isn't happening and that it isn't changing the overall needs imposed on browser architectures.

> This isn't just because of the problems that Harald found with the
> Stream <=> MediaStream analogy.

Harald had some good points about the names/language, but I didn't get the feeling he was questioning the overall discussion. If I misread this, Harald, please call it out.

> The fundamental problem is that you are mixing primitives (MediaStream/Track)
> with sources and sinks (like gUM, RTCPeerConnection, processing nodes, audio
> and video tags). At least for media, there should be primitives on either side
> of the processing and RTC nodes (RTCPeerConnection, web audio). And a
> similar set of nodes would isolate byte streams from media streams,
> the MediaRecorder being one example of this.

Well, first of all let me say that this is an abstraction...and all abstractions are lies 8) And of course I've drawn this from a specific web developer's perspective to highlight a specific communication objective...which I think is creating a little impedance mismatch between us.

But let me walk through your feedback example, as I'm not sure I agree with your statement and perhaps I'm misunderstanding something. If you can help me understand where, that would be great.

You said I'm mixing primitives with sources and sinks. But if we just walk through the very top-most flow in the diagram, I see this.

On the left we have a "camera" or "screen" (I agree this is a "source"). Next on the right we have the gUM API, which allows us to access these "cameras" and "screens". gUM then passes me a MediaStream object in the success callback. In this case it's a localStream; in the PeerConnection example below that it may be a remoteStream. But either way it's a MediaStream[2].

Next on the right we have the processing pipelines we've been documenting and experimenting with[1]. For the Video/Canvas example (sketched in code below) we connect the MediaStream to an HTMLVideoElement's .src (in your mental model this is the actual "sink") and then dump that onto a canvas using .drawImage(video,...). We then extract that frame as an ImageData object (which is really just a wrapper for a typed array in .data) using .getImageData(). Underneath that there is also an ArrayBuffer which we can use to minimise copying.

The pipeline may then choose to display this content to the user in this canvas (as another "sink"?), or in any other context (e.g. we often use that data to render WebGL overlays)...or it may just send the extracted features etc. over the network (would this be a "sink"?). But in the end, for me it's the "Display" that is the true "sink"...just like the "camera" or "screen" is the real "source", and not the gUM API or the MediaStream itself.

First, do you see anything in this that is incorrect? And second, can you suggest some way I could communicate this type of relationship more clearly?

I admit that if you keep focused on the source/sink distinction[3] when looking at this diagram, then on the right hand side you could feel it was not cleanly abstracted. But just ending at an <img>, <video> or <audio> tag is not really the end, so for me it's not really the final "sink" (as described above).
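To make that concrete, here is a minimal sketch of that Video/Canvas pipeline step. The vendor-prefix handling, element lookups and frame loop are just my shorthand here, not anything mandated by the specs:

    // Minimal sketch of the Video/Canvas processing pipeline,
    // assuming a prefixed gUM implementation is available.
    var getUserMedia = navigator.getUserMedia ||
                       navigator.webkitGetUserMedia ||
                       navigator.mozGetUserMedia;

    var video = document.querySelector('video');
    var canvas = document.querySelector('canvas');
    var context = canvas.getContext('2d');

    getUserMedia.call(navigator, { video: true },
      function (localStream) {
        // gUM hands us a MediaStream in the success callback [2]
        video.src = window.URL.createObjectURL(localStream);
        video.onplaying = function () { requestAnimationFrame(processFrame); };
        video.play();
      },
      function (error) { console.log('gUM error:', error); }
    );

    function processFrame() {
      // dump the current video frame onto the canvas...
      context.drawImage(video, 0, 0, canvas.width, canvas.height);
      // ...and pull it back out as an ImageData object: .data is a
      // Uint8ClampedArray (typed array) backed by an ArrayBuffer
      var frame = context.getImageData(0, 0, canvas.width, canvas.height);
      var pixels = frame.data;    // typed array view of the pixels
      var buffer = pixels.buffer; // the underlying ArrayBuffer, no extra copy
      // feature extraction, WebGL overlays or a network send would go here
      requestAnimationFrame(processFrame);
    }

The point of spelling it out is that .getImageData() is where the media stream turns into plain binary data the page can work with, which is exactly the boundary the diagram is trying to show.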
But from the other web developers and browser implementors I've discussed this with, it seems this really clearly describes how I can flow data from a camera/screen (etc.) through the various required steps, right through until I render the results on a "Display". And we can then profile performance and bottlenecks within each of the elements on the diagram. This is all I have really been trying to capture/communicate, as this is what we are really wrestling with.

BTW: If you think the examples I have been presenting push the use of binary data streams and image processing too far...then you should have seen the paper I just watched at ISMAR13, where the front camera was used to track the user's gaze while the back camera was used to track the real world scene. That doubles the size of these data streams and the compute resources required...but delivers a massive benefit in terms of User Experience for AR. And this is just scratching the surface.

roBman

[1] https://github.com/buildar/getting_started_with_webrtc/#image_processing_pipelinehtml
[2] http://www.w3.org/TR/mediacapture-streams/#idl-def-NavigatorUserMediaSuccessCallback
[3] http://www.w3.org/TR/mediacapture-streams/#the-model-sources-sinks-constraints-and-states