- From: Bernard Aboba <Bernard.Aboba@microsoft.com>
- Date: Sat, 11 Jan 2014 00:14:33 +0000
- To: "public-orca@w3.org" <public-orca@w3.org>
Another set of questions relates to the handling of resolution end-to-end, starting from the local video source (e.g. a camera or pre-recorded source) and ending at the remote rendering (e.g. a <video> tag).

Viewed end-to-end, an (oversimplified) flow of video in ORTC looks like this:

|<-------------- Browser A --------------------------->|
Source ---> MediaStreamTrack A ---> RTCRtpSender A --------+
       |<------- Application A ---------->|                |
                      v ^                                  v
               Signalling channel                   Internet (media)
                      v ^                                  |
       |<------- Application B ---------->|                |
<video> tag <-- MediaStreamTrack B <--- RTCRtpReceiver B --+
|<-------------- Browser B --------------------------->|

As suggested in the above diagram, on Browser A a MediaStreamTrack obtained from a source (e.g. a camera or pre-recorded video) is provided to an RTCRtpSender object, which uses it to send video over the Internet, at a resolution determined by RTCRtpEncodingParameters, to an RTCRtpReceiver object on Browser B, which in turn provides a MediaStreamTrack for display within a <video> tag. (A sketch of the sender-side setup appears below.)

Where Browser A is configured to send multiple streams, such as for simulcast and/or scalable video coding, a Selective Forwarding Unit (SFU) would typically be present, so that the diagram would look like this:

|<-------------- Browser A --------------------------->|
Source ---> MediaStreamTrack A ---> RTCRtpSender A --------+
       |<------- Application A ---------->|                |
                      v ^                                  v
               Signalling channel                   Internet (media)
                      v ^                                  |
                      SFU                                  |
                      v ^                                  v
               Signalling channel                          |
                      v ^                                  |
       |<------- Application B ---------->|                |
<video> tag <-- MediaStreamTrack B <--- RTCRtpReceiver B --+
|<-------------- Browser B --------------------------->|

Note that in the above diagram, RTCRtpSender A might be configured to send multiple streams, such as for simulcast and/or scalable video coding, and the SFU will not necessarily pass all of those streams/layers on to Browser B. As a result, it is possible that the resolution and/or framerate received at B does not correspond to the maximum resolution and/or framerate sent by A. (A simulcast configuration is sketched below.)

Robin Raymond has a nice blog post [4] that describes some of the issues that can be encountered at various stages of the above pipeline. Another blog post worth looking at discusses the functioning of constraints on MediaStreamTracks [2]. And of course there is the Media Capture and Streams document [1].

One of the questions raised is: where there are mismatches between resolutions at various stages, how are the transformations carried out? Section 5 of the Media Capture and Streams document describes the model of sources, sinks, constraints and states. As noted there, constraints apply to MediaStreamTracks, not sources. Sinks may apply transformations to the video received from sources. These transformations can include scaling up or down, as well as changing the aspect ratio. However, as Robin notes in his blog post, adjustment of the aspect ratio and the associated distortion is undesirable.

One way to attempt to avoid this problem is to enable explicit discovery and configuration of resolution. However, as noted in [2], even though implementations often support a fixed set of resolutions corresponding to a small set of aspect ratios (e.g. 16:9, 9:16, 4:3, 3:4, etc.), there are privacy reasons (e.g. fingerprinting) why explicit discovery of supported camera resolutions is not enabled.
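To make the sender side of the first diagram concrete, here is a minimal sketch using the draft ORTC API as it stood at the time of writing (vendor prefixes omitted). The dtlsTransport variable is assumed to have been established via the usual RTCIceGatherer/RTCIceTransport/RTCDtlsTransport sequence, and the exact dictionary members are illustrative, since the draft is still in flux:

    // Sender-side sketch (draft ORTC API; names illustrative, not final).
    // Assumes `dtlsTransport` has already been set up out of band.
    navigator.getUserMedia({ video: true },
      function (stream) {
        var track = stream.getVideoTracks()[0];              // MediaStreamTrack A
        var sender = new RTCRtpSender(track, dtlsTransport); // RTCRtpSender A

        // The resolution actually sent is governed by the encoding
        // parameters, not by the track itself.
        sender.send({
          codecs: [{ name: "VP8", payloadType: 96, clockRate: 90000 }],
          encodings: [{
            ssrc: 1001,
            codecPayloadType: 96,
            maxBitrate: 1000000,   // bits/second cap for the encoder
            resolutionScale: 1.0,  // 1.0 = send at the track's full resolution
            framerateScale: 1.0,
            active: true
          }]
        });
      },
      function (err) { console.error("getUserMedia failed:", err); });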
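And a sketch of RTCRtpSender A configured for simulcast, with three encodings an SFU could forward selectively. The encodingId and resolutionScale members follow the draft, where a resolutionScale of 2.0 means one half of the track's width and height; again, treat the details as illustrative:

    // Simulcast sketch: three layers of the same track (draft ORTC names).
    // An SFU may forward any subset, so the resolution received at B need
    // not match the top layer sent by A.
    sender.send({
      codecs: [{ name: "VP8", payloadType: 96, clockRate: 90000 }],
      encodings: [
        { encodingId: "hi",  ssrc: 2001, resolutionScale: 1.0, active: true },
        { encodingId: "mid", ssrc: 2002, resolutionScale: 2.0, active: true }, // half size
        { encodingId: "lo",  ssrc: 2003, resolutionScale: 4.0, active: true }  // quarter size
      ]
    });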
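For the capture end, requesting a resolution through constraints looks roughly like the following. As [2] documents, shipping implementations interpret such constraints unevenly (the mandatory/minWidth syntax shown is what current browsers accept), so the resulting track needs to be inspected rather than assumed:

    // Constraint sketch: ask the source for 1280x720, then verify what the
    // implementation actually delivered, per the approach in [2].
    navigator.getUserMedia(
      { video: { mandatory: { minWidth: 1280, minHeight: 720,
                              maxWidth: 1280, maxHeight: 720 } } },
      function (stream) {
        var video = document.createElement("video");
        video.src = URL.createObjectURL(stream);
        video.onloadedmetadata = function () {
          // The delivered resolution may differ from the request.
          console.log("got", video.videoWidth, "x", video.videoHeight);
        };
        video.play();
      },
      function (err) { console.error("Constraints not satisfied:", err); });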
Since current implementations of constraints do not provide very predictable control over the resolution of a MediaStreamTrack [2], the transformations occurring downstream in the pipeline can also be difficult to control. It is possible that this issue will be addressed by improvements in the behavior of constraints implementations and/or better error handling.

Note that concerns about fingerprinting may not apply to capabilities of the RTCRtpSender and RTCRtpReceiver objects (such as supported resolutions, framerates, etc.), which can be considered to represent capabilities of the browser more than of the underlying hardware. If we could avoid the privacy issue and retain explicit control over resolutions in the RTCRtpSender and RTCRtpReceiver objects, that would be very helpful. (A capability-query sketch appears below, after the references.)

Robin's blog also has additional suggestions:

"* The source must understand the video sink can change dimensions and aspect ratio anytime with a moments notice....

* The current properties include the active width and height of the video sink (or maximum width or height should the area be automatically adjustable). The area needs to be flagged as safe for letterboxing/pillarboxing or not. If the area is unable to accept letterbox or pillarbox then the image must ultimately be adjusted to fill the rendered output area. Under such a situation the source could and should pre-crop the image before sending knowing the final dimensions used."

Since the properties of the video sink can change, events need to be provided so that when this occurs, the characteristics of the source can be changed accordingly. (A hypothetical flow is sketched below, after the references.)

References

[1] Media Capture and Streams: http://dev.w3.org/2011/webrtc/editor/getusermedia.html
[2] WebRTC Hacks: http://webrtchacks.com/how-to-figure-out-webrtc-camera-resolutions/
[3] Alvestrand, H., "Resolution Constraints in Web Real Time Communications", http://tools.ietf.org/html/draft-alvestrand-constraints-resolution
[4] Raymond, R., "In the Trenches with RTCWEB and Real-time Video", http://blog.webrtc.is/?s=Resolution
[5] Proposal for RtpSender/RtpReceiver split: http://dev.w3.org/2011/webrtc/editor/getusermedia.html#dfn-capabilities
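To illustrate the capability point above: the draft ORTC API exposes static capability queries on the sender and receiver objects, and these describe what the browser's codecs can do rather than what the camera hardware can do. A minimal sketch (dictionary shape per the draft, subject to change):

    // Capability-query sketch (draft ORTC API). These reflect browser codec
    // support rather than camera hardware, so the fingerprinting concern
    // raised above arguably does not apply.
    var sendCaps = RTCRtpSender.getCapabilities("video");
    var recvCaps = RTCRtpReceiver.getCapabilities("video");

    sendCaps.codecs.forEach(function (codec) {
      console.log("can send:", codec.name, "@", codec.clockRate);
    });
    recvCaps.codecs.forEach(function (codec) {
      console.log("can receive:", codec.name, "@", codec.clockRate);
    });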
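Finally, a hypothetical sketch of the sink-change flow Robin suggests. No such event is defined in the drafts, so the example simply watches the rendered size of the <video> element on a window resize (a stand-in for a proper sink-change event) and reports it back over the application's signalling channel; signallingChannel, letterboxOk and sourceWidth are all assumptions for illustration:

    // Hypothetical sink-change flow; nothing here is specified in the drafts.
    // Browser B: report the sink's rendered size whenever it changes.
    var video = document.querySelector("video");
    window.addEventListener("resize", function () {
      signallingChannel.send(JSON.stringify({
        type: "sink-size",
        width: video.clientWidth,
        height: video.clientHeight,
        letterboxOk: false              // flag from Robin's suggestion (assumed)
      }));
    });

    // Browser A: scale the outgoing encoding down toward the reported sink
    // so the source pre-crops/rescales before encoding (codecs omitted).
    signallingChannel.onmessage = function (e) {
      var msg = JSON.parse(e.data);
      if (msg.type === "sink-size") {
        var scale = Math.max(1.0, sourceWidth / msg.width); // sourceWidth assumed
        sender.send({ encodings: [{ resolutionScale: scale, active: true }] });
      }
    };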
Received on Saturday, 11 January 2014 00:15:04 UTC