Resolution handling end-to-end

Another set of questions relates to the handling of resolution end-to-end,
starting from the local video source (e.g. camera or pre-recorded source)
and ending at the remote rendering (e.g. a <video> tag).
 
Viewed end-to-end, an (oversimplified) flow of video in ORTC 
looks like this:  

      |<-------------- Browser A -------------------------->|
     Source ---> MediaStreamTrack A ---> RTCRtpSender A --------+
             |<------- Application A ---------->|               |
                             v  ^                               v
                    Signalling channel                  Internet (media)
                             v  ^                               |
             |<------- Application B ---------->|               |
     <video> tag <-- MediaStreamTrack B <--- RTCRtpReceiver B --+
      |<-------------Browser B --------------------------->|

As suggested in the above diagram, on Browser A a MediaStreamTrack
obtained from a source (e.g. camera or pre-recorded video) is provided to
an RTCRtpSender object, which sends video over the Internet at a
resolution determined by RTCRtpEncodingParameters.  On Browser B, an
RTCRtpReceiver object receives the video and provides a
MediaStreamTrack for display within a <video> tag.
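
As a rough sketch of how the sender's resolution is controlled, the
snippet below constructs an RTCRtpEncodingParameters-style object.  The
field names (resolutionScale, framerateScale, maxBitrate) follow the
ORTC CG draft as I understand it and may differ in a given
implementation; treat them as assumptions rather than settled API.

```javascript
// Hedged sketch: building one video encoding for sender.send().
// Field names are assumptions based on the ORTC CG draft.
function makeVideoEncoding(scale, maxBitrate) {
  return {
    active: true,           // encoding is sent while true
    resolutionScale: scale, // divide the track's width/height by this factor
    framerateScale: 1.0,    // leave the capture framerate unchanged
    maxBitrate: maxBitrate  // cap in bits per second
  };
}

// In a browser supporting ORTC, this would then be applied with
// something like:
//   sender.send({ encodings: [makeVideoEncoding(1.0, 1500000)] });
var encoding = makeVideoEncoding(2.0, 500000);
// resolutionScale of 2.0 means the sent video is half the track's
// width and height.
```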

Where Browser A is configured to send multiple streams such as
for simulcast and/or scalable video coding, a Selective Forwarding
Unit (SFU) would typically be present, so that the diagram would
look like this:

      |<-------------- Browser A -------------------------->|
     Source ---> MediaStreamTrack A ---> RTCRtpSender A --------+
             |<------- Application A ---------->|               |
                             v  ^                               v
                    Signalling channel                  Internet (media)
                             v  ^                               |
                            SFU                                 |
                             v  ^                               v
                    Signalling channel                          |
                             v  ^                               |
             |<------- Application B ---------->|               |
     <video> tag <-- MediaStreamTrack B <--- RTCRtpReceiver B --+
      |<-------------Browser B --------------------------->|

Note that in the above diagram, RTCRtpSender A might be configured
to send multiple streams, such as for simulcast and/or scalable
video coding, and the SFU will not necessarily pass all of those
streams/layers on to Browser B.  As a result, it is possible that
the resolution and/or framerate received at B does not correspond
to the maximum resolution and/or framerate sent by A. 
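
To make the point concrete, here is a purely illustrative sketch (not
part of any API) of the selection an SFU might perform: for each
receiver it forwards the highest simulcast encoding whose bitrate fits
that receiver's available bandwidth, so Browser B can end up with less
than Browser A's top resolution.  The encoding shape mirrors
ORTC-style parameter objects but the logic is hypothetical.

```javascript
// Hypothetical SFU layer selection: pick the highest-bitrate encoding
// that fits within the receiver's available bandwidth.
function selectEncoding(encodings, availableBitrate) {
  var best = null;
  for (var i = 0; i < encodings.length; i++) {
    var e = encodings[i];
    if (e.maxBitrate <= availableBitrate &&
        (best === null || e.maxBitrate > best.maxBitrate)) {
      best = e;
    }
  }
  return best; // null if even the lowest layer does not fit
}

var simulcast = [
  { encodingId: 'low',  resolutionScale: 4.0, maxBitrate: 150000 },
  { encodingId: 'mid',  resolutionScale: 2.0, maxBitrate: 500000 },
  { encodingId: 'high', resolutionScale: 1.0, maxBitrate: 1500000 }
];

// A receiver with ~600 kbps of headroom gets the 'mid' layer: half the
// resolution that Browser A is sending in its top layer.
var chosen = selectEncoding(simulcast, 600000);
```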

Robin Raymond has a nice blog post [4] that describes some of the issues
that can be encountered at various stages of the above pipeline. 
Another useful blog post worth looking at relates to the functioning 
of constraints on MediaStreamTracks [2].  And of course there is the
Media Capture and Streams document [1].   

One of the questions raised is:

Where there are mismatches between resolutions at various stages,
how are the transformations carried out? 

Section 5 of the Media Capture and Streams document [1] describes the model
of sources, sinks, constraints and states.  As noted there, constraints
apply to MediaStreamTracks, not sources.  Sinks may apply transformations
to the video received from sources.  These transformations can include
scaling up or down, as well as changing the aspect ratio.
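
The two basic transformations can be sketched as pure geometry; no API
names are assumed here.  Scale-to-fit preserves the aspect ratio at the
cost of letterboxing/pillarboxing, while scale-to-fill uses the whole
display area at the cost of distortion.

```javascript
// Scale-to-fit: preserve aspect ratio; bars fill the remaining area.
function fitDimensions(srcW, srcH, dstW, dstH) {
  var scale = Math.min(dstW / srcW, dstH / srcH);
  return { width: Math.round(srcW * scale),
           height: Math.round(srcH * scale) };
}

// Scale-to-fill: use the full display area; the aspect ratio may distort.
function fillDimensions(srcW, srcH, dstW, dstH) {
  return { width: dstW, height: dstH };
}

// A 4:3 source shown in a 16:9 window: fit pillarboxes to 960x720
// inside 1280x720, while fill stretches (and distorts) to 1280x720.
var fitted = fitDimensions(640, 480, 1280, 720);
```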

However, as Robin notes in his blog post, adjustment
of the aspect ratio and associated distortion is undesirable.  
One of the ways to attempt to avoid this problem is to enable
explicit discovery and configuration of resolution. 

However, as noted in [2], even though implementations often support
a fixed set of resolutions corresponding to a small set of
aspect ratios (e.g. 16:9, 9:16, 4:3, 3:4, etc.), there are privacy
reasons (e.g. fingerprinting) why explicit discovery of supported
camera resolutions is not enabled.  With current implementations
of constraints not providing very predictable control over the
resolution of a MediaStreamTrack [2], the resulting transformations
occurring downstream in the pipeline can also be difficult to
control.  It is possible that this issue will be addressed by
improvements in the behavior of constraints implementations and/or
better error handling.

Note that concerns about fingerprinting may not apply to
capabilities of the RTCRtpSender and RTCRtpReceiver objects, such
as supported resolutions/framerates, which could be considered
to represent capabilities of the browser more than of the
underlying hardware.

If we could avoid the privacy issue while retaining explicit control
over resolutions in RTCRtpSender and RTCRtpReceiver objects, that
would be very helpful.

Robin's blog also has additional suggestions:

"* The source must understand the video sink can change dimensions 
and aspect ratio anytime with a moments notice....

* The current properties include the active width and height of the video 
sink (or maximum width or height should the area be automatically adjustable). 
The area needs to be flagged as safe for letterboxing/pillarboxing or not. If the 
area is unable to accept letterbox or pillarbox then the image must ultimately 
be adjusted to fill the rendered output area. Under such a situation the source 
could and should pre-crop the image before sending knowing the final dimensions 
used."

Since the properties of the video sink can change, events need to be provided
so that when this occurs the characteristics of the source can be changed  
accordingly. 
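
Robin's pre-crop suggestion can be sketched as follows.  The event
plumbing by which the sink reports its new dimensions is assumed; the
substantive part is the crop geometry, which trims the source to the
sink's aspect ratio so that no letterboxing or distortion is needed
downstream.

```javascript
// Hedged sketch: center-crop the source to match the sink's aspect
// ratio, as a handler for a (hypothetical) sink-resize event might do.
function centerCropToAspect(srcW, srcH, sinkW, sinkH) {
  var sinkAspect = sinkW / sinkH;
  var cropW = srcW, cropH = srcH;
  if (srcW / srcH > sinkAspect) {
    cropW = Math.round(srcH * sinkAspect); // source too wide: trim sides
  } else {
    cropH = Math.round(srcW / sinkAspect); // source too tall: trim top/bottom
  }
  return { x: Math.round((srcW - cropW) / 2),
           y: Math.round((srcH - cropH) / 2),
           width: cropW, height: cropH };
}

// Sink switches from 4:3 to 16:9: a 640x480 capture is cropped to
// 640x360, centered vertically.
var crop = centerCropToAspect(640, 480, 1280, 720);
```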

References

[1] Media Capture and Streams: http://dev.w3.org/2011/webrtc/editor/getusermedia.html
[2] WebRTC Hacks: http://webrtchacks.com/how-to-figure-out-webrtc-camera-resolutions/
[3] Alvestrand, H., "Resolution Constraints in Web Real Time Communications", http://tools.ietf.org/html/draft-alvestrand-constraints-resolution
[4] Raymond, R., "In the Trenches with RTCWEB and Real-time Video", http://blog.webrtc.is/?s=Resolution
[5] Proposal for RtpSender/RtpReceiver split: http://dev.w3.org/2011/webrtc/editor/getusermedia.html#dfn-capabilities

Received on Saturday, 11 January 2014 00:15:04 UTC