- From: Harald Alvestrand <harald@alvestrand.no>
- Date: Fri, 15 Feb 2013 07:47:01 +0100
- To: public-media-capture@w3.org
- Message-ID: <511DD9E5.9020107@alvestrand.no>
I like this description very much. It fits the picture I have in my head, and it seems to express it more clearly than Travis' settings document did (although I think it's the same model - kudos for the harmony!). It also points out that the discussion on whether processing occurs at track head or track tail is meaningless - that's not an observable distinction. A few additional observations: - I see multiple tracks connected to one source, with different incompatible constraints, as being a matter of attaching to "virtual configurations" of that single source. Thus, a camera being asked to feed at resoution min 1024x768 and max 150x100 at the same time isn't necessarily overconstrained; the two tracks are connected to different virtual configurations - which in practice will be achieved by inserting a downscaling step somewhere between the source and the min-requesting sink. - I think upscaling should simply be unavailable inside the source/track/sink model. The "energy" concept (although I'd probably have used "phlogiston" instead, just to be cute) is nice, but seems like a lot of mental model for this special case. Sinks can do what sinks want to do, but the track configurations won't do it for them. As for making it simpler.... I don't like being trapped into only mandatory constraints, any more than I like being trapped into only optional ones (as I've argued with Rich before). The expressive power of having both seems worthwhile to me. I may be strange in my head. On 02/15/2013 02:05 AM, Martin Thomson wrote: > I was asked to more clearly elucidate my concerns with various > constraint proposals that were in the throes of being proposed during > the interim. Had I been sufficiently forewarned, perhaps we could > have avoided something of a lengthy discussion, but then we'd never > had come to this email, which I think you will find is highly > enlightening. Though the extent to which the enlightenment is > relevant to the work of this task force may vary. > > I initially reacted poorly to the thought that constraints on tracks > could imply that something would perform processing on those tracks. > That didn't fit the model I had...at the time. > > Here's the expanded model and how I think that constraints like aspect > ratio can fit that model. I've talked to Travis about this, and I > think we agree on the high level points. I believe that this is close > to the model he used to build the settings proposals. However, Travis > hasn't seen this email yet and my first draft was totally incoherent. > So... > > (tl;dr version: see the picture) > > __*Actors* > I think that we all agreed to this basic taxonomy: > > * *Sources*: Camera, microphone, file, blob, RTCPeerConnection, > processing API, … > * *Connector*: MediaStreamTrack* > * *Sinks*: <video> tag, <audio> tag, RTCPeerConnection, processing > API, recording API, … > > > Where cardinality is: > Source (1) .. (*) Connector (1) .. (*) Sink > > *__A Picture* > 1000 words worth of goodness. Apologies for the size. > > Inline images 2 > > > *__Summary of Conclusions* > (Thanks to Travis for these.) > > * "Selection" is a process that evaluates changes in connector > constraints and attempts to choose a source, plus an operating > mode on the source > * Selection has an associated "scope" (of operating modes) > o If a connector doesn't have a source, then its scope doesn't > apply (it selects from the empty set) > o If a connector is in the process of acquiring a source (from > getUserMedia), it's scope is all available [Note 1] operating > modes of all available sources > o If a connector has a source, it's scope is limited to the > operating modes of its source (only) > * Selection with multiple connectors has a specific policy, > depending on constraint, one of: > o "greatest common energy [Note 2]" (the operating mode selected > is the one that maps to the highest energy need of the > connected sinks) > o "pick one only" (the operating mode cannot be in multiple > contradictory states, e.g., the fill light can't be both on > and off) > o special - as defined by the constraint, some choices might not > be mutually exclusive (on and auto) > > Note 1: Sources (and operating modes) can be rendered unavailable by > having other tracks connected to them. Some operating modes are > mutually exclusive. For example, you can't have the fill light both > on and off based on different constraints; or, an encoding camera > might operate in a way that is not compatible with having multiple > users of that data, so the browser simply disables sharing for that > camera. This can also happen through explicit action, but we need to > define those special "constraints". > > Note 2: The concept of 'energy' needs further explanation. > > *__Energy* > Energy is just a word that has little meaning in this context. Energy > == information, but in a qualitative fashion only. The energy a > source produces is the amount of information the track conveys. A > higher resolution contains more energy, a higher sample rate contains > more energy. > > Processing can reduce energy safely, either by frame dropping, scaling > down, cropping, adding black bars, etc... > > In contrast, scaling up doesn't add energy, it just pads with bits > that contain no information. Thus, certain sinks will be (and should > be) unwilling to scale up. For example, RTCPeerConnection doesn't > want to send pointless extra stuff on the wire - it should be able to > learn of the actual energy of the track and refuse to use anything > that is scaled up, while scaling down as circumstances dictate. If > you want to scale up real-time video, scale it up on the receiver! > (BTW, don't infer new API requirements from this, this is purely > internal browser-stuff.) > > When multiple tracks are attached to the same source, each might set a > constraint on energy. Any constraint that limits energy is ignored - > for the purposes of selecting a track. Any constraint that imposes a > minimum level of energy is used to determine which source and > operating mode is selected. The highest energy constraint from any > track attached to a source is what determines its actual operating mode. > > For example, if track A wants 1080 lines minimum and track B wants 480 > lines minimum, track A wins and the camera produces 1080 lines. If > track B also wanted 480 lines /maximum/, then it will have to apply > some processing to get that. > > Constraints/settings that follow this rule include resolution (height > and/or width), frame or sample rate, bits per sample (if we could be > bothered with this). Minimum values are used to select sources or > operating modes, maximum values are sent to the processing box. > Cropping/letter-boxing are always processing instructions. > > *__Other Settings, Implications and Interactions > * > The other settings that we've seen (fillLightMode) directly affect > sources. These are easy: constraints can't specify conflicting values. > > However, this implies that the first track to apply a given setting > determines the operating mode for the source. As long as that track > lives, its setting is the one that wins and other tracks are either > unable to attach to the source, or unable to apply another setting. > > This is not ideal when settings interact. We might manage as long as > error feedback indicates that the error is due to there being other > constraints on other tracks. Or maybe we need to expose both the set > of all possible modes along with the current set of possible modes, > noting of course that the track that made the current setting could > change it at any time. That could result in an API that is a little > hard to explain properly. I don't have a good answer to this problem. > > In general, the model also implies that tracks don't report the actual > "shape" of a track. Tracks can report the settings that are currently > in effect and any optional settings that could be. But tracks cannot > say that the video flowing inside is this or that resolution - it > could change, and should be permitted to. It might be OK to provide > an indication of the current source operation mode, with a clear > warning that this is volatile and not under direct application control. > > *__Double Processing* > There are two places in the media path where processing logically > occurs. Implementations will naturally collapse those. For instance, > two lots of scaling can be reduced to a single scaling operation in > most cases. However, sometimes this will result in ugliness. > > The best example of ugly would be a 16:9 source that is sent through a > 16:10 constrained track to a 16:9 video tag. In that case, the > correct thing to do is to display a nice black frame around the video, > unless one of the aspect ratio changes cropped rather than > black-barred, in which case... > > *__Example* > We can apply this model to answer the important questions: > What happens if you constrain/set a track to width=640,height=480 for > a 1920x1080 camera source? > > If you consider the model, you reach two conclusions: > > * the source only needs to provide 480 lines worth of energy, though > it may provide more, it could just pipe out 1920x1080 video > * the data that is provided to the sink is scaled and cropped (or > letter-boxed) to 640x480 > > Adding another track (with no constraints) results in output of > 1920x1080, depending on what limits are implicitly applied by its sink. > > *__Make It Simpler, /Please/* > One major thing we could do to simplify things is to dump the idea of > mandatory vs. optional constraints. This model supports a lot of > flexibility without having "soft" constraints. Anything you don't > care that much about can be applied as settings after connecting the > track. > > I can actually see how this model could be considered *way *too > complicated as it is without optional constraints. It's already hard > enough to implement. More importantly, as a user of the API, it's > very difficult to understand the model sufficiently that you can > choose optional constraints that produce sensible, or even > predictable, outcomes. > > *__Render This All Moot* > By allowing applications to gain access to information about sources > and to connect sources to local playback sinks prior to gaining consent. > > (In the same vein: Harald did make a mildly convincing argument for > allowing this after consent for one stream was granted, based on the > premise that once you can grab an image using your camera, there isn't > much left that fingerprint has to do. That didn't account for very > tightly controlled sources, or tainted streams, however, so I'm not > sure we've reached that particular place just yet.) > > *__End Transmission*
Received on Friday, 15 February 2013 06:47:33 UTC