RE: Multiple codecs for video conferencing [was: RE: <device> proposal (for video conferencing, etc)]

Ian Hickson wrote:
> On Fri, 18 Dec 2009, Ennals, Robert wrote:

[snip]
> > * Hardware support: Some codecs may be supported in hardware. Even if my
> > browser has software support for Theora, I'm still going to strongly
> > prefer H264 if I have hardware decode for it.
> 
> Hardware decode (and encode) is a requirement of any codec we decide on as
> a common codec.

Do you mean support by current hardware, or support by hardware released after the spec is finalized? If the latter, then apps will still want to support codecs that older devices have hardware support for.

[snip]

> I agree that in the future we may want to change the common codec to a
> better one, but given the rate of development of suitable common codecs
> (one or two a decade), I think that solving this problem now is a
> little premature. So long as we make sure we _can_ solve it, we don't need to
> solve it yet.

Agreed. My point was to make sure that we don't bake in one particular codec in such a way that it makes it awkward to support newer codecs in the future.

If a new whizzbang codec is released tomorrow, then it would be a shame if videoconferencing apps were unable to support it because we had baked in a model where there was only one blessed codec.

Moreover, if a whizzbang codec is released tomorrow, then only some devices would support it in hardware, in which case we would need to make sure we could do codec negotiation to provide each device with the codec it supports - not entirely different to the situation we have with H264 and Theora now.

I guess another way of putting this is that, even if the search for a common codec was resolved tomorrow, I expect that it would eventually unresolve itself, once a new codec came along that some vendors really liked and other vendors didn't support.


> > * Video type: Some codecs might be specially designed for
> > videoconferencing (e.g. fancy codecs that build up a model of the user's
> > face), but not so great for movies.
> > * Special features: E.g. codecs that include information for gaze
> > correction, 3D, etc
> 
> Is the idea here that a particular user agent would implement this special
> codec, and that script would detect that all the clients on a connection
> were capable of handling this codec, and that they would then switch to
> this codec? It seems that if we're relying on user-agent-specific codecs
> in this manner, we don't really need to spec how it works, since it won't
> interoperate anyway...

Yep. That's what I was thinking. 

I don't think that we need to spec how content negotiation works, provided that we give a script the features it needs to do this.

E.g., when a script wants to get hold of streaming video, we should make sure it can check which codecs the user-agent supports, and that it can choose which of the available codecs should be used to encode the stream.
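
Something along these lines would be enough (all of the names below are invented purely to illustrate the shape of the API I have in mind; I'm not proposing them as actual spec names):

  // Hypothetical API sketch -- none of these names exist in any spec.
  // The point is only that the script can (a) enumerate what the user
  // agent can encode, and (b) pick one of those codecs for the stream.

  interface HypotheticalMediaCapabilities {
    supportedVideoCodecs(): string[];           // e.g. ["theora", "h264", "whizzbang"]
  }

  interface HypotheticalStreamOptions {
    videoCodec: string;                         // chosen from the list above
    maxBitrateKbps?: number;
  }

  declare function openVideoStream(opts: HypotheticalStreamOptions): unknown;
  declare const capabilities: HypotheticalMediaCapabilities;

  // Script-side negotiation: intersect our capabilities with the peer's
  // (exchanged over whatever signalling channel the app already uses)
  // and pick the most preferred codec both sides support.
  function pickCodec(local: string[], remote: string[], preference: string[]): string | null {
    for (const codec of preference) {
      if (local.includes(codec) && remote.includes(codec)) return codec;
    }
    return null;
  }

  const chosen = pickCodec(
    capabilities.supportedVideoCodecs(),
    ["theora"],                                 // peer's list, from signalling
    ["whizzbang", "h264", "theora"]             // app's preference order
  );
  if (chosen) openVideoStream({ videoCodec: chosen });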

I may have been misinterpreting the text you put in Section 2.1, which says "this will be pinned down to a specific codec in due course".

> > There are also some scenarios where it makes sense to use several
> > formats in the same conference. E.g. imagine that you and I are
> > videoconferencing in super-HD using our 50-inch monitors, and are then
> > joined by a friend on his phone. Even if the phone supported our
> > super-HD codec, he wouldn't be able to keep up with the data. Either the
> > video would need to be transcoded on the server, or we would need to
> > encode our video at multiple resolutions so that the guy on the phone
> > could get a low-bitrate feed while we still kept our high-bitrate feeds.
> 
> That seems like a fine feature to support in the future, but is it
> really a high priority for v1?

I'm not sure. I'm not an expert on how these things are dealt with in practice.

I /think/ that one way people do this is to encode the video using a codec that splits it into multiple streams A, B, and C (simplified model). A contains enough information to render low quality video. B adds the extra information you need to get a bit more detail. And C adds the information you need to get really good video. When a client is getting tight on bandwidth, it starts by dropping C, and then B, giving preference to A.
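
To make that concrete, here is a rough sketch of the sender-side decision in my simplified model (the layer names and bitrates are made up; real layered codecs are obviously more sophisticated):

  // Simplified model of layered ("A + B + C") video, purely illustrative.
  // Each layer adds detail on top of the ones below it; A alone is enough
  // to decode low-quality video.

  interface Layer {
    name: "A" | "B" | "C";
    bitrateKbps: number;
  }

  const layers: Layer[] = [
    { name: "A", bitrateKbps: 200 },   // base: low quality, always sent if possible
    { name: "B", bitrateKbps: 600 },   // enhancement: more detail
    { name: "C", bitrateKbps: 2000 },  // enhancement: full quality
  ];

  // Keep layers in order (A first) until the bandwidth budget runs out,
  // so C is dropped before B, and B before A.
  function layersToSend(availableKbps: number): Layer[] {
    const kept: Layer[] = [];
    let used = 0;
    for (const layer of layers) {
      if (used + layer.bitrateKbps > availableKbps) break;
      kept.push(layer);
      used += layer.bitrateKbps;
    }
    return kept;
  }

  console.log(layersToSend(3000).map(l => l.name)); // ["A", "B", "C"]
  console.log(layersToSend(900).map(l => l.name));  // ["A", "B"]
  console.log(layersToSend(250).map(l => l.name));  // ["A"]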

Supporting such methods may just be a matter of ensuring that whatever streaming mechanism we specify in the API gives the user-agent and codec good enough access to each other that they can dynamically negotiate what data they send and receive according to the available bandwidth. E.g. if we had an API in which the script grabbed chunks of data from a server and passed them on to the codec, then the codec would have no backchannel to the server, and codecs like this would fail.
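
To illustrate the difference between the two designs (again, every name here is invented):

  // (1) Script-mediated: the script pulls opaque chunks and feeds the
  //     decoder. The codec never learns about bandwidth problems, so it
  //     cannot ask the server to drop an enhancement layer.
  declare function fetchChunk(url: string): Promise<ArrayBuffer>;
  declare function feedDecoder(chunk: ArrayBuffer): void;

  async function scriptMediated(url: string) {
    for (;;) {
      feedDecoder(await fetchChunk(url)); // no feedback path to the server
    }
  }

  // (2) With a backchannel: the user agent can report observed bandwidth
  //     back to the sender, which can then adapt what it transmits.
  interface HypotheticalStream {
    onBandwidthEstimate(callback: (kbps: number) => void): void;
    requestMaxBitrate(kbps: number): void;   // flows back to the sender
  }

  function adaptive(stream: HypotheticalStream) {
    stream.onBandwidthEstimate(kbps => stream.requestMaxBitrate(kbps));
  }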

My hunch is that all we need to do is write a simple spec that allows the server and user-agent to do complex stuff if they need to.

-Rob
