
Re: Multiple codecs for video conferencing [was: RE: <device> proposal (for video conferencing, etc)]

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 18 Dec 2009 02:12:58 +0000 (UTC)
To: "Ennals, Robert" <robert.ennals@intel.com>
Cc: "Tran, Dzung D" <dzung.d.tran@intel.com>, "public-device-apis@w3.org" <public-device-apis@w3.org>
Message-ID: <Pine.LNX.4.62.0912180156490.15825@hixie.dreamhostps.com>
On Fri, 18 Dec 2009, Ennals, Robert wrote:
> 
> If all we care about is whether one client can communicate with another 
> then yes, all we need is one codec that everyone can agree to implement.
> 
> However, I think that there are other reasons why, even if everyone could 
> agree to implement one codec, we would still end up wanting to support 
> multiple codecs.
> 
> In particular:
> 
> * Hardware support: Some codecs may be supported in hardware. Even if my 
> browser has software support for Theora, I'm still going to strongly 
> prefer H264 if I have hardware decode for it.

Hardware decode (and encode) is a requirement of any codec we decide on as 
a common codec.


> * Compression ratio: Some codecs compress better than other codecs. In 
> the future we are likely to see codecs that compress significantly 
> better than the codecs that are currently popular.
> * New codecs: People will release better codecs in the future. We don't 
> want to force everyone to stick with H264/Theora/whatever if something 
> better has come along.

I agree that in the future we may want to change the common codec to a 
better one, but given the rate of development of suitable common codecs 
(one or two a decade), I think that solving this problem now is a little 
premature. So long as we make sure we _can_ solve it, we don't need to 
solve it yet.


> * Video type: Some codecs might be specially designed for 
> videoconferencing (e.g. fancy codecs that build up a model of the user's 
> face), but not so great for movies.
> * Special features: E.g. codecs that include information for gaze 
> correction, 3D, etc.

Is the idea here that a particular user agent would implement this special 
codec, and that script would detect that all the clients on a connection 
were capable of handling this codec, and that they would then switch to 
this codec? It seems that if we're relying on user-agent-specific codecs 
in this manner, we don't really need to spec how it works, since it won't 
interoperate anyway...
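For concreteness, the capability-detection idea being discussed amounts to intersecting codec lists across clients. A minimal script sketch of that negotiation; the `Peer` shape, `supportedCodecs` field, and codec names are all hypothetical, since no such API exists:

```typescript
// Hypothetical per-client descriptor; nothing in the <device>
// proposal actually exposes this today.
interface Peer {
  id: string;
  supportedCodecs: string[]; // e.g. ["theora", "h264", "face-model-1"]
}

// Pick a codec that every peer on the connection supports, walking
// the caller's ordered preference list; fall back to the mandatory
// baseline codec when no optional codec is common to all peers.
function negotiateCodec(
  peers: Peer[],
  preferences: string[],
  baseline: string
): string {
  for (const codec of preferences) {
    if (peers.every((p) => p.supportedCodecs.includes(codec))) {
      return codec;
    }
  }
  return baseline; // the one codec everyone must implement
}
```

Note the failure mode this makes visible: as soon as one client lacks the special codec, every client falls back to the baseline, which is why a user-agent-specific codec need not be specified for interoperability.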


> There are also some scenarios where it makes sense to use several 
> formats in the same conference. E.g. imagine that you and I are 
> videoconferencing in super-HD using our 50-inch monitors, and are then 
> joined by a friend on his phone. Even if the phone supported our 
> super-HD codec, he wouldn't be able to keep up with the data. Either the 
> video would need to be transcoded on the server, or we would need to 
> encode our video at multiple resolutions so that the guy on the phone 
> could get a low-bitrate feed while we still kept our high-bitrate feeds.

That seems like a fine feature to support in the future, but is it really 
a high priority for v1?
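The multi-resolution alternative Robert describes (each sender encoding several layers so the phone can take a cheap one) reduces, on the receiving side, to picking the richest encoding that fits each peer's downlink. A sketch under that assumption; the layer labels and bitrate figures are illustrative, not from any spec:

```typescript
// One simulcast-style layer: the same video encoded at a given
// resolution/target bitrate (figures illustrative).
interface Layer {
  label: string;
  bitrateKbps: number;
}

// Choose the highest-bitrate layer that fits the receiver's
// downlink; the lowest layer is the floor for constrained peers,
// so the guy on the phone always gets *something*.
function selectLayer(layers: Layer[], downlinkKbps: number): Layer {
  const sorted = [...layers].sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  let best = sorted[0];
  for (const layer of sorted) {
    if (layer.bitrateKbps <= downlinkKbps) best = layer;
  }
  return best;
}
```

The design trade-off in the thread is exactly where this selection runs: server-side transcoding puts the cost on infrastructure, while multi-layer encoding puts it on every sender's uplink and CPU.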

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 18 December 2009 02:13:42 GMT
