CHANGE: Provide JSONHints interface for media streams

What
--
I'd like to propose that we add an API for attaching a "hints" object, 
in the form of a JSON-style associative array, to local media streams.

Why
--
Media used for different types of applications may benefit significantly 
from different types of processing. For example, audio captured from 
consumer-grade handset or headset microphones should be subjected to 
automatic gain control (AGC), noise filtering (e.g., high-pass filtering 
to remove breath noise), and enhancement (e.g., noise shaping to 
emphasize formants for better intelligibility), and it should be 
transmitted using codecs or codec modes specifically designed for 
speech, with discontinuous transmission (DTX) whose gaps are covered by 
comfort noise generation (CNG) on the receiver side. On the other hand, 
music or other audio that is captured from studio-quality equipment 
(see, e.g., use case 4.2.9 in the -05 use cases draft), read from 
pre-recorded files, or generated locally may be significantly impaired 
by such processing, may require different codecs or codec operating 
modes, and may be distorted by aggressive DTX thresholds. As a concrete 
example, the introduction of an adaptive high-pass filter in the SILK 
encoder (used in Opus) gave one of the largest call-quality improvements 
of any single feature; however, when applied to music, it can remove 
entire instruments. To support a wide range of applications, 
applications need some way to signal to the browser what kind of 
processing is appropriate for a given media stream. A general hints 
mechanism gives them an extensible way to do that.
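To make the contrast concrete, here is a hypothetical sketch (in 
TypeScript, which is not part of the proposal) of how a browser might 
map an application-type hint onto the processing described above. The 
type, the field names, and the exact on/off choices are assumptions for 
illustration; a real implementation would tune far more than a few 
booleans.

```typescript
// Hypothetical sketch: how a browser *might* map an application-type hint
// to the processing steps discussed above. Names and value choices are
// illustrative assumptions, not part of any specification.
type AudioApplication = "General" | "VOIP";

interface AudioProcessingConfig {
  agc: boolean;                // automatic gain control
  highPassFilter: boolean;     // removes breath noise, but can remove entire instruments from music
  formantEnhancement: boolean; // noise shaping to emphasize formants
  dtx: boolean;                // discontinuous transmission, gaps covered by CNG at the receiver
  codecMode: "speech" | "generic";
}

function selectAudioProcessing(app: AudioApplication): AudioProcessingConfig {
  if (app === "VOIP") {
    // Speech from consumer handset/headset microphones: clean it up and
    // use speech-oriented coding with DTX.
    return {
      agc: true,
      highPassFilter: true,
      formantEnhancement: true,
      dtx: true,
      codecMode: "speech",
    };
  }
  // Music, studio, pre-recorded, or locally generated audio: leave the
  // signal alone and use a general-purpose codec mode without DTX.
  return {
    agc: false,
    highPassFilter: false,
    formantEnhancement: false,
    dtx: false,
    codecMode: "generic",
  };
}

const voice = selectAudioProcessing("VOIP");
const musicBed = selectAudioProcessing("General");
```

The point of the sketch is only that the two application types pull 
every setting in opposite directions, so no single default suits both.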

How
--
In the simplest form, I propose adding a JSONHints object as an argument 
to PeerConnection.addStream():

   void addStream(MediaStream stream, JSONHints hints);
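A minimal sketch of the call, assuming stand-in types: 
MockPeerConnection and StreamStub are hypothetical substitutes for 
PeerConnection and MediaStream, and snapshotting the hints at 
addStream() time is this sketch's reading of the proposal, not something 
it specifies.

```typescript
// Hypothetical model of the proposed call. MockPeerConnection and
// StreamStub are stand-ins, not browser APIs; snapshotting the hints at
// addStream() time is an assumption of this sketch.
interface JSONHints {
  [key: string]: string;
}

interface StreamStub {
  readonly label: string;
}

class MockPeerConnection {
  // Stand-in for browser-internal pipeline state; the proposal itself
  // exposes no way to read hints back.
  private readonly pipelines: { [label: string]: JSONHints } = {};

  addStream(stream: StreamStub, hints: JSONHints): void {
    // Snapshot the hints: the pipeline is configured from them once.
    const snapshot: JSONHints = {};
    for (const key of Object.keys(hints)) {
      snapshot[key] = hints[key];
    }
    this.pipelines[stream.label] = snapshot;
  }

  // Inspection helper for this sketch only.
  internalHint(stream: StreamStub, key: string): string | undefined {
    const hints = this.pipelines[stream.label];
    return hints === undefined ? undefined : hints[key];
  }
}

const pc = new MockPeerConnection();
const mic: StreamStub = { label: "mic" };
const micHints: JSONHints = { audioApplication: "VOIP" };
pc.addStream(mic, micHints);
// Mutating the caller's object afterwards does not reconfigure the stream.
micHints["audioApplication"] = "General";
```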

The JSONHints object is a simple JSON-style associative array, where "|" 
separates the alternative values for each key:

JSONHints {
   "audioApplication": "General | VOIP",
   "videoApplication": "General | HeadAndShoulders",
   /* etc., TBD */
}
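One way an implementation might read this object, sketched under 
assumptions: parseHints() is a hypothetical helper, and dropping 
unrecognized values and keys (rather than rejecting them) is an assumed 
policy that keeps the mechanism extensible as keys are added.

```typescript
// Hypothetical typing and parsing of JSONHints. The two keys and their
// values come from the proposal; parseHints() and its drop-unknown policy
// are assumptions of this sketch.
type AudioApplication = "General" | "VOIP";
type VideoApplication = "General" | "HeadAndShoulders";

interface JSONHints {
  audioApplication?: AudioApplication;
  videoApplication?: VideoApplication;
  // Further keys are TBD; unknown keys are simply not read here.
}

const AUDIO_VALUES: string[] = ["General", "VOIP"];
const VIDEO_VALUES: string[] = ["General", "HeadAndShoulders"];

function parseHints(json: string): JSONHints {
  const raw: unknown = JSON.parse(json);
  const hints: JSONHints = {};
  if (typeof raw !== "object" || raw === null) {
    return hints;
  }
  const obj = raw as { [key: string]: unknown };
  const audio = obj["audioApplication"];
  if (typeof audio === "string" && AUDIO_VALUES.indexOf(audio) >= 0) {
    hints.audioApplication = audio as AudioApplication;
  }
  const video = obj["videoApplication"];
  if (typeof video === "string" && VIDEO_VALUES.indexOf(video) >= 0) {
    hints.videoApplication = video as VideoApplication;
  }
  return hints;
}

// "Cinema" is not a recognized value and "futureHint" is an unknown key,
// so both are dropped; only audioApplication survives.
const parsed = parseHints('{"audioApplication":"VOIP","videoApplication":"Cinema","futureHint":1}');
```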

A more complex approach would be to make the hints object an attribute 
of a MediaStream or MediaStreamTrack, as described in 
https://github.com/mozilla/rainbow/wiki/RTC_API_Proposal , but I'd like 
to see how far we can go with the simple approach first.

One advantage of this approach is that the hints can only be set when 
the stream is added, so browsers can set up their internal media 
processing pipeline based on them once, without having to handle the 
complexity of the application changing them at any time. In particular, 
if changing a hint requires choosing a different codec, offer/answer 
(O/A) might need to be re-run. In this approach, the only way to change 
the hints of a stream is to remove it and re-add it, which already 
implies the need to re-run O/A. Compared with the above-mentioned 
proposal, which uses an onTypeChanged callback that can generate a new 
MediaStreamTrack, I think this approach is much simpler and easier to 
use, since in 99.9% of cases you will want to set the hints once and 
never change them.
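The remove-and-re-add lifecycle can be modeled roughly as follows. 
Everything here is hypothetical: the mock class, the renegotiationNeeded 
flag, and completeOfferAnswer() only track whether a new offer/answer 
exchange would be due.

```typescript
// Hypothetical model of "change hints = remove + re-add". The class, the
// renegotiationNeeded flag and completeOfferAnswer() are stand-ins that
// only track whether a new O/A exchange is due.
interface JSONHints {
  [key: string]: string;
}

interface StreamStub {
  readonly label: string;
}

class MockPeerConnection {
  private readonly added: { [label: string]: JSONHints } = {};
  renegotiationNeeded = false;

  addStream(stream: StreamStub, hints: JSONHints): void {
    // The pipeline (and possibly the codec) is chosen here, once.
    this.added[stream.label] = { ...hints };
    this.renegotiationNeeded = true;
  }

  removeStream(stream: StreamStub): void {
    delete this.added[stream.label];
    this.renegotiationNeeded = true;
  }

  hasStream(stream: StreamStub): boolean {
    return this.added[stream.label] !== undefined;
  }

  // Stand-in for a completed offer/answer exchange.
  completeOfferAnswer(): void {
    this.renegotiationNeeded = false;
  }
}

// Application-side helper: the only way to "change" hints.
function replaceHints(pc: MockPeerConnection, stream: StreamStub, hints: JSONHints): void {
  pc.removeStream(stream);
  pc.addStream(stream, hints);
}

const pc = new MockPeerConnection();
const mic: StreamStub = { label: "mic" };
pc.addStream(mic, { audioApplication: "VOIP" });
pc.completeOfferAnswer();
replaceHints(pc, mic, { audioApplication: "General" });
// pc.renegotiationNeeded is true again: O/A must be re-run.
```

Note that replaceHints() needs no new browser machinery; it reuses 
operations that already trigger O/A.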

Another advantage is that, by making this an argument to addStream(), it 
is harder to lose track of the hints associated with a MediaStream. For 
example, if the hints were an attribute set on the MediaStream returned 
by getUserMedia(), and that stream was then cloned into a new 
MediaStream object to select a subset of its tracks, or run through a 
ProcessedMediaStream to apply some effect (possibly combining it with 
other MediaStreams that have different hints), the semantics for 
propagating the hints would need to be defined. Trying to have this 
happen "automatically" is a good way to get it wrong (especially in the 
ProcessedMediaStream case). By asking for the hints exactly where 
they're going to be used, that kind of complexity is avoided.
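To illustrate the propagation problem, the sketch below (hypothetical 
throughout) attaches hints to a stand-in stream type and applies one 
plausible "automatic" rule for a mixed stream: keep a key only when 
every input agrees.

```typescript
// Hypothetical illustration of the propagation problem. Hints are an
// attribute on a stand-in stream type, and mergeHints() is one possible
// automatic rule for a mix (keep a key only if all inputs agree).
interface JSONHints {
  [key: string]: string;
}

interface StreamStub {
  label: string;
  hints?: JSONHints;
}

function mergeHints(inputs: StreamStub[]): JSONHints {
  const merged: JSONHints = {};
  if (inputs.length === 0) {
    return merged;
  }
  const first: JSONHints = inputs[0].hints || {};
  for (const key of Object.keys(first)) {
    let agreed = true;
    for (const input of inputs) {
      const h: JSONHints = input.hints || {};
      if (h[key] !== first[key]) {
        agreed = false;
      }
    }
    if (agreed) {
      merged[key] = first[key];
    }
  }
  return merged;
}

const mic: StreamStub = { label: "mic", hints: { audioApplication: "VOIP" } };
const music: StreamStub = { label: "music", hints: { audioApplication: "General" } };
// The inputs disagree, so this rule silently drops audioApplication.
const mixed = mergeHints([mic, music]);
```

With the hints passed to addStream() instead, no merge rule is needed: 
the application states the hint for the mix where it is used, for 
example by passing { audioApplication: "General" } when adding the mixed 
stream.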

The disadvantage is that there is no way to query the hints after you've 
set them, but an application can simply keep its own reference to the 
object it passed in.
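Keeping that reference can be as simple as a thin wrapper. HintedPeer, 
PeerLike, and StreamStub are hypothetical names, and the peer passed in 
below is a stand-in that just logs what it receives.

```typescript
// Hypothetical app-side wrapper: since the proposal offers no getter for
// hints, the application remembers what it passed. All names here are
// invented for this sketch.
interface JSONHints {
  [key: string]: string;
}

interface StreamStub {
  readonly label: string;
}

interface PeerLike {
  addStream(stream: StreamStub, hints: JSONHints): void;
}

class HintedPeer {
  private readonly pc: PeerLike;
  private readonly remembered: { [label: string]: JSONHints } = {};

  constructor(pc: PeerLike) {
    this.pc = pc;
  }

  addStream(stream: StreamStub, hints: JSONHints): void {
    this.remembered[stream.label] = hints; // keep our own reference
    this.pc.addStream(stream, hints);      // forward to the connection
  }

  hintsFor(stream: StreamStub): JSONHints | undefined {
    return this.remembered[stream.label];
  }
}

const forwarded: string[] = [];
const peer = new HintedPeer({
  addStream: (stream, hints) => {
    forwarded.push(stream.label + ":" + hints["audioApplication"]);
  },
});
peer.addStream({ label: "mic" }, { audioApplication: "VOIP" });
```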

Another place that might benefit from a JSONHints argument is 
MediaStream.record(). However, whether this is sufficient, or whether a 
more explicit API is needed for things like codec selection, resolution, 
and framerate, is a different question. I think hints are most 
appropriate for settings that the browser should be free to ignore if it 
doesn't want to implement them.
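A hypothetical sketch of what "free to ignore" could mean in practice: 
RecorderStub stands in for whatever record() would drive, and 
"videoFramerate" is an invented hint name. Each implementation applies 
the hints it supports and silently skips the rest, so the same call 
works everywhere.

```typescript
// Hypothetical sketch of ignorable hints. RecorderStub is a stand-in and
// "videoFramerate" is an invented hint name. Supported hints are applied;
// unsupported ones are skipped without error.
interface JSONHints {
  [key: string]: string;
}

class RecorderStub {
  private readonly supported: string[];
  readonly applied: JSONHints = {};

  constructor(supported: string[]) {
    this.supported = supported;
  }

  record(hints: JSONHints): void {
    for (const key of Object.keys(hints)) {
      if (this.supported.indexOf(key) >= 0) {
        this.applied[key] = hints[key];
      }
      // Anything else is ignored rather than treated as an error.
    }
  }
}

const requested: JSONHints = { audioApplication: "VOIP", videoFramerate: "30" };
const basic = new RecorderStub(["audioApplication"]);
const full = new RecorderStub(["audioApplication", "videoFramerate"]);
basic.record(requested);
full.record(requested);
```

A setting the application cannot do without, such as a required codec, 
would not fit this pattern; it would need an explicit API that can 
report failure.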

Received on Tuesday, 4 October 2011 10:50:24 UTC