Re: Common goals and framework

Dave (Hi, Dave!) is right that avoiding audio breakup while spanning a wide range of device compute power is both essential and tricky.  Looking back at history, there are a number of classes of strategy that have been used before and that we could potentially use here.

They include at least:

1) Adaptive sample rate (when needed, dynamically slow it down to provide more processor cycles per audio sample)

2) Adaptive quality (when needed, dynamically switch to simpler processing/synthesis algorithms to reduce number of processor cycles per audio sample)

3) Voice prioritization (render as many of the most important voices as there is processor power to support, dynamically muting the rest)

4) Adaptive content (when needed, content creator determines which blocks of voices don't get rendered)

5) Content profiles (define more than one device capability layer; content developers must choose which profiles to statically support and get guaranteed performance within each profile)

6) Do not adapt; instead, pick a baseline and have all content developers write to that level

The choice is pretty fraught, because each of these strategies brings complications (some of them really significant) and/or missed opportunities.

To briefly characterize each one: 

1) & 2) I don't know of any existing widely deployed mature music/audio engine implementations that do these, as engines necessarily tend to be highly optimized and that has precluded designing that kind of parametric model in; not to assume that new implementations necessarily couldn't do this.  

3) has a long and relatively successful history in game audio, but it does complicate content authoring somewhat; implementation is simplest when all voices have the same or similar structure, as opposed to a fully configurable graph.  
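
A minimal sketch of 3), assuming the simple case just mentioned where every voice has the same structure and a single priority number; the interface and names below are invented for illustration:

    interface Voice {
      priority: number;                    // higher = more important
      render(output: Float32Array): void;  // mixes this voice into the output buffer
    }

    // Render the most important voices up to the current budget and mute the rest.
    function renderWithPrioritization(
      voices: Voice[],
      output: Float32Array,
      voiceBudget: number                  // how many voices we can afford this block
    ): void {
      output.fill(0);
      const ranked = [...voices].sort((a, b) => b.priority - a.priority);
      for (const voice of ranked.slice(0, voiceBudget)) {
        voice.render(output);
      }
    }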

4) is used (for example) in the Scalable Polyphony MIDI standard ("SP-MIDI") and the Mobile DLS synth engine, which give the content developer greater defense against bad voice stealing artifacts; it works, but also complicates content development.
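
To show the flavor of 4), loosely in the spirit of SP-MIDI's ordered dropping of whole channels (none of the types or names below come from the actual spec): the content lists blocks of voices in the order it wants them kept, and the device renders only the prefix it can afford.

    interface VoiceBlock {
      name: string;          // e.g. "melody", "bass", "pads", "ornaments"
      voiceCount: number;    // polyphony this block needs
    }

    // Blocks are listed by the author in kept-first order; the device keeps
    // whole blocks until its polyphony budget runs out, then drops the rest.
    function selectBlocks(blocks: VoiceBlock[], maxPolyphony: number): VoiceBlock[] {
      const kept: VoiceBlock[] = [];
      let used = 0;
      for (const block of blocks) {
        if (used + block.voiceCount > maxPolyphony) break;
        kept.push(block);
        used += block.voiceCount;
      }
      return kept;
    }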

5) is sensitive to getting the profile definitions right, as this kind of slicing can lead to detrimental fragmentation; it also complicates content development.
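
For 5), something like the hypothetical profile table below is what I have in mind; the names and numbers are invented, and choosing them well is exactly the hard part:

    interface Profile {
      name: string;
      maxVoices: number;
      sampleRate: number;
      effectsAllowed: boolean;
    }

    // Illustrative numbers only; real definitions would need a lot of care.
    const PROFILES: Profile[] = [
      { name: "baseline", maxVoices: 16,  sampleRate: 22050, effectsAllowed: false },
      { name: "standard", maxVoices: 48,  sampleRate: 44100, effectsAllowed: true  },
      { name: "enhanced", maxVoices: 128, sampleRate: 48000, effectsAllowed: true  },
    ];

    // The device picks the largest profile it can guarantee; content authored
    // against that profile (or a smaller one) gets predictable performance.
    function pickProfile(deviceMaxVoices: number, deviceRate: number): Profile {
      let chosen = PROFILES[0];
      for (const p of PROFILES) {
        if (p.maxVoices <= deviceMaxVoices && p.sampleRate <= deviceRate) {
          chosen = p;
        }
      }
      return chosen;
    }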

6) is [together with 1) & 2)] the simplest for developers, but also the most limiting since it's a lowest-common-denominator approach and therefore doesn't take advantage of more power when it's available.

There are probably more strategies worth reviewing here, but maybe this is a start.

	-- Chris G.


On Jun 17, 2010, at 10:00 AM, Chris Marrin wrote:
> 
> On Jun 17, 2010, at 7:32 AM, Yves Raimond wrote:
> 
>> On 17/06/10 15:19, David Singer wrote:
>>> My worry is that audio processing could easily be defined in such a way that it is a synchronous task which has to 'keep up or fail spectacularly'.  The trouble is that the CPU available both varies widely by device (as you list) and also, on many devices, varies widely over time (CPU competition).
>>> 
>>> I believe that the tricky task is to design a system that degrades gracefully when not all the desired CPU is available.  Events (e.g. mouseMoved) do that by dropping the event frequency.  Animations/transitions do that by dropping the frame rate.  How will sound processing do that?
>>> 
>>> 
>> Another option (although maybe a bit radical) would be to go for a fully declarative language, and leave it to the client to do the best it can... Maybe similar to CSound?
> 
> 
> That's really what Chris' design is. In its current incarnation it is a graph constructed via API calls rather than having a declarative incarnation as XML elements. I believe Chris was going in the direction of exposing his nodes as Elements, but in our discussions with him we agreed that a declarative form was not useful, so a programmatic approach to building the graph was sufficient. But I might be misremembering somewhat.
> 
> I think Dave's statements are very true and should be added to our list of design criteria:
> 
> n+1) Design should gracefully degrade to allow audio processing under resource constrained conditions without dropping audio frames.
> 
> I think this criterion applies for either native or JavaScript processing. As Dave mentions, there are fairly simple techniques for dealing with resource constraints with animation and video processing. It's much harder for audio because a dropped frame is extremely noticeable and unacceptable. Reducing sample rate while under load is an interesting alternative. But in that case we'd definitely need a filter chain model where native code could get involved at the connections between the filters to reduce the rates on the fly. Or something like that...
> 
> -----
> ~Chris
> cmarrin@apple.com

Received on Thursday, 17 June 2010 19:30:32 UTC