Re: Newbie questions about web audio working group specs

On Wed, Feb 1, 2012 at 7:24 AM, Samuel Goldszmidt <
samuel.goldszmidt@ircam.fr> wrote:

>  Thank you both for your answers.
>
> I read the links you gave me and I'd like to better understand the
> differences between your two APIs.
> Don't hesitate to correct me when I'm wrong.
>
> What I understand here is that the Web Audio API constructs a specific
> audio routing graph, with audio nodes that do the processing. It seems
> that the MediaStream API doesn't want to deal with another graph, given
> that we already have a DOM real-time media stream graph in the HTML
> Stream spec.
> - Is there no 'AudioContext' in the MediaStream API? (So is it the
> element.src value?)
> - Are there, or where are, JavaScript Workers in the Web Audio API?
>

Jussi Kalliokoski asked about adding web workers to the
JavaScriptAudioNode on this list a little while back.  We also discussed
this at the W3C face-to-face meeting very recently and agreed that this
should be added to the JavaScriptAudioNode spec.  It will amount to a very
small API change, so I'll try to update the specification document soon.  I
want to make clear that simply moving JavaScript to a worker thread doesn't
solve every problem.  Garbage collection stalls are still very much an
issue, and these are quite irksome to deal with in a real-time system,
where we would like to achieve low latency without glitches or stutters.
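
For reference, here's roughly what main-thread JavaScript processing looks
like today with createJavaScriptNode() (a minimal sketch; the buffer size
and channel counts are just illustrative values, and I'm using the
webkit-prefixed context from our current implementation):

  var context = new webkitAudioContext();

  // 2048-frame buffer, 1 input channel, 1 output channel.
  var jsNode = context.createJavaScriptNode(2048, 1, 1);

  jsNode.onaudioprocess = function(event) {
      var input = event.inputBuffer.getChannelData(0);
      var output = event.outputBuffer.getChannelData(0);
      // Placeholder for custom DSP: simple attenuated pass-through.
      for (var i = 0; i < input.length; ++i)
          output[i] = 0.5 * input[i];
  };

  jsNode.connect(context.destination);

The worker-based version would presumably move the onaudioprocess callback
into the worker's global scope, but as noted above, that alone doesn't
eliminate garbage collection pauses.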


> - It seems to me that the Mozilla audio specification here
> https://wiki.mozilla.org/Audio_Data_API, in its examples, says more or
> less: build a kind of JavaScript audio graph on your own inside a
> worker, and all will be fine, no?
>

I'll let Robert answer that one...


>
> Chris, in the Web Audio API, you have some kinds of predefined effects
> and also a way to define custom processing in JavaScript (this could
> also be done at a low level with C implementations; might there be a way
> to load such a 'C audio plugin' in the browser?).
>

It would be great to be able to load custom C/C++ plugins (like VST or
AudioUnits), where a single AudioNode corresponds to a loaded code module.
 But there are very serious security implications with this idea, so
unfortunately it's not so simple (using either my or Robert's approach).


> What I understand is that these JavaScriptAudioNodes could be used for
> custom spatialization tools, a convolution engine (!), a personal
> low-pass filter, etc., but with problems due to JavaScript performance (
> http://www.w3.org/TR/webaudio/#JavaScriptPerformance-section).
>

Yes, performance is the key issue here.  For example, in the case of
convolution (important for games and music) and many spatialized sources,
the performance issues of JavaScript (in the main thread or in a worker)
become very apparent.  The native multi-threaded implementation in the Web
Audio API is able to leverage multiple cores of the machine *and* harness
SIMD instructions.  Put plainly, a convolution of this type is not even
close to being feasible when processed directly in JavaScript.  Here is
some background reference on the implementation of the convolution engine:
https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/convolution.html
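
Just to give a sense of how this is exposed, here is a minimal sketch of
using the native convolution engine from script (the impulse-response URL
is made up, and I'm assuming the webkit-prefixed context and the
asynchronous decodeAudioData() from recent WebKit builds):

  var context = new webkitAudioContext();
  var convolver = context.createConvolver();
  convolver.connect(context.destination);

  // Load and decode an impulse response, then hand it to the convolver.
  var request = new XMLHttpRequest();
  request.open("GET", "impulse-responses/hall.wav", true);  // example URL
  request.responseType = "arraybuffer";
  request.onload = function() {
      context.decodeAudioData(request.response, function(buffer) {
          convolver.buffer = buffer;
      });
  };
  request.send();

  // Any source (buffer source, media element source, etc.) can then be
  // routed through the reverb with source.connect(convolver); all of the
  // FFT and multiply-add work happens in native code.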

In our current WebKit implementation, we support multiple FFT backends.
 For example on Mac OS X, we use the highly-optimized native library in
vecLib.framework:
http://svn.webkit.org/repository/webkit/trunk/Source/WebCore/platform/audio/mac/FFTFrameMac.cpp

Intel is currently working on a patch for their world-class IPP performance
library:
https://bugs.webkit.org/show_bug.cgi?id=75522

Performance issues are even more serious when dealing with lower-end
hardware such as tablet devices.



> In the Web Audio API, it seems that the developer is responsible for
> audio glitches. Like in Logic Audio, for example: if the computer's CPU
> load is too high (you use too many convolution reverbs ...), Logic Audio
> just stops playing. You will not hear glitches, because it just stops
> playing (and warns you to reduce your CPU load).
> In MediaStream Processing, when I read the spec, it seems to me that,
> with the blocking state, glitches would be less frequent?
>

The native processing in the Web Audio API is more resilient to glitches
than the corresponding processing done directly in JavaScript (using either
API).  I would consider completely stopping playback of the audio stream to
be a catastrophic failure. Glitches or complete termination of the audio
presentation are both very undesirable.  I think you misunderstand how the
MediaStream Processing API would handle audio overloads.  If the JavaScript
code in a web worker is simply unable to render in real time, then there
will be audible glitches; the blocking state has nothing to do with that.
The ability of JavaScript code running in a worker to render complex audio
scenes is far more limited than that of native code.  In other words, the
system has far more "headroom" to keep performing well as more and more
processing elements are added when the processing is native.  An analogy
can be made with graphics, where we leverage GPUs to render much more
complex graphics scenes at higher animation frame rates (smoother graphics)
than can be accomplished with software rendering.
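
As a concrete illustration of that headroom, a scene with dozens of
spatialized sources stays entirely in native code when it is built from the
built-in nodes (a rough sketch; the source count, positions, and the
already-decoded buffers are assumed for illustration):

  var context = new webkitAudioContext();

  var reverb = context.createConvolver();
  reverb.buffer = impulseResponseBuffer;  // assumed to be loaded already
  reverb.connect(context.destination);

  // JavaScript only builds the graph; all of the mixing, panning and
  // convolution below runs in the native, multi-threaded engine.
  for (var i = 0; i < 32; ++i) {
      var source = context.createBufferSource();
      source.buffer = soundBuffer;        // assumed to be decoded already
      var panner = context.createPanner();
      panner.setPosition(Math.random() * 10 - 5, 0, Math.random() * 10 - 5);
      source.connect(panner);
      panner.connect(reverb);
      source.noteOn(0);
  }

Doing the equivalent panning, mixing and convolution inside a
JavaScriptAudioNode callback would hit the performance wall described above
much sooner.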


>
> In the MediaStream Processing API, audio and video are treated in the
> same way, which is not the case in the Web Audio API, which deals only
> with audio. Would it be possible to have this in the Web Audio API too?
>

My approach was to create an API which is well-adapted for playing audio,
and can work very well with other APIs dealing with video and graphics.
 This is a similar approach to how Apple designed separate APIs for
CoreAudio, CoreGraphics, CoreVideo, QuickTime, CoreMIDI, etc., all of which
work well together.  I think the large number of media-rich applications on
iOS and Mac OS X has shown this to be a very workable and fruitful
approach.

Cheers,
Chris



>
>
> Cheers,
>
> Samuel
>
>
> On 31/01/12 at 23:28, Chris Rogers wrote:
>
>
>
> On Mon, Jan 30, 2012 at 10:04 AM, Samuel Goldszmidt <
> samuel.goldszmidt@ircam.fr> wrote:
>
>>  Hi all,
>>
>> Here are some comments and questions about the web audio working group
>> specs that I would like to share and discuss with you.
>> I hope I have not made too many misinterpretations of the
>> specifications, so feel free to correct me where I have misunderstood.
>>
>> (This is my first post here. I work at Ircam, which is, in part, a
>> scientific institute where we do research on audio [
>> http://www.ircam.fr/recherche.html?L=1].
>> I'm a multimedia/web engineer, and, for some experimental projects, I
>> use the audio tag and HTML5.
>> For research projects and integration purposes, I have to go a step
>> further, and I have read both API proposals with attention.)
>>
>> Concerning the Web Audio API by Chris Rogers:
>>
>> I see some kind of connection with graphical audio programming tools
>> like PureData or Max/MSP, 'without the interface' (which in my opinion
>> is great).
>> Have you experimented with these kinds of tools? (They are specially
>> designed for real-time audio processing.)
>>
>
>  Hi Samuel, thanks for having a look at the specification!  I used to
> work at IRCAM, where I designed AudioSculpt, and also worked on SVP, Chant,
> etc.  I'm very familiar with tools like PureData and Max/MSP and was even
> at IRCAM during the same period when Miller Puckette was doing real-time work
> with the ISPW platform and Max.  I've spent most of my career working on
> graph-based audio architecture and DSP.
>
>
>>
>> Concerning the MediaStream Processing API by Robert O'Callahan:
>>
>> First, you talk about *continuous real-time* media. At Ircam, we work
>> on these questions, and maybe the term *real time* is too restrictive,
>> or maybe we are not talking about the same thing. Sometimes audio
>> processing/treatments can't be done in real time:
>> * some analysis/treatments can be performed faster than real time, for
>> instance spectral/waveform representations (which are in the Use Cases)
>> * in the opposite direction, some treatments can't be done in real
>> time; for instance, you can't write an algorithm which 'mutes the sound
>> when the user reaches its middle', if you don't know the length of the
>> sound because it's played live (do you follow me?). Sometimes we need
>> to perform actions in 'delayed time'. That's why I don't understand the
>> importance of the term 'real time' here.
>>
>> I agree that named effects should be in a 'level 2' specification. I
>> think there is no effect ontology that everybody agrees on, so one
>> important thing is to have a 'generic template' for
>> effect/treatment/sound processing. For example, we could have more than
>> just one algorithm to implement a reverb, and it would be great to have
>> these algorithms available as JavaScript 'AudioNodes' (we could also
>> have audio engines with different implementations in JavaScript).
>>
>> For spatialization effects, I don't know how the number of output
>> channels could be taken into consideration. Two points I'd like to
>> discuss with you:
>> * the possibility of having, on a device, "more than just stereo
>> reproduction", depending on the hardware connected,
>> * maybe a use case that, in the manner of MediaQueries, could adapt the
>> audio reproduction to the device (how many channels, headphones or
>> speakers ...)
>>
>> "To avoid interruptions due to script execution, script execution can
>> overlap with media stream processing;", is it the fact that we could here
>> deal not only with a sort of asynchronous processing (worker) but have a
>> 'rendering process' that walk through the entire file, and other process
>> that use a 'delayed time' ?
>>
>> (One last question: for MediaStream extensions, as for effects that
>> would be in a level 2 specification, wouldn't it be better to have a
>> single createProcessor method for both worker and non-worker
>> processors?)
>>
>> Finally, correct me if I'm wrong, but the main difference I have seen
>> between the Web Audio API by Chris Rogers and the MediaStream Processing
>> API by Robert O'Callahan is that in the second, all media processing is
>> more closely tied to DOM objects (media elements in this case) than in
>> the first (although the graph of the first API seems to me much easier
>> to understand at first sight), which makes sense from my point of view.
>>
>
>  The Web Audio API also has a relationship with HTMLMediaElement,
> implemented as MediaElementAudioSourceNode.  You can see an example of
> its use with the createMediaElementSource() method in this section:
>
> https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#DynamicLifetime-section
>
>  There's also an initial proposal for integration with the WebRTC API:
> https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/webrtc-integration.html
>
>  I presented this to the WebRTC working group at the 2011 TPAC meeting.
>  At the meeting we discussed some details, like how this proposal could be
> further refined using MediaStreamTracks.
>
>  Cheers,
> Chris
>
>
>

Received on Wednesday, 1 February 2012 19:07:23 UTC