Re: Newbie questions about web audio working group specs from Samuel Goldszmidt on 2012-02-01 (public-audio@w3.org from January to March 2012)

From: Samuel Goldszmidt <samuel.goldszmidt@ircam.fr>
Date: Wed, 01 Feb 2012 16:24:30 +0100
To: public-audio@w3.org
Message-ID: <4F29592E.408@ircam.fr>

Thank you both for your answers.

I read the links you gave me and I'd like to better understand the 
differences between your two APIs.
Don't hesitate to correct me when I'm wrong.

What I understand is that the Web Audio API constructs a dedicated 
audio routing graph, with audio nodes that do the processing. It seems 
that the MediaStream API does not want to deal with another graph, 
because we already have a real-time media stream graph in the DOM 
through the HTML Stream Spec.
- Is there no 'AudioContext' in the MediaStream API? (So it's the 
element.src value?)
- Are there JavaScript Workers in the Web Audio API, and if so, where?
- It seems to me that the Mozilla audio specification 
(https://wiki.mozilla.org/Audio_Data_API) and its examples say, more 
or less: build a kind of JavaScript audio graph on your own inside a 
worker, and all will be fine. Is that right?

Chris, in the Web Audio API, you have some predefined effects and also 
a way to define custom processing in JavaScript (this could also be 
done at a lower level with C implementations; might there be a way to 
load such a 'C audio plugin' in the browser?). What I understand is 
that these JavaScriptAudioNodes could be used for custom spatialization 
tools, a convolution engine (!), a personal low-pass filter, ... but 
with problems due to JavaScript performance 
(http://www.w3.org/TR/webaudio/#JavaScriptPerformance-section).
In the Web Audio API, it seems that the developer is responsible for 
audio glitches. Compare Logic Audio, for example: if the CPU load is 
too high (you use too many convolution reverbs ...), Logic Audio just 
stops playing. You will not hear glitches, because it simply stops 
playing (and warns you to reduce your CPU load).
In MediaStream Processing, when I read the spec, it seems to me that, 
with the block state, glitches would be less frequent?
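
To make the 'personal low-pass filter' case concrete, here is a minimal 
sketch in TypeScript of the per-block work such a custom node might do. 
It is only an illustration under my own assumptions: the function names 
(lowPassBlock, blockBudgetMs) and parameters are hypothetical and are 
not taken from either specification, and the one-pole filter is just a 
simple stand-in for a real design.

```typescript
// Hypothetical helper: one-pole low-pass filter over one block of samples.
// `alpha` is a smoothing factor in (0, 1]; smaller values mean a lower cutoff.
// `previous` is the last output sample of the preceding block, so that
// consecutive blocks join without a discontinuity.
function lowPassBlock(
  input: Float32Array,
  alpha: number,
  previous: number
): Float32Array {
  let y = previous;
  // y[n] = y[n-1] + alpha * (x[n] - y[n-1])
  return input.map((x) => {
    y = y + alpha * (x - y);
    return y;
  });
}

// Hypothetical helper: time available to process one block, in milliseconds.
// If the callback takes longer than this, the output cannot keep up.
function blockBudgetMs(frames: number, sampleRate: number): number {
  return (frames / sampleRate) * 1000;
}
```

For instance, a block of 1024 frames at 44100 Hz leaves roughly 23 ms 
for the callback, which is why heavy processing in JavaScript may 
produce glitches.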

In the MediaStream Processing API, audio and video are treated in the 
same way, which is not the case in the Web Audio API, which deals only 
with audio. Would it be possible to have this in the Web Audio API too?

Cheers,

Samuel


On 31/01/12 23:28, Chris Rogers wrote:
>
>
> On Mon, Jan 30, 2012 at 10:04 AM, Samuel Goldszmidt 
> <samuel.goldszmidt@ircam.fr> wrote:
>
>     Hi all,
>
>     Here are some comments and questions about web audio working group
>     spec that I would like to share and discuss with you.
>     I hope not to have made too many misinterpretations of the
>     specifications and, therefore, feel free to correct me where I
>     misunderstood.
>
>     (This is my first post here. I work at Ircam, which is, in part, a
>     scientific institute where we do research on audio
>     [http://www.ircam.fr/recherche.html?L=1].
>     I'm a multimedia/web engineer, and, for some experimental
>     projects, I use audio tag and HTML5.
>     For research projects and integration purposes , I have to go a
>     step further, and I read with attention both of the two API
>     proposals.)
>
>     Concerning Web Audio API by Chris Rogers:
>
>     I see some kind of connections with graphical audio programming
>     tools like PureData or Max/MSP, 'without the interface' (which in
>     my own opinion is great).
>     Do you have experience with these kinds of tools? (They are
>     specially designed for real-time audio processing.)
>
>
> Hi Samuel, thanks for having a look at the specification!  I used to 
> work at IRCAM, where I designed AudioSculpt, and also worked on SVP, 
> Chant, etc.  I'm very familiar with tools like PureData and Max/MSP 
> and even was at IRCAM during the same time that Miller Puckette was 
> doing real-time work with the ISPW platform and Max.  I've spent most 
> of my career working on graph-based audio architecture and DSP.
>
>
>     Concerning the MediaStream Processing API by Robert O'Callahan:
>
>     First, you talk about *continuous real-time* media. At Ircam, we
>     work on these questions, and maybe the term *real time* is too
>     restrictive, or maybe we are not talking about the same thing.
>     Sometimes, audio processing/treatments can't be done in real time:
>     * some analysis/treatments can be performed faster than real time,
>     for instance, spectral/waveform representations (which are in the
>     Use Cases)
>     * in the opposite direction, some treatments can't be done in real
>     time: for instance, you can't write an algorithm that 'mutes the
>     sound when the user reaches its midpoint' if you don't know
>     the length of the sound because it's played live (do you follow
>     me?). Sometimes, we need to perform actions in 'delayed time'.
>     That's why I don't understand the importance of the term 'real
>     time' here.
>
>     I agree that named effects should be left to a 'level 2'
>     specification. I think that there is no effect ontology that
>     everybody agrees on, so one important thing is to have a
>     'generic template' for sound effects/treatments/processing. For
>     example, there can be more than one algorithm for programming a
>     reverb, and it would be great to have these algorithms
>     available as JavaScript 'AudioNode's (we could also have audio
>     engines with different implementations in JavaScript).
>
>     For spatialization effects, I don't know how the number of output
>     channels could be taken into consideration. Two points I'd like to
>     discuss with you:
>     * the possibility of having, on a device, "more than just stereo
>     rendering", depending on the hardware connected,
>     * maybe a use case that, in the manner of Media Queries, could
>     adapt the audio rendering to the device (how many channels,
>     headphones or speakers ...)
>
>     "To avoid interruptions due to script execution, script execution
>     can overlap with media stream processing;": does this mean that we
>     could have not only a kind of asynchronous processing
>     (worker) but also a 'rendering process' that walks through the
>     entire file, and other processes that use 'delayed time'?
>
>     (One last question: for MediaStream extensions, as for effects
>     that would be in a level 2 specification, wouldn't it be better to
>     have a single createProcessor method for both worker and
>     non-worker processors?)
>
>     Finally, correct me if I'm wrong: the main difference I have seen
>     between the Web Audio API by Chris Rogers and the MediaStream
>     Processing API by Robert O'Callahan is that in the second, all
>     media processing is more closely linked to DOM objects (media
>     elements in this case) than in the first one (although the graph
>     of the first API seems to me much easier to understand at first),
>     which makes sense from my point of view.
>
>
> The Web Audio API also has a relationship with HTMLMediaElement, 
> implemented as MediaElementAudioSourceNode.  You can see an example of 
> its use using the createMediaElementSource() method in this section:
> https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#DynamicLifetime-section
>
> There's also an initial proposal for integration with the WebRTC API:
> https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/webrtc-integration.html
>
> Which I presented to the WebRTC working group at the 2011 TPAC 
> meeting.  At the meeting we discussed some details, like how this 
> proposal could be further refined using MediaStreamTracks.
>
> Cheers,
> Chris
>
Received on Wednesday, 1 February 2012 15:25:07 UTC