Newbie questions about web audio working group specs

Hi all,

Here are some comments and questions about the web audio working group specs 
that I would like to share and discuss with you.
I hope I haven't misinterpreted the specifications too badly, so please 
feel free to correct me where I have misunderstood.

(This is my first post here. I work at Ircam, which is, in part, a 
scientific institute where we do research on audio 
[http://www.ircam.fr/recherche.html?L=1].
I'm a multimedia/web engineer, and, for some experimental projects, I 
use the audio tag and HTML5.
For research projects and integration purposes, I need to go a step 
further, so I have read both API proposals with attention.)

Concerning the Web Audio API by Chris Rogers:

I see a kind of connection with graphical audio programming tools 
like PureData or Max/MSP, 'without the interface' (which in my own 
opinion is great).
Have you experimented with these kinds of tools? (They are specially 
designed for real-time audio processing.)
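
To make the analogy concrete, here is a minimal sketch of the graph model 
as I understand it from the draft (I am assuming the names currently in 
the proposal, webkitAudioContext, createGainNode and noteOn; please 
correct me if they have changed):

    // A small processing graph, in the spirit of a Max/MSP patch:
    // source -> gain -> destination, built in script instead of by patching boxes.
    var context = new webkitAudioContext();

    var source = context.createBufferSource();  // plays an AudioBuffer
    var gain = context.createGainNode();        // simple volume control
    gain.gain.value = 0.5;

    source.connect(gain);
    gain.connect(context.destination);          // the 'dac~' of the patch, so to speak

    source.buffer = someDecodedBuffer;          // assumed to have been decoded elsewhere
    source.noteOn(0);                           // start playback immediately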

Concerning the MediaStream Processing API by Robert O'Callahan:

First, you talk about *continuous real-time* media. At Ircam, we work on 
these questions, and maybe the term *real time* is too restrictive, or 
maybe we are not talking about the same thing. Sometimes, audio 
processing/treatments cannot be done in real time:
* some analyses/treatments can be performed faster than real time, for 
instance spectral/waveform representations (which are in the Use Cases); 
see the sketch after this list,
* in the opposite direction, some treatments cannot be done in real time: 
for instance, you cannot write an algorithm that 'mutes the sound when 
the user reaches its middle', if you don't know the length of the sound 
because it is being played live (do you follow me?). Sometimes we need 
to perform actions in 'delayed time'. That's why I don't understand the 
importance of the term 'real time' here.
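
As an illustration of the first point, here is a rough sketch of a 
waveform overview computed from an already decoded AudioBuffer (I am 
assuming getChannelData as in the Web Audio API draft). Nothing ties 
this loop to the playback clock, so it runs as fast as the machine 
allows, i.e. faster than real time:

    // Compute a coarse waveform overview: one min/max pair per display column.
    // The cost depends only on CPU speed, not on the duration of the sound.
    function waveformOverview(buffer, width) {
      var samples = buffer.getChannelData(0);   // first channel only, for simplicity
      var perColumn = Math.floor(samples.length / width);
      var overview = [];
      for (var col = 0; col < width; col++) {
        var min = 1, max = -1;
        for (var i = 0; i < perColumn; i++) {
          var s = samples[col * perColumn + i];
          if (s < min) min = s;
          if (s > max) max = s;
        }
        overview.push([min, max]);
      }
      return overview;
    }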

I agree that named effects should be left to a 'level 2' specification. 
I think there is no effect ontology that everybody agrees on, so one 
important thing is to have a 'generic template' for effects/treatments/ 
processing of sound. For example, there is more than one algorithm for 
implementing a reverb, and it would be great to have these algorithms 
available as JavaScript 'AudioNode's (we could also have audio engines 
with different implementations in JavaScript).
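
To make the idea of a 'generic template' concrete, here is a sketch of a 
custom effect packaged as a script-driven node, using createJavaScriptNode 
and onaudioprocess from the Web Audio API draft. The effect itself, a 
trivial mono feedback delay, is only a placeholder for any reverb 
algorithm one might want to ship as JavaScript:

    // Several competing implementations (e.g. different reverb algorithms)
    // could be swapped in and out of the same graph this way.
    function createFeedbackDelay(context, delaySamples, feedback) {
      var node = context.createJavaScriptNode(2048, 1, 1); // bufferSize, inputs, outputs
      var memory = new Float32Array(delaySamples);         // delay line, zero-filled
      var writeIndex = 0;
      node.onaudioprocess = function (event) {
        var input = event.inputBuffer.getChannelData(0);
        var output = event.outputBuffer.getChannelData(0);
        for (var i = 0; i < input.length; i++) {
          var delayed = memory[writeIndex];
          output[i] = input[i] + delayed;
          memory[writeIndex] = input[i] + delayed * feedback;
          writeIndex = (writeIndex + 1) % delaySamples;
        }
      };
      return node;
    }

    // Usage:
    //   var delay = createFeedbackDelay(context, 11025, 0.4);
    //   source.connect(delay);
    //   delay.connect(context.destination);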

For spatialization effects, I don't know how the number of output 
channels could be taken into consideration. Two points I'd like to 
discuss with you:
* the possibility of having, on a device, 'more than just stereo 
restitution', depending on the connected hardware,
* maybe a use case where, in the manner of MediaQueries, the audio 
restitution could be adapted to the device (how many channels, headphones 
or speakers, ...); a hypothetical sketch follows this list.
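
As a purely hypothetical sketch of the second point (none of these 
property names exist in either draft as far as I know; 'outputKind' and 
'maxChannelCount' are invented here only to illustrate the kind of query 
I have in mind, similar in spirit to MediaQueries):

    // Hypothetical: pick a spatialization strategy from what the output
    // hardware can actually render. Property names are invented.
    function chooseRendering(destination) {
      if (destination.outputKind === 'headphones') {
        return 'binaural';                      // HRTF-based rendering
      } else if (destination.maxChannelCount >= 6) {
        return 'multichannel';                  // e.g. 5.1 panning
      }
      return 'stereo';                          // safe default
    }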

"To avoid interruptions due to script execution, script execution can 
overlap with media stream processing;", is it the fact that we could 
here deal not only with a sort of asynchronous processing (worker) but 
have a 'rendering process' that walk through the entire file, and other 
process that use a 'delayed time' ?

(One last question: for MediaStream extensions, as for the effects that 
would go into a level 2 specification, wouldn't it be better to have a 
single, overall createProcessor method for both worker and non-worker 
processors?)
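
To clarify what I mean by an overall createProcessor, here is a 
hypothetical sketch; the single entry point taking an optional Worker is 
my suggestion, not something taken from the specification, and the names 
are only illustrative:

    // Hypothetical: one factory for both cases, instead of distinct methods.
    // With a Worker argument the samples are processed off the main thread;
    // without it, the stream gets the default built-in processing.
    var processed = inputStream.createProcessor();
    var processedInWorker = inputStream.createProcessor(new Worker('effect-worker.js'));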

Finally, correct me if I'm wrong, but the main difference I have seen 
between the Web Audio API by Chris Rogers and the MediaStream Processing 
API by Robert O'Callahan is that in the second, all media processing is 
more closely tied to DOM objects (media elements in this case) than in 
the first (although the graph of the first API seems much easier to 
understand at first sight), which makes sense from my point of view.

For the moment, I haven't read the whole mailing list, but I'm going to.

Regards,

Samuel Goldszmidt
