- From: Alistair MacDonald <al@signedon.com>
- Date: Mon, 26 Mar 2012 18:10:24 -0400
- To: public-audio@w3.org
- Message-ID: <CAJX8r2mnibaWSVbc_9xd5NBzegbWRLsPzu1fX=m-hgME1kugZQ@mail.gmail.com>
Hi Group,

Here are the minutes from today's teleconference. RRSAgent sends his regrets.

[ATTENDEES]
Alistair, gcardoso, jussi, Doug_Schepers, joe, CRogers

[SCRIBE]
Joe Berkovitz

[AGENDUM 1]
Zakim: agendum 1. "ISSUE-4: Setting sample rates for individual JavaScriptProcessingNodes" taken up [from Alistair]
al: I think this was requested by Jussi
al: http://www.w3.org/2011/audio/track/issues/4
al: Unfortunately ROC is not joining today
... there's been quite a bit of talk about this, wondering where we are at
... and Chris, what do you think from an API POV
crogers: an easy way is to allow the sample rate to be set when an audio ctx is created
al: don't need to use sample rates w/in the same graph
crogers: allowing multiple rates would create a lot of complexity
... and so I avoided this up to this point because it would get quite complicated
... we could throw exceptions, etc. but it would become kind of a rat's nest. The reason different rates were interesting
... was so Jussi wouldn't have to do his own rate conversion code by hand
jussi: we can change playback nodes' sample rates, so I think it won't add much more complication than there already is
crogers: it's not changing the sample rate coming out of the audio src node. let's say the ctx is @ 44.1k and you're playing around
... with the sample rate of a node. at the end of the day, the node is still outputting data @ 44.1k
jussi: for our purposes that's all we need. we don't care what the output rate is
crogers: it was my understanding that you wanted to have a node that would operate at its own completely different sample rate
jussi: yes exactly
crogers: the easy way to solve this problem is to allow an audio ctx to be created at a specific sample rate
... we can later on decide to have explicit rate converter nodes. I'm reluctant to have them in there now due to the complication.
... would you be satisfied if we stayed with the audio ctx having an optional sample rate in the constructor?
jussi: it would be much more useful if specified for a specific JS node so you could have multiple streams
... the JS node should convert the incoming signal to the requested rate and then back to the ctx rate
al: it sounds like this might be something that's easier to put into the 2nd version of the spec
... or do you think, Jussi, that not being able to pull these different streams together is a showstopper
jussi: not a showstopper but a severe limitation
... I'm not convinced that we should do this without having proper use cases
al: I think that's well said, we need to look at the UCs for this feature and understand it better.
al: we'll discuss UCs on the mailing list
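[EXAMPLE]
A minimal sketch of what crogers proposes (an optional sample rate on the audio ctx constructor) next to the hand-rolled per-node handling jussi would otherwise need. The { sampleRate } constructor option and createScriptProcessor follow later spec names and are assumptions for illustration, not part of the draft under discussion here.

    // Create the context at an explicit rate (crogers' suggestion).
    var ctx = new AudioContext({ sampleRate: 22050 });

    // A JS processing node still runs at the context rate; any per-node
    // rate conversion (jussi's request) has to be done by hand inside
    // the processing callback.
    var node = ctx.createScriptProcessor(1024, 1, 1);
    node.onaudioprocess = function (e) {
      var input = e.inputBuffer.getChannelData(0);
      var output = e.outputBuffer.getChannelData(0);
      output.set(input); // pass-through; resample here for a different internal rate
    };
    node.connect(ctx.destination);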
[AGENDUM 2]
Zakim: agendum 2. "ISSUE-5: Pausing a sub-graph" taken up [from Alistair]
al: this has come up a lot. jussi, you've talked about this and so has ROC.
al: http://www.w3.org/2011/audio/track/issues/5
al: trying to figure out the exact use cases. can Jussi or Chris outline the point of this feature
jussi: pausing the subgraph is not essential, one can work around it. it's more like syntactic sugar.
... it might make a lot of things easier, but if it's overly complex from a spec POV I'm not going to pursue it further
crogers: I think what ROC is trying to do is a bit unusual, and I tried to explain the different cases in a recent email
... about what happens when pausing, continuing, etc. In a regular recording studio when some tracks are going to FX and
... you pause the track, you'll continue to hear the echo. Traditionally pausing does not affect downstream effects.
... but in ROC's view old states like echo from paused tracks would resume, which is not the way that traditional analog racks, etc. would work
al: I spent a bunch of time going over stuff in the emails and specs to try to understand this and emailed ROC
... I couldn't figure out the use case for this. It seems like the audio output that you would get is something that would stutter and stop a lot
... the experience seems not ideal. ROC's response was to suggest a use case
al: http://lists.w3.org/Archives/Public/public-audio/2012JanMar/0495.html
al: if you have a set of programs in the browser, you might want the audio to pause at an exact point
... but as far as how pausing relates to streaming [?]
crogers: I see you'd want to pause some parts, but whether the echo state is stored or cleared, that's the subtlety of it
... I don't understand why the way you can pause with the Web Audio API is insufficient
jussi: If you wanted to make a plugin that controls all the ctxs running in your browser
... for external control a feature like this would be useful, but it's hard to implement if you don't know what the program is exactly doing.
... for example you might want to make an app that controls all the sounds going on in your system. you might want to pause something that is annoying
... so you might have an external sound controller app for that. It would be hard to tell what the audio ctx actually contains
crogers: but you're saying it's hard to know what's in the graph; would you have access to the audio ctx itself? if so, why wouldn't you have access to all the nodes
... I would have to see the exact use case
... seems like you can just enumerate the nodes in the ctx and do it that way
al: seems like the UC is outside the main focus of what we're working on now
... it's still a good idea to discuss it further on the mailing list
... and think about how this fits into the long-term plan
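[EXAMPLE]
A minimal sketch of the kind of workaround jussi alludes to: "pausing" a sub-graph by gating it with a GainNode, so new input is silenced while downstream effect tails (the echo crogers describes) play out as they would on a traditional rig. Method names follow the current API (createGain, createDelay); the node choices are assumptions for illustration.

    var ctx = new AudioContext();
    var gate = ctx.createGain();      // sits at the output of the sub-graph
    var echo = ctx.createDelay(1.0);  // stands in for any downstream effect
    echo.delayTime.value = 0.3;

    // sub-graph sources -> gate -> echo -> destination
    gate.connect(echo);
    echo.connect(ctx.destination);

    function pauseSubgraph() {
      // Silence the sub-graph's contribution; material already in the
      // delay line keeps playing out, i.e. pausing does not affect
      // downstream effects.
      gate.gain.setValueAtTime(0, ctx.currentTime);
    }

    function resumeSubgraph() {
      gate.gain.setValueAtTime(1, ctx.currentTime);
    }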
[AGENDUM 3]
Zakim: agendum 3. "ISSUE-7: Power of Two FFTs for RealtimeAnalyserNode" taken up [from Alistair]
al: two things here. One is documentation. Seems like there's a range for the size of the FFT and this is not in the docs.
al: https://www.w3.org/2011/audio/track/issues/7
al: it would be advantageous if the size were not limited to a power of 2
jussi: as I said on the thread, when not running in real time one usually wants to run on arbitrary time windows
... for these kinds of processes arbitrary FFT sizes would be good
al: are you also saying drop the words "real time"
jussi: if it doesn't affect the behavior
crogers: one technique is to use a window; even if your FFT is a power of 2, you use a window on a smaller number of samples
... there is an implementation cost for arbitrary sizes, it gets complex. in the analysis work I've done, it's always been sufficient to use a 2^N FFT with a smaller sample window
al: question for both of you: does this alter the performance or accuracy of the results?
crogers: performance is certainly different with an arbitrary size, it's not an FFT any more, it's a DFT.
... weird-size transforms require math that is a lot slower
jussi: you can usually get away with any non-crazy size
crogers: I haven't seen people using these strange-size transforms. Instead people use Kaiser windows, and so on.
jussi: most FFT libraries don't have anything other than 2^N sizes.
crogers: this node was designed more for real-time analysis, e.g. visualizers, than for audio processing
al: can we use this for faster-than-realtime output?
crogers: not right now, really. how would we do faster-than-realtime output for frequency analysis?
al: I could throw a UC out there. If you were to do some sort of transform to drive how audio is set up in the future based on how it is now
crogers: a developer once wanted to run analysis faster than RT and store the results in a range of analysis frames to display as a spectrogram
... you can't really do this faster than RT right now
... the graph in the Web Audio API is always dealing with what's happening right now in the time domain. you can't pass freq-domain data around the graph
... that gets really complicated really fast
... the current node is designed primarily for visualizers
... for true spectral-domain processing the current node isn't usable right now
al: I've been talking with an effects processing company and they're interested in the W3C audio work.
... they are visualizing different frequencies ahead of time and adaptively adjusting the audio in response
... would this play into what we're discussing?
shepazu: the current node is for one particular use. we know that there will be a pile of things that this node and API won't do
... I'm already hearing people say, "do this and this and this" that go beyond it. But implementors are saying, "give us something we can build straightforwardly"
... I think we should at this point put a pin in this particular point and make it clear that the case we're optimizing for
... is the case that crogers already spoke to, of visualizers etc. does this make sense?
crogers: in answer to al: I've worked on these types of apps before at IRCAM. it would analyze a sound file
... and draw a spectrogram that you could draw on to create time-varying filters, etc.
... I am interested in those kinds of apps but, going back to what Doug is saying, the [Web Audio API] graph is optimized for time-varying signals.
... it gets really complicated when you're tossing in frequency-domain data as well
... you can certainly do all these things in JS though
shepazu: another part of my point is we don't have to solve everything in v1. We'll find out what we need as people experiment with
... what we put out. There will be more specs to come.
jussi: I suggest that we change the name of the issue. For the given UCs the 2^N restriction makes sense. Suggest we propose a new node
... that simply does an FFT and converts from time to freq domain and back
... you could put an FFT node, then a delay, then a reverse FFT
crogers: that would be like a phase vocoder engine. if you are doing freq-domain processing then you have to work with overlapping portions of incoming audio
... you have to move a sliding window
... this is all very cool but it's more complex than just adding some new node types
jussi: I am actually talking about non-realtime processing
crogers: I've actually seen people writing these time-stretching algorithms in JS
... you can do this offline
joe: we should make sure these new suggestions are linked to use cases, to avoid going off track
jussi: if we add an fft node [loud tone intervenes]
... I was going to say if we add an FFT node it's best if it's in the 2nd version of the spec
shepazu: to be concrete about it: we should have a UC and requirement on this to take it further
... it sounds like you have a specific suggestion, and you can also put in your suggested solution to the requirement, e.g. an FFT node
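[EXAMPLE]
A minimal sketch of the technique crogers describes for staying within the 2^N restriction: round the desired analysis window up to the next power of two and let the analyser run at that size, which covers the real-time visualizer case. The window length and drawing loop are assumptions for illustration.

    var ctx = new AudioContext();
    var analyser = ctx.createAnalyser();
    // a source would be connected here: source.connect(analyser)

    function nextPowerOfTwo(n) {
      var p = 32;                  // smallest fftSize the node accepts
      while (p < n) p *= 2;
      return p;
    }

    var desiredWindow = 3000;                         // not a power of two
    analyser.fftSize = nextPowerOfTwo(desiredWindow); // 4096

    var bins = new Float32Array(analyser.frequencyBinCount);
    function draw() {
      analyser.getFloatFrequencyData(bins); // dB magnitudes for a visualizer
      requestAnimationFrame(draw);
    }
    draw();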
[ACTION]
shepazu: action: jussi to write up scenario, requirements, and proposal for FFT node case
* trackbot noticed an ACTION. Trying to create it.
trackbot: Created ACTION-42 - Write up scenario, requirements, and proposal for FFT node case [on Jussi Kalliokoski - due 2012-04-02].
al: are we in general agreement that we don't need to make the FFT size arbitrary
jussi: we don't need to. it doesn't help any of the current UCs

[RESOLUTION]
joe: RESOLUTION: an arbitrary-size FFT is not needed for version 1
shepazu: Resolution: an arbitrary-size FFT is not needed for version 1 (per Issue-7)

[TELECON ENDS]

--
Alistair MacDonald
SignedOn, Inc - W3C Audio WG
Boston, MA, (707) 701-3730
al@signedon.com - http://signedon.com
Received on Monday, 26 March 2012 22:10:53 UTC