
Telecon Minutes from Today

From: Alistair MacDonald <al@signedon.com>
Date: Mon, 26 Mar 2012 18:10:24 -0400
Message-ID: <CAJX8r2mnibaWSVbc_9xd5NBzegbWRLsPzu1fX=m-hgME1kugZQ@mail.gmail.com>
To: public-audio@w3.org
Hi Group,

Here are the minutes from today's teleconference. RRSAgent sends his regrets.



[ATTENDEES]
  Alistair, gcardoso, jussi, Doug_Schepers, joe, CRogers


[SCRIBE]
  Joe Berkovitz


[AGENDUM 1]

  Zakim: agendum 1. "ISSUE-4: Setting sample rates for individual
JavaScriptProcessingNodes" taken up [from Alistair]

  al: I think this was requested by Jussi
  al: http://www.w3.org/2011/audio/track/issues/4
  al: Unfortunately ROC is not joining today
  ...there's been quite a bit of talk about this, wondering where we are at
  ...and Chris, what do you think from an API POV

  crogers: an easy way is to allow the sample rate to be set when an audio
ctx is created

  al: so we don't need to use different sample rates w/in the same graph

  crogers: allowing multiple rates would create a lot of complexity
  ...and so I avoided this up to this point because it would get quite
complicated
  ...we could throw exceptions, etc., but it would become kind of a rat's
nest. The reason diff. rates were interesting
  ...was so Jussi wouldn't have to do his own rate conversion code by hand
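A minimal sketch of the rate conversion crogers says Jussi would otherwise write by hand: linear-interpolation resampling of a buffer from one rate to another. This is illustrative only, not Web Audio API code; a real implementation would use a band-limited interpolator to avoid aliasing.

```javascript
// Resample a Float32Array from one sample rate to another using linear
// interpolation. (Hypothetical helper for illustration; a production
// converter would low-pass filter to prevent aliasing.)
function resample(input, fromRate, toRate) {
  const ratio = fromRate / toRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;                    // fractional read position
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}

// Downsample one second of 44.1 kHz audio to 22.05 kHz.
const oneSecond = new Float32Array(44100).fill(0.5);
const halfRate = resample(oneSecond, 44100, 22050);
```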

  jussi: we can change playback nodes' sample rates, so I think it won't add
much more complication than there already is

  crogers: it's not changing the sample rate coming out of the audio src
node.  let's say the ctx is @ 44.1k and you're playing around
  ... with the sample rate of a node. at the end of the day, node is still
outputting data @ 44.1k

  jussi: for our purposes that's all we need. we don't care what the output
rate is

  crogers: it was my understanding that you wanted to have a node that
would operate at its own completely different sample rate

  jussi: yes exactly

  crogers: the easy way to solve this problem is to allow an audio ctx to
be created at a specific sample rate
  ... we can later on decide to have explicit rate-converter nodes. I'm
reluctant to have them in there now due to the complication.
  ... would you be satisfied if we stayed with the audio ctx having an
optional sample rate in the constructor?

  jussi: it would be much more useful if specified for a specific JS node
so you could have multiple streams
  ... the JS node should convert the incoming signal to the requested rate
and then back to the ctx rate

  al: it sounds like this might be something that's easier to put into the
2nd version of the spec
  ... or do you think Jussi that not being able to pull these different
streams together is a showstopper

  jussi: not a showstopper but a severe limitation
  ... i'm not convinced that we should do this without having proper use
cases

  al: I think that's well said, we need to look at the UCs for this feature
and understand it better.
  al: we'll discuss UCs on the mailing list

[AGENDUM 2]

  Zakim: agendum 2. "ISSUE-5: Pausing a sub-graph" taken up [from Alistair]

  al: this has come up a lot. jussi, you've talked about this and so has
ROC.
  al: http://www.w3.org/2011/audio/track/issues/5
  al: trying to figure out the exact use cases. can Jussi or Chris outline
the point of this feature

  jussi: pausing the subgraph is not essential, one can work around. it's
more like syntactic sugar.
  ... it might make a lot of things easier but if overly complex from spec
POV I'm not going to pursue further

  crogers: Think that what ROC is trying to do is a bit unusual and tried
to explain the different cases in recent email
  ... about what happens when pausing, continuing, etc.  In a regular
recording studio when some tracks are going to FX and
  ... you pause the track, you'll continue to hear the echo. Traditionally
pausing does not affect downstream effects.
  ... but in ROC's view old states like echo from paused tracks would
resume, which is not the way that traditional analog racks, etc. would work
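The "echo keeps sounding" behavior crogers describes can be seen in a toy feedback delay: the effect holds internal state in its delay line, so output continues after the input track goes silent. This is an illustrative sketch in plain JS, not Web Audio API code, and the names are hypothetical.

```javascript
// Toy feedback delay ("echo"). Its delay line is internal state, so when
// the upstream track pauses (input becomes silence), the echo rings out,
// as in a traditional studio effects rack.
function makeEcho(delaySamples, feedback) {
  const line = new Float32Array(delaySamples); // internal state: delay line
  let writeIndex = 0;
  return function process(sample) {
    const delayed = line[writeIndex];          // oldest sample in the line
    line[writeIndex] = sample + delayed * feedback;
    writeIndex = (writeIndex + 1) % delaySamples;
    return sample + delayed;                   // dry + wet
  };
}

const echo = makeEcho(4, 0.5);
// Feed a short burst, then "pause" the track by feeding silence.
[1, 1, 1, 1].forEach(s => echo(s));
const tail = [];
for (let i = 0; i < 8; i++) tail.push(echo(0)); // input paused, echo rings
```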

  al: I spent a bunch of time going over stuff in the emails and specs to
try to understand this and emailed ROC
  ... I couldn't figure out the use case for this. It seems like the audio
output that you would get is something that would stutter and stop a lot
  ... the experience seems not ideal. ROC's response was to suggest a use
case

  al: http://lists.w3.org/Archives/Public/public-audio/2012JanMar/0495.html

  al: if you have a set of programs in the browser, you might want the
audio to pause at an exact point
  ... but as far as how pausing relates to streaming [?]

  crogers: I see you'd want to pause some parts but as far as echo state is
stored/cleared, that's the subtlety of it
  ... I don't understand why the way you can pause with the Web Audio API
is insufficient

  jussi: If you wanted to make a plugin that controls all the ctxs running
in your browser
  ... for external control a feature like this would be useful but it's
hard to implement if you don't know what the program is exactly doing.
  ... for example you might want to make an app that controls all the
sounds going on in your system. you might want to pause something that is
annoying
  ... so you might have an external sound controller app for that. It would
be hard to tell what the audio ctx actually contains

  crogers: but you're saying it's hard to know what's in the graph but
would you have access to the audio ctx itself? if so, why wouldn't you have
access to all the nodes
  ... I would have to see the exact use case
  ... seems like you can just enumerate the nodes in the ctx and do it that
way

  al: seems like the UC is on the outside of the main focus of what we're
working on now
  ... it's still a good idea to discuss further on the mailing list
  ... and think about how this fits into the long term plan


[AGENDUM 3]

  Zakim: agendum 3. "ISSUE-7: Power of Two FFTs for RealtimeAnalyserNode"
taken up [from Alistair]

  al: two things here. One is documentation. Seems like there's a range for
size of FFT and this is not in the docs.
  al: https://www.w3.org/2011/audio/track/issues/7
  al: it would be advantageous if size were not limited to power of 2

  jussi: as I said on the thread, when not running in real time one usually
wants to run on arbitrary time windows
  ... for these kinds of processes arbitrary FFT sizes would be good

  al: are you also saying drop the words "real time"

  jussi: if it doesn't affect the behavior

  crogers: one technique is to use a window; even if your FFT is a power of
2, you use a window on a smaller number of samples
  ... there is an implementation cost for arbitrary sizes, it gets complex.
in the analysis work I've done, it's always been sufficient to use a 2^N
FFT with a smaller sample window
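The technique crogers describes can be sketched as follows: window an arbitrary number of samples, then zero-pad up to the next power of two so a standard radix-2 FFT can be applied. A Hann window is used here for illustration; the helper name is hypothetical.

```javascript
// Apply a Hann window to an arbitrary-length frame, then zero-pad it to
// the next power of two, yielding a buffer a radix-2 FFT can consume.
function windowAndPad(samples) {
  const n = samples.length;
  let fftSize = 1;
  while (fftSize < n) fftSize *= 2;          // next power of two >= n
  const padded = new Float32Array(fftSize);  // tail stays zero (padding)
  for (let i = 0; i < n; i++) {
    const hann = 0.5 * (1 - Math.cos((2 * Math.PI * i) / (n - 1)));
    padded[i] = samples[i] * hann;
  }
  return padded;
}

// 300 samples -> 512-point buffer, ready for a power-of-two FFT.
const frame = windowAndPad(new Float32Array(300).fill(1));
```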

  al: question for both of you: does this alter the performance or accuracy
of results?

  crogers: performance is certainly different with an arbitrary size, it's
not an FFT any more it's a DFT.
  ... weird size xforms require math that is a lot slower

  jussi: you can usually get away with any non-crazy size

  crogers: I haven't seen people using these strange size transforms.
Instead people use Kaiser windows, and so on.
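To make crogers' cost point concrete: an arbitrary-size transform falls back to the naive O(N^2) DFT below, whereas power-of-two sizes admit the O(N log N) radix-2 FFT. This is an illustrative sketch; real libraries use mixed-radix algorithms for some non-power-of-two sizes.

```javascript
// Naive DFT: works for any size N, but costs O(N^2) multiplies, which is
// why "weird size xforms require math that is a lot slower".
function dft(signal) {
  const N = signal.length;
  const re = new Float64Array(N);
  const im = new Float64Array(N);
  for (let k = 0; k < N; k++) {       // one inner pass per output bin
    for (let n = 0; n < N; n++) {
      const angle = (-2 * Math.PI * k * n) / N;
      re[k] += signal[n] * Math.cos(angle);
      im[k] += signal[n] * Math.sin(angle);
    }
  }
  return { re, im };
}

// A constant signal puts all its energy in bin 0 (DC); N = 5 is a size
// a radix-2 FFT could not handle directly.
const spectrum = dft([1, 1, 1, 1, 1]);
```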

  jussi: most FFT libraries don't have anything other than 2^N sizes.

  crogers: this node was designed more for real time analysis e.g.
visualizers than for audio processing

  al: can we use this for faster than realtime output?

  crogers: not right now, really. how would we do faster-than-realtime
output for frequency analysis?

  al: I could throw a UC out there. If you were to do some sort of xform to
drive how audio was set up in the future based on how it is now

  crogers: a developer once wanted to compute analysis frames faster than RT
and store the results in a range of analysis frames to display as a
spectrogram
  ... you can't really do this faster than RT right now
  ... the graph in the web audio API is always dealing with what's
happening right now in the time domain. you can't pass freq domain data
around the graph
  ... that gets really complicated really fast
  ... the current node is designed primarily for visualizers
  ... for true spectral-domain processing the current node isn't usable
right now

  al: I've been talking with an effects processing company and they're
interested in the W3C audio work.
  ... they are visualizing different frequencies ahead of time and
adaptively adjusting the audio in response
  ... would this play into what we're discussing?

  shepazu: the current node is for one particular use. we know that there
will be a pile of things that this node and API won't do
  ... I'm already hearing people say, "do this and this and this" that go
beyond. But implementors are saying, "give us something we can build
straightforwardly"
  ... I think we should at this point put a pin in this particular point
and make it clear that the case we're optimizing for
  ... is the case that crogers already spoke to, of visualizers etc. does
this make sense?

  crogers: in answer to al, in these types of apps, I've worked on this
kind of thing before at IRCAM. it would analyze a sound file
  ... and draw a spectrograph that you could draw on to create time-varying
filters, etc.
  ... I am interested in those kinds of apps but going back to what Doug is
saying, [the Web Audio API] graph is optimized for time-varying signals.
  ... it gets really complicated when you're tossing in frequency-domain
data as well
  ... you can certainly do all these things in JS though

  shepazu: another part of my point is we don't have to solve everything in
v1. We'll find out what we need as people experiment with
  ... what we put out. There will be more specs to come.

  jussi: I suggest that we change the name of the issue. For the given UCs
the 2^N restriction makes sense.  Suggest we propose a new node
  ... that simply does an FFT and converts from time to freq domain and back
  ... you could put an FFT node, then a delay, then a reverse FFT

  crogers: that would be like a phase vocoder engine. if you are doing freq
domain processing then you have to work with overlapping portions of
incoming audio
  ... you have to move a sliding window
  ... this is all very cool but it's more complex than just adding some new
node types

  jussi: I am actually talking about non-realtime processing


  crogers: I've actually seen people writing these time stretching
algorithms in JS
  ... you can do this offline

  joe: we should make sure these new suggestions are linked to use cases,
to avoid going off track

  jussi: if we add an FFT node [loud tone intervenes]
  ... I was going to say if we add an FFT node it's best if it's in the 2nd
version of the spec

  shepazu: to be concrete about it: we should have a UC and requirement on
this to take it further
  ... it sounds like you have a specific suggestion, and you can also put
in your suggested solution to the requirement e.g. an FFT node


[ACTION]

  shepazu: action: jussi to write up scenario, requirements, and proposal
for FFT node case
  * trackbot noticed an ACTION. Trying to create it.
  trackbot: Created ACTION-42 - Write up scenario, requirements, and
proposal for FFT node case [on Jussi Kalliokoski - due 2012-04-02].

  al: are we in general agreement that we don't need to make the FFT size
arbitrary

  jussi: we don't need to. it doesn't help any of the current UCs


[RESOLUTION]

  joe: RESOLUTION: an arbitrary size FFT is not needed for version 1
  shepazu: Resolution: an arbitrary-size FFT is not needed for version 1
(per Issue-7)


[TELECON ENDS]


-- 
Alistair MacDonald
SignedOn, Inc - W3C Audio WG
Boston, MA, (707) 701-3730
al@signedon.com - http://signedon.com
Received on Monday, 26 March 2012 22:10:53 GMT
