Re: getUserMedia use cases from Chris Wilson on 2012-01-31 (public-audio@w3.org from January to March 2012)

From: Chris Wilson <cwilso@google.com>
Date: Tue, 31 Jan 2012 10:38:12 -0800
To: Robin Berjon <robin@berjon.com>
Cc: Chris Rogers <crogers@google.com>, Doug Schepers <schepers@w3.org>, public-audio@w3.org, Dom Hazael-Massieux <dom@w3.org>, Henrik Andreasson <henrika@google.com>
Message-ID: <CAJK2wqW-Bpn9tem2FdT5YgsyVA-Tca64mdQx4=tsHvu+_Z6JWQ@mail.gmail.com>

See below:

On Tue, Jan 31, 2012 at 6:12 AM, Robin Berjon <robin@berjon.com> wrote:

> On Jan 30, 2012, at 21:24 , Chris Rogers wrote:
> > Yes, it would be good to have an introspection API to enumerate the
> > available audio devices for both audio input and output.  A built-in
> > microphone would be one such device.  Also commonly available is the
> > line-in analog audio input on a laptop or desktop computer.  And, of
> > course, any externally connected multi-channel USB or Firewire audio
> > devices.  Some of these can present eight (or more) audio input and
> > output channels simultaneously.
>
> I see where you're coming from, but enumeration is a potential
> fingerprinting problem. If any arbitrary page can start listing all the
> audio inputs and outputs that you have, especially if it includes their
> capabilities (as I guess that would be useful for your needs) then it
> starts becoming very easy to fingerprint a given user.
>
> The current model that gUM uses is that 1) the JS requests access to a
> media stream (with options), 2) the user is presented with a choice of
> sources from which to select, 3) the selected source is returned as a
> stream. So essentially the enumeration happens in the user <-> UA space,
> and isn't accessible to script. If that's sufficient for your needs, then
> we're probably gold, if not then we need to figure out something that works
> for you. The Gamepad API has a different model to prevent enumeration:
> gamepad information becomes available whenever there's activity on the
> gamepad. But I suspect that won't translate well to audio. Either way, we
> can find solutions, we just need to have a better picture of your needs and
> balance those against the need for privacy.
>
> I also suspect that the current model can only return a single source at a
> time and that that is likely to be an issue for audio (it might be okay for
> the user to have to work with a dialog to select sources, but it's probably
> not okay for the user to have to do that eight times in a row to select
> eight sources...). I think that the API could be modified to accept a
> "multiple" flag (just like file inputs) and correspondingly return an array
> of streams.
>

True.  Even MORE true is that when I have carefully set up a multi-track
recording session in my DAW, I don't want to have to go through and
re-enumerate the I/O channels.  Note that the same kind of pattern will be
somewhat true for MIDI - there are potentially many I/Os, and the user
would be frustrated if they had to set them up by hand every session.  Even
in a "hobby" home studio nearly ten years ago, I had a setup with 16 audio
tracks and 16 MIDI ins/16 MIDI outs (x16 channels each).  My audio
recordings didn't typically use all 16 channels, but did regularly use a
dozen or so simultaneously.  That "fingerprint" is going to be important -
even if you treat it as an opaque "suggested multitrack identifier" (i.e.,
the API is called with "multiple" turned on, and returns a unique opaque ID
that the system can use later (in a different session) to try to recreate
the same setup.)
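
To make the "opaque identifier" idea concrete, here is a rough sketch of how
it might behave.  Everything in it is hypothetical: the "multiple" flag, the
setupId field, and the MockUserAgent class are illustrations only, not part
of getUserMedia or any proposal, and the streams are plain strings standing
in for real media streams.

```typescript
// Hypothetical sketch: none of these names exist in getUserMedia.
interface Source { label: string; channels: number; }

interface StreamRequest {
  multiple?: boolean; // assumed flag, by analogy with multiple file inputs
  setupId?: string;   // opaque ID handed back by an earlier session
}

interface StreamResult {
  streams: string[];  // stand-ins for real media streams
  setupId: string;    // opaque: carries no device names or capabilities
}

class MockUserAgent {
  // The ID-to-devices mapping lives inside the UA and is never exposed to script.
  private saved = new Map<string, string[]>();
  private counter = 0;
  public promptCount = 0;

  constructor(
    private devices: Source[],
    // Stands in for the chooser dialog shown to the user.
    private chooser: (devices: Source[], multiple: boolean) => Source[]
  ) {}

  request(req: StreamRequest): StreamResult {
    if (req.setupId !== undefined) {
      const labels = this.saved.get(req.setupId);
      // Restore silently only if every remembered device is still attached.
      if (labels && labels.every(l => this.devices.some(d => d.label === l))) {
        return { streams: labels.map(l => `stream:${l}`), setupId: req.setupId };
      }
    }
    // Unknown or stale ID: fall back to asking the user.
    this.promptCount++;
    const picked = this.chooser(this.devices, req.multiple === true);
    const labels = picked.map(d => d.label);
    // A real UA would presumably use an unguessable, per-origin value here.
    const setupId = `setup-${++this.counter}`;
    this.saved.set(setupId, labels);
    return { streams: labels.map(l => `stream:${l}`), setupId };
  }
}
```

The page only ever sees the streams the user picked plus the ID, so the
enumeration still happens between the user and the UA; a later session can
pass the ID back and skip the dialog if the same devices are present.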

With MIDI, of course, you don't have quite the same fingerprinting
opportunity by using capabilities of the channel, but you could send sysex
to see what responds and fingerprint that way.  Is this worse than cookies,
or other potentially identifying features?  (Not saying that's to be
ignored, just trying to understand what category it is in.)

> > It's important not to consider audio input in isolation, but also to
> > consider audio output capabilities when enumerating the devices.
>
>
> I'm not sure that I really understand this comment. Right now, gUM is
> pretty much input-only. I certainly see how selecting outputs can be
> useful, but I'm not sure that it fits very well into this model (but I
> could very well be missing something).
>
> Please note that in pointing out issues above I'm not in any way rejecting
> your use cases. I'm merely pointing out that there are problems to be
> overcome, and that understanding what you need exactly would be very
> helpful in doing so.
>
> --
> Robin Berjon - http://berjon.com/ - @robinberjon

Received on Tuesday, 31 January 2012 18:38:48 UTC