Security of cross-origin audio from Martin Thomson on 2013-05-30 (public-webrtc@w3.org from May 2013)

From: Martin Thomson <martin.thomson@gmail.com>
Date: Thu, 30 May 2013 08:06:50 -0700
To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <CABkgnnXjMHj0ueNHPRSM8M0ZG6Zkx6hkt6aD7FSzrpvpJAhkYQ@mail.gmail.com>

When video tracks are combined, the output is still confined to a
certain set of pixels.  We have a good set of rules with respect to
cross-origin sampling of video and images (any affected pixels can't
be accessed), but cross-origin audio seems to need something broader.

Audio has the wonderful distinction of being very hard to place.
Spatial effects aside, all audio, regardless of origin, goes to the
same output.

Maybe a site should not be able to access the microphone while the
speakers are playing audio from another origin.  I've noticed that
some WebRTC implementations have echo cancellation that is not
effective at removing output audio from the sampled microphone signal
(and this was with headphones).  I'm certain that echo cancellation
would not be sufficient for any security guarantees.
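
As a rough sketch of the rule floated above (no microphone access
while audio from another origin is playing), something like the
following check could apply.  This is only an illustration: the names
PlayingSource and may_capture_microphone are made up, origins are
compared as plain strings, and no browser is known to implement this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayingSource:
    """One audio source currently routed to the speakers (hypothetical model)."""
    origin: str


def may_capture_microphone(requesting_origin: str,
                           playing: list[PlayingSource]) -> bool:
    """Allow capture only if every audible source shares the requester's origin.

    An empty list (silence) trivially allows capture.
    """
    return all(src.origin == requesting_origin for src in playing)


if __name__ == "__main__":
    call = PlayingSource("https://call.example")
    music = PlayingSource("https://music.example")
    # Cross-origin audio is audible, so the calling site is refused.
    print(may_capture_microphone("https://call.example", [call, music]))
    # Only same-origin audio is audible, so capture is allowed.
    print(may_capture_microphone("https://call.example", [call]))
```

A check like this is deliberately broad: it blocks capture outright
rather than relying on echo cancellation to scrub the other origin's
audio from the microphone signal.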

It would be a failure on our part if a site that has access to your
microphone, but not a remote stream, could recover that remote stream.
As it stands, I believe this to be possible.  [1]

Going further, it's going to be difficult for a user to distinguish
between stuff Joe said and the crap that the site is pumping out the
speakers.  In WebRTC cases where the browser is required to make
assertions about the origin of audio and video (not data!), it would
be bad if the veracity of these assertions could be compromised simply
by playing other noises and tampering with output levels.  After all,
if the site knows who you are talking to, generating random expletives
might be easy.

At a minimum, this needs some security considerations, though I would
be disappointed if that were all that was done.  I would prefer to see
some mechanisms proposed to address these issues.

--Martin

[1] Bugs make this a lot worse.  I have two sets of headphones
attached to different audio devices on my computer.  In one example, I
was using another application to play audio to a completely different
set of headphones while testing a WebRTC call.  I was able to detect
phase-shifted audio on the headphones that were being used for the
WebRTC call.

Received on Thursday, 30 May 2013 15:07:21 UTC