Re: Summary of e2e encryption discussions

Before I delve into the use cases, some contemplations on trust:

- A critical part of trust is "with what".

The premise in these scenarios is that we trust all parties with the
knowledge that communication is going on. We also trust the intermediate
nodes (the least trusted parties in all these scenarios) with some
information on who the communication is with (at least their point of
attachment to the network, if not their verifiable identities).

So what we are protecting is the content of the communication - the
video, audio and data (text or otherwise) that the parties are exchanging.

In all scenarios, we assume that the browser is trusted with access to
all media and data, and obeys the rules for policing applications'
access to these.

We can assume that the browser knows about video and audio as concepts,
and can take steps to handle those specially. For data, we have the
origin model, where a browsing context has limited ability to access
contexts with other origins; it is probably wise to assume that there
will be no new mechanisms for data protection invented for our use cases.

On 22 June 2018 at 04:55, youenn fablet wrote:
> Hi all,
> 
> Following up on the last F2F meeting, please find below some notes on WebRTC end-to-end encryption.
> This should complete my related action item.
> 
> During the discussions, three security models were identified:
> 1. Trusted web application: a trusted website uses intermediate nodes from a potentially untrusted WebRTC provider
> 2. Partially trusted web application: a trusted web site uses an SDK from a potentially untrusted WebRTC provider
> 3. Untrusted web application

I like this classification.
For argument's sake, let's say that in case 2, the trusted web
application and the untrusted components are running in different
origins; I don't think we can make any protection work for pages from
the same origin. (Seeking contradiction here!)

> 
> Each use case is described in more detail below, and some personal thoughts are at the end of the message.
> Alex, Sergio, let me know if you have ideas on how best to format this information and/or how to move the topic forward.
> I would tend to focus on use case 1, and optionally investigate how useful and deployable opaque/isolated streams are.
> 
> Thanks,
>  Y
> 
> In the context of a WebRTC exchange, two types of nodes can be identified:
> Final nodes produce and consume content. These are typically web applications.
> Intermediate nodes route content to the final nodes. These are, for instance, SFUs.
> 
> 1. Trusted web application
> Web applications are trusted and intermediate nodes are not trusted.
> This is a scenario that happens in the banking industry: banks control the applications but delegate the network to a potentially untrusted WebRTC provider.
> 
> Requirements:
> - The content needs to be protected from being accessed by intermediate nodes.
> - The content does NOT need to be protected from being accessed by web applications.
> 
> Potential solution:
> - Apply encryption on the content that intermediate nodes cannot decrypt. This encryption would be done in addition to, and before, the DTLS encryption.
> - The WebRTC layer in the browser could implement this additional encryption with encryption keys provided by the web application through a specific web API.
> - The web application could implement this additional encryption itself if enough media pipeline low level JS APIs were provided.
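
A toy model may help picture the layering in the first bullet: an end-to-end layer applied first, then the hop-by-hop layer whose keys the SFU shares. Everything below (toyCipher, sendFrame, sfuRelay, receiveFrame) is hypothetical, and the XOR "cipher" is a placeholder that is NOT secure; a real design would use an AEAD cipher with keys supplied by the web application.

```typescript
// Toy model of double encryption: end-to-end (E2E) layer first, then the
// hop-by-hop layer (DTLS-SRTP in real WebRTC) on top of it.
// WARNING: toyCipher is a placeholder XOR transform, NOT real encryption.

type Bytes = Uint8Array;

interface Cipher {
  encrypt(data: Bytes): Bytes;
  decrypt(data: Bytes): Bytes;
}

// XOR with a repeating key; applying it twice restores the input.
function toyCipher(key: number[]): Cipher {
  const xor = (data: Bytes): Bytes =>
    data.map((b, i) => b ^ (key[i % key.length] ?? 0));
  return { encrypt: xor, decrypt: xor };
}

// Sender: E2E layer (keys held only by the endpoints), then hop layer.
function sendFrame(frame: Bytes, e2e: Cipher, hop: Cipher): Bytes {
  return hop.encrypt(e2e.encrypt(frame));
}

// SFU: strips the incoming hop layer and re-wraps for the next hop.
// The intermediate value is all it ever sees: E2E ciphertext.
function sfuRelay(packet: Bytes, hopIn: Cipher, hopOut: Cipher): Bytes {
  const e2eCiphertext = hopIn.decrypt(packet);
  return hopOut.encrypt(e2eCiphertext);
}

// Receiver: hop layer off first, then the E2E layer.
function receiveFrame(packet: Bytes, e2e: Cipher, hop: Cipher): Bytes {
  return e2e.decrypt(hop.decrypt(packet));
}

// Usage: endpoints share e2eKey; each hop has its own key shared with the SFU.
const frame = Uint8Array.from([1, 2, 3, 4]);
const e2eKey = toyCipher([0x5a, 0xc3]);
const hopToSfu = toyCipher([0x11]);
const hopFromSfu = toyCipher([0x22]);

const onWire = sendFrame(frame, e2eKey, hopToSfu);
const seenBySfu = hopToSfu.decrypt(onWire);
const delivered = receiveFrame(sfuRelay(onWire, hopToSfu, hopFromSfu), e2eKey, hopFromSfu);
```

The only structural change compared to today's pipeline is the extra inner layer; whether the browser applies it with application-supplied keys or the application applies it itself through lower-level media APIs is the choice between the second and third bullets.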

This is, I think, the scenario PERC was designed for on the network layer.
A critical part of this is also that the Web application has to attach
enough information to the encrypted packets for the SFU to do its job
of choosing which packets / frames to forward. In SVC or simulcast, for
instance, this includes labelling packets containing video with enough
information to reconstruct the dependency graph; if "loudest audio"
switching is in effect, the outside of the packet has to contain audio
level information.
Mixing at the SFU is impossible by design.
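
To make the "label the outside of the packet" requirement concrete, here is a minimal sketch of such an envelope: a small cleartext header the SFU can parse, followed by ciphertext it passes through untouched. The byte layout, the field set and the functions (wrap, readHeader, shouldForward, loudest) are hypothetical; only the 0..127 -dBov audio-level scale is borrowed from RFC 6464.

```typescript
// Hypothetical frame envelope: a 2-byte cleartext header that the SFU may
// read, followed by an end-to-end encrypted payload it cannot read.
// The layout and field set are illustrative assumptions, not a standard.

interface EnvelopeHeader {
  spatialLayer: number;  // 0..15 (SVC/simulcast spatial layer)
  temporalLayer: number; // 0..15
  keyframe: boolean;     // tells the SFU where a clean layer switch is possible
  audioLevel: number;    // 0..127 in -dBov (RFC 6464 style); 127 = silence
}

// Byte 0: spatial (high 4 bits) | temporal (low 4 bits)
// Byte 1: keyframe flag (bit 7) | audio level (low 7 bits)
// Bytes 2..: E2E ciphertext, copied through untouched.
function wrap(h: EnvelopeHeader, e2eCiphertext: Uint8Array): Uint8Array {
  const out = new Uint8Array(2 + e2eCiphertext.length);
  out[0] = ((h.spatialLayer & 0x0f) << 4) | (h.temporalLayer & 0x0f);
  out[1] = (h.keyframe ? 0x80 : 0) | (h.audioLevel & 0x7f);
  out.set(e2eCiphertext, 2);
  return out;
}

// SFU side: parse the header without touching the payload.
function readHeader(packet: Uint8Array): EnvelopeHeader {
  const b0 = packet[0] ?? 0;
  const b1 = packet[1] ?? 0;
  return {
    spatialLayer: b0 >> 4,
    temporalLayer: b0 & 0x0f,
    keyframe: (b1 & 0x80) !== 0,
    audioLevel: b1 & 0x7f,
  };
}

// Per-subscriber forwarding decision based only on header fields.
function shouldForward(h: EnvelopeHeader, maxSpatial: number, maxTemporal: number): boolean {
  return h.spatialLayer <= maxSpatial && h.temporalLayer <= maxTemporal;
}

// "Loudest audio" switching: the smallest -dBov value is the loudest source.
function loudest(levels: Record<string, number>): string | undefined {
  let best: string | undefined;
  let bestLevel = Infinity;
  for (const id of Object.keys(levels)) {
    const level = levels[id] ?? 127;
    if (level < bestLevel) {
      best = id;
      bestLevel = level;
    }
  }
  return best;
}
```

Everything in that header is visible to the intermediate node by construction: it learns the layer structure and who is speaking, just not the media itself.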

> 2. Partially trusted web application
> Web applications run some code that is trusted and some code that is untrusted.
> This can for instance be the case when the top level page « mytrustedwebsite.com » embeds an iframe from « mywebrtcprovider.com ».
> The iframe implements all the WebRTC black magic, potentially including media content rendering.
> Intermediate nodes are not trusted.
> 
> Requirements:
> - The content needs to be protected from being accessed by intermediate nodes.
> - The content does NOT need to be protected from being accessed by the top level page.
> - The content needs to be protected from being accessed by the iframe.
> 
> Potential solution:
> - Apply encryption on the content that intermediate nodes cannot decrypt. Compared to the trusted web application case, the iframe cannot have direct access to the encryption keys. The top-level page might have access to them and register them with the browser. Key identifiers might be shared with the iframe.
> - Make the outgoing content opaque in the untrusted iframe. One possibility is for the iframe to call getUserMedia directly and receive an opaque stream. This could be achieved with some markup like allow="opaque-camera" on the iframe. A second possibility is for the top-level page to call getUserMedia, receive a non-isolated stream and transfer it to the untrusted iframe, at which point the stream becomes opaque.
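
The key-handling part of the first bullet can be sketched as follows, assuming the browser keeps raw keys internally and script only ever holds an identifier. createKeyStore and its methods are hypothetical stand-ins for browser internals, not an existing API, and protect() uses a placeholder XOR transform that is NOT real encryption.

```typescript
// Hypothetical model of key handling in use case 2. createKeyStore stands in
// for browser internals; it is not an existing API. Raw keys stay inside the
// closure, and script (top-level page or iframe) only ever holds a key ID.
// WARNING: protect() uses a placeholder XOR transform, NOT real encryption.

interface KeyStore {
  register(rawKey: Uint8Array): string;                  // top-level page only
  protect(keyId: string, frame: Uint8Array): Uint8Array; // usable by the iframe
}

function createKeyStore(): KeyStore {
  const keys = new Map<string, Uint8Array>();
  let nextId = 1;
  return {
    register(rawKey) {
      const id = `key-${nextId++}`;
      keys.set(id, rawKey.slice()); // private copy; caller's buffer not retained
      return id;
    },
    protect(keyId, frame) {
      const key = keys.get(keyId);
      if (key === undefined || key.length === 0) {
        throw new Error(`unknown key ID: ${keyId}`);
      }
      return frame.map((b, i) => b ^ (key[i % key.length] ?? 0));
    },
  };
}

// The top-level page registers the key and posts only the ID (a plain
// string, so it survives postMessage()) to the cross-origin iframe.
const browserKeys = createKeyStore();
const keyIdForIframe = browserKeys.register(Uint8Array.from([7]));
```

Handing the iframe an ID rather than a key is what would let it drive the media pipeline without being able to decrypt anything itself.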

The relationship between this scenario and "isolated streams" is a bit complex.

The current "isolated stream" design applies isolation automatically
to cross-origin tracks:
https://w3c.github.io/webrtc-pc/#isolated-media-streams. If permission
is given for transmission to a specific origin, the identity must be
part of the getUserMedia() call (which also isolates the track from the
current origin).

Taking advantage of isolation (either automatic or with "peerIdentity")
requires that MediaStreams be transferable by postMessage(), since this
is how we move stuff between origins.

On the receiver side, negotiating a PeerConnection with isolated streams
will produce only isolated streams.
(https://w3c.github.io/webrtc-pc/#isolated-pc mixes the terms "stream"
and "track" a bit. Some cleanup may be needed.)

A result of the current design would be that the trusted part of the
application would be unable to access the media.
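
A toy model of the rules as summarized above (not the spec's algorithm) shows why: once the provider's iframe cannot read the captured track, every track it receives is isolated, including from the trusted page. The functions capture, canRead and negotiate are hypothetical; the origins are the ones from use case 2.

```typescript
// Toy model of the isolation rules as summarized above, not the spec's
// algorithm. All names are illustrative.

interface Track {
  readonly creatorOrigin: string;
  readonly isolated: boolean; // set by peerIdentity or inherited from a PeerConnection
}

// getUserMedia() stand-in: asking for a peerIdentity isolates the track
// even from the origin that captured it.
function capture(origin: string, peerIdentity?: string): Track {
  return { creatorOrigin: origin, isolated: peerIdentity !== undefined };
}

// Page script can read content only from non-isolated, same-origin tracks;
// a track posted to another origin is automatically unreadable there.
function canRead(t: Track, contextOrigin: string): boolean {
  return !t.isolated && t.creatorOrigin === contextOrigin;
}

// A PeerConnection in pcOrigin that sends any track pcOrigin cannot read
// produces only isolated remote tracks.
function negotiate(sent: Track[], pcOrigin: string, remoteCount: number): Track[] {
  const isolated = sent.some((t) => !canRead(t, pcOrigin));
  return Array.from({ length: remoteCount }, () => ({ creatorOrigin: pcOrigin, isolated }));
}

// Use case 2: the trusted page captures, the provider's iframe runs the call.
const topOrigin = "https://mytrustedwebsite.com";
const iframeOrigin = "https://mywebrtcprovider.com";
const camera = capture(topOrigin);
const remoteTracks = negotiate([camera], iframeOrigin, 1);
```

Checking canRead(remoteTracks[0], topOrigin) returns false in this model, which is the problem: the party that is supposed to see the media cannot.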

My take: For this use case, isolation needs some thought, and possibly
a redesign. And isolation that exposes information at intermediate nodes
doesn't satisfy the initial design criteria.

> - Make incoming streams protected by double encryption opaque, either by default or depending on how the encryption keys were retrieved. The top-level page could also be allowed to remove this opaque protection when needed.
> 
> Potential issues:
> - Text chat is not straightforward to implement: the top-level page would need to handle the input/output of text so that the iframe has access to encrypted text data only, or some new HTML construct would be needed.

A text widget with encryption + postMessage seems like a smaller thing
to integrate in the top level than the entire WebRTC handling, though.
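
A sketch of what that smaller widget might look like: the trusted page encrypts and decrypts, and the iframe only relays ciphertext. makeTopLevelChat and toyXorText are hypothetical names, relayToIframe stands in for postMessage(), and the XOR transform is a placeholder that is NOT real encryption.

```typescript
// Hypothetical split for text chat in use case 2: the trusted top-level page
// encrypts and decrypts; the untrusted iframe only relays ciphertext (for
// instance over its data channel). relayToIframe stands in for postMessage().
// WARNING: toyXorText is a placeholder, NOT real encryption.

function toyXorText(text: string, key: number): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) ^ (key & 0xff));
  }
  return out; // XOR is its own inverse, so the same function decrypts
}

function makeTopLevelChat(key: number, relayToIframe: (ciphertext: string) => void) {
  return {
    // User typed a message: only ciphertext crosses into the iframe.
    send(text: string): void {
      relayToIframe(toyXorText(text, key));
    },
    // Ciphertext came back from the iframe: decrypt for display.
    receive(ciphertext: string): string {
      return toyXorText(ciphertext, key);
    },
  };
}

// Usage: what the iframe relays never contains the plaintext.
const relayed: string[] = [];
const chat = makeTopLevelChat(0x2a, (c) => { relayed.push(c); });
chat.send("hello");
```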

> 
> 3. Fully untrusted web application
> Neither web applications nor intermediate nodes are trusted.
> The content needs to be protected from being accessed by the web application.
> The content needs to be protected from being accessed by intermediate nodes.
> 
> Potential solution:
> - Apply encryption on the content that intermediate nodes cannot decrypt. The web application cannot get any access to the keys. A configuration step is required in the WebRTC layer and could rely on an IdP infrastructure.
> - Let the web application call getUserMedia to produce an outgoing stream that needs to be opaque. This could rely on an IdP infrastructure and would require a dedicated UI at getUserMedia prompt time. This could also be done through some configuration settings.
> - Make any incoming stream protected by double encryption be opaque.
> - Provide access to the encryption keys outside of the web application, for instance using an IdP infrastructure.
> 
> Potential issues:
> - Text chat does not seem to be possible without some new HTML construct.
> - This relies on IdP getting some adoption.

This is also the scenario that isolation was designed for, but isolation
doesn't address the intermediate node problem.
> 
> Some personal thoughts:
> Focusing on use case 1 (fully trusted web application) might allow us to make progress in this area. It has a well-defined scope and would be a building block for use cases 2 and 3.
> Use case 2 has a somewhat wider scope and limited complexity. It should first be proved that opaque streams would actually be deployed, as they can cause user experience issues. For instance, in multi-party video conference scenarios, it is desirable to update the UI based on who is speaking, silence detection might help improve audio quality, and a microphone level meter is often available…
> Use case 3 has a similar issue with regard to opaque streams. It would also rely heavily on IdP or a mechanism similar to IdP. It is unclear whether there is sufficient interest in that area and whether a good getUserMedia prompt UI could be designed.
> 
> It might also be beneficial to study WebRTC broadcasting and EME-like scenarios as the concept of opaque media content might prove to be useful in that context.
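
The user-experience concern about opaque streams (level meters, who-is-speaking UI) can be made concrete: such features compute something like the following from decoded samples, which is exactly what an opaque track withholds from the page. audioLevelDbov is a hypothetical helper; it borrows RFC 6464's 0..127 -dBov scale, and treating an RMS of 1.0 as 0 dBov is an assumption of this sketch.

```typescript
// Sketch: an audio level on a 0..127 -dBov scale (the RFC 6464 convention)
// computed from float PCM samples in [-1, 1], treating an RMS of 1.0 as
// 0 dBov (an assumption made for this sketch).

function audioLevelDbov(samples: Float32Array): number {
  if (samples.length === 0) return 127;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i] ?? 0;
    sumSquares += s * s;
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  if (rms === 0) return 127; // digital silence
  const dbov = 20 * Math.log10(rms); // 0 at full scale, negative below
  return Math.min(127, Math.max(0, Math.round(-dbov)));
}

// A level meter would call this per audio frame; an active-speaker UI would
// compare the values across participants (lower = louder).
const loudFrame = Float32Array.from([1, -1, 1, -1]);
const quietFrame = Float32Array.from([0.1, -0.1, 0.1, -0.1]);
```

With the sender-side envelope discussed under use case 1, the same number could also travel in cleartext for the SFU, but the page itself still needs sample access to drive its own UI.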

It seems to me that we may have useful tools in the isolation / identity
work to think about these matters, but solving the intermediate node
problem requires us to incorporate some form of double encryption (and
the corresponding envelope marking) into the isolation concept.

Received on Friday, 22 June 2018 09:19:37 UTC