Re: [EME] Netflix’s secure release is unreliable without tamper-proof secure persistent storage and/or delayed shutdown from Mark Watson on 2015-06-12 (public-html-media@w3.org from June 2015)

From: Mark Watson <watsonm@netflix.com>
Date: Thu, 11 Jun 2015 18:08:21 -0700
To: David Dorwin <ddorwin@google.com>
Cc: "public-html-media@w3.org" <public-html-media@w3.org>
Message-ID: <CAEnTvdB=dFSO5mqCLcWySHTmO0bSSj+9co2Bzj12mZu=z3XWQg@mail.gmail.com>

Hi David,

Thanks for this detailed response. There is one significant misconception
below, which I will address in a moment, but I believe the heart of the
issue here is what constitutes "delayed shutdown".

I had previously asked that you provide a definition of this concern. I
didn't get one, so I proposed my own: that it was unacceptable for any
mechanism to be based on pages reliably receiving and processing onclose,
onbeforeclose, keymessage or similar events at page close, i.e. that the
*page* can delay shutdown. There was no dissent to this definition and the
secure release mechanism as proposed is consistent with this.

You are correct that two possible implementations of this mechanism are
either a form of secure persistent store or having the CDM write data to
disk at page close. The latter seems to be a concern for you but it is not
a concern for the other desktop browsers, as far as I know. So, I suggest
this is a Chrome-specific software architecture issue. It is certainly a
*new* and *different* issue from the one of *pages delaying shutdown*. This
is not to dismiss it, but we should be clear about the nature of the
concern.

You are correct that an attacker can cause secure release information to be
not reported to the server. This is true with or without secure persistent
store, since an attacker can simply interpose on the EME API and drop the
key release messages. This does not impact the effectiveness of the
mechanism, provided:
(1) it is not possible for an attacker to generate incorrect usage reports
(2) the frequency of (non-attack-related) missing reports is low

The misconception in your note is that "*even a significant lack of usage
data could be legitimate*". It is necessary that usage data is reasonably
reliable in the normal case and this can be achieved by the CDM writing to
disk on page close. It is true that there are unavoidable scenarios in
which the information is lost, such as a browser crash or sudden loss of
power, but these are rare enough that the normal case is easy to
distinguish from the suspicious case.
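
To make the "normal case versus suspicious case" distinction concrete, here
is a minimal sketch of one possible server-side check. It is an illustration
only, not a description of any deployed analysis: the function names, the
shape of the per-account data, the 2% baseline rate of missing reports and
the 1e-4 threshold are all assumptions chosen for the example. It flags an
account only when its number of missing release reports would be very
unlikely under the assumed baseline.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def suspicious_accounts(stats, baseline_missing_rate=0.02, alpha=1e-4):
    """Flag accounts with implausibly many missing release reports.

    stats maps a (hypothetical) account id to a tuple of
    (sessions_issued, release_reports_received).
    baseline_missing_rate is an assumed rate of non-attack losses
    (crashes, power loss); alpha is an assumed significance threshold.
    """
    flagged = []
    for account, (issued, reported) in stats.items():
        # Condition (1) above means reports cannot be forged, so we only
        # count reports that are missing; clamp at zero for safety.
        missing = max(0, issued - reported)
        if issued and binomial_tail(issued, missing, baseline_missing_rate) < alpha:
            flagged.append(account)
    return flagged


stats = {
    "alice": (50, 49),  # one lost report: consistent with an occasional crash
    "bob": (50, 30),    # twenty lost reports: far above the assumed baseline
}
print(suspicious_accounts(stats))  # prints ['bob']
```

The check only works when condition (2) holds: if the baseline rate of
missing reports were high, the two accounts above would be much harder to
tell apart.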

Regarding your conclusions:

"*Without a tamper-proof secure persistent storage available to every
implementor across every web platform client, secure release is ineffective
as a fraud detection (and prevention) mechanism, especially for general
use, which is the purpose of web specs.*"

This is incorrect: tamper-proof secure persistent store is not necessary
and there exist several implementations which prove this.

"*The alternative requires implementations to delay shutdown to ensure
playback data is written to storage and that applications implement
extensive and complex server analyses to prevent false positives.*"

There is no requirement for delayed shutdown in the sense we have discussed
before (*pages* delaying shutdown). The server analysis is relatively
simple provided the frequency of (non-attack) missing reports is low. The
existing implementations show this is achievable in practice.

"*Content providers can achieve equivalent levels of detection and
**better** enforcement using a relaxed renewal/heartbeat configuration
without significant impact on server load or user experience.*"

I'd be interested to hear more about this. We have not identified any such
solutions which don't either impact user experience or involve significant
system re-engineering of much greater complexity than the proposed client
mechanism.

"*The advantages of simpler mechanisms to users, applications, and
implementers are clear*"

I think we can agree on that ;-) The problem is that we do not agree which
mechanism is simpler, considering the whole system. We believe that
unnecessary real-time dependencies in distributed systems represent
significant complexity and are always to be avoided. In this case the
proposed dependency is disproportionate for a presently theoretical attack.

...Mark

On Thu, Jun 11, 2015 at 5:19 PM, David Dorwin <ddorwin@google.com> wrote:

> While investigating the latest definition of the proposed secure release
> feature, we identified a rollback attack in the absence of CDM access to **tamper-proof
> secure persistent storage**. The proposed alternative assumes CDM storage
> can be written after the application is closed, which reintroduces a form
> of delayed shutdown. While rollback attacks are a common issue handled by
> CDMs, they can often be addressed through runtime and/or server-based
> mechanisms. Secure release, however, is akin to an offline license without
> the user benefits of enabling offline playback. Offline licenses and
> playback require secure storage, which is limited to a subset of clients -
> often with higher robustness levels - and is not assumed to be widely
> available across all platforms or implementations.
>
> Therefore, we continue to believe that reliance on the server for
> concurrent stream limitations is the most sustainable way to support a
> breadth of clients and ensure a cohesive experience as platforms evolve.
> Unless there is a solution that can be equally and reliably implemented
> across the wide breadth of web platform clients, we do not believe secure
> release has a place in EME. (Not to mention the more general concerns I
> have mentioned elsewhere, such as [1].)
>
> **Impact**
>
> **Tamper-proof secure persistent storage** increases the complexity of
> implementations [2] and is currently impossible for third party CDMs,
> especially without sacrificing key security features like sandboxing. The
> two EME implementations cited as having deployed secure release at
> scale [3] a) use first-party OS-based DRM implementations and b) are tied
> to specific versions of their respective desktop OS. Lack of equivalent
> ability for third-party implementers, including smaller user agents and CDM
> vendors, puts them at a further competitive disadvantage.
>
> Furthermore, reliance on tamper-proof secure persistent storage (or
> delaying shutdown until data is persisted) is a **constraint on platform
> innovation**. Future platforms and implementations may not have
> traditional architectures or capabilities and will have to account for ways
> to support this Netflix-specific functionality. Even existing
> implementations could run into problems in the future if the user agent
> architecture changes or internals are refactored. Furthermore, there is at
> least one existing implementation that does not persist application storage
> across browsing sessions.
>
> **Preventing Rollback**
>
> With tamper-proof secure persistent CDM storage, the CDM periodically
> stores license usage and reports it when requested, either in the current
> browsing session or later. Identifying suspicious users is straightforward
> because it should be very rare to not receive more than one or two valid
> usage data reports over a period of time.
>
> However, without such secure persistent storage, an attacker may trivially
> replace the stored usage data with an older copy, including a state that
> had no recorded license usage. This is indistinguishable from the content
> never being played. Identifying suspicious users in this case is very
> complex and is highly dependent on specific client implementation
> properties, which could vary between versions of the same client. It seems
> reasonable that non-trivial numbers of playbacks might not have data
> reported on some implementations.
>
> **Why not wait until playback completes to persist the data?**
>
> The alternatives are to a) keep usage data in memory, persisting it only
> when the session is closed, or b) distinguish the final state from
> transient states by writing some sort of flag when the session is closed.
> However, this does not prevent an attacker from "rolling back" to a
> no-usage stage. Additionally, a session can be closed for many reasons,
> including the user closing the application (e.g. tab). In those cases, CDM
> implementations must ensure that such data is written after the application
> is closed. For some CDM implementations, this may be simple because they
> run as a separate process, but others, such as those that rely on the user
> agent for storage, may require that the hosting user agent delay shutdown
> until such writes are committed.
>
> The absence of usage data could mean any of a) the keys were never used,
> b) the application was closed and the CDM could not store state, or c)
> there is abuse. Therefore, the server needs to evaluate aggregate data and
> look for suspicious patterns to detect potential fraud. Yet, even a
> significant lack of usage data could be legitimate (e.g. tab closure
> without delayed shutdown enforcement). Thus, with potentially large numbers
> of “suspicious users,” a content provider would require another mechanism
> to improve fraud detection and/or *enforce* concurrent stream
> limitations for such users.
>
> **Conclusion**
>
> Without a tamper-proof secure persistent storage available to every
> implementor across every web platform client, secure release is ineffective
> as a fraud detection (and prevention) mechanism, especially for general
> use, which is the purpose of web specs.
>
> The alternative requires implementations to delay shutdown to ensure
> playback data is written to storage and that applications implement
> extensive and complex server analyses to prevent false positives.
>
> Content providers can achieve equivalent levels of detection and
> **better** enforcement using a relaxed renewal/heartbeat configuration
> without significant impact on server load or user experience. (I would be
> happy to discuss this in more detail.) Those using secure release are going
> to need a server-based alternative for “suspected” accounts anyway.
>
> The advantages of simpler mechanisms to users, applications, and
> implementers are clear. As a result, I recommend that we remove secure
> release from the EME spec and focus our efforts on defining and documenting
> better alternatives.
>
>
> [1] https://github.com/w3c/encrypted-media/issues/45#issuecomment-91743387
> [2] We are not against managing increased complexity for the benefit of
> users, if there is data that this provides users with a better user
> experience than alternatives and can be implemented on all EME-enabled
> platforms. This remains unproven.
> [3] https://lists.w3.org/Archives/Public/public-html-media/2015Apr/0080.html
>
Received on Friday, 12 June 2015 01:08:51 UTC