Re: [EME] Netflix’s secure release is unreliable without tamper-proof secure persistent storage and/or delayed shutdown from David Dorwin on 2015-06-16 (public-html-media@w3.org from June 2015)

From: David Dorwin <ddorwin@google.com>
Date: Mon, 15 Jun 2015 18:20:21 -0700
To: Mark Watson <watsonm@netflix.com>
Cc: "public-html-media@w3.org" <public-html-media@w3.org>
Message-ID: <CAHD2rshtbwYfwha5maWSzOt5VRYjT433yt3BJ31JratX+HBjSg@mail.gmail.com>
Let me (re)iterate our stance:

   - As you have already acknowledged, this feature, as currently proposed,
   does not support enforcement and an alternative mechanism is required for
   “suspicious” users [1].
   - This feature is not reliable as a fraud detection mechanism across all
   user agents unless we accept that it also imposes an architectural
   constraint (described below) on implementations.
   - A small feature of a spec that serves a narrow use case should not
   constrain the architecture or process model of user agents for reasons
   detailed below.
   - The purpose of this group is to explore, discuss, and recommend the
   best mechanisms for the web platform, not to codify existing solutions
   without considering the impact or alternatives.


While this “shutdown” issue is different from the previous application
shutdown issue, in general: *Any feature that requires an action to be
performed or data to be written when a web application is closed will be
unreliable on at least some implementations.* Perhaps that better covers
both forms of “delayed shutdown.” If data can be periodically written
during the application’s lifetime, then a missed write at the end is not
significant. However, that is not the case for secure release - without
tamper-proof secure storage, there can only be one write and *it must
succeed* for the mechanism to be reliable.
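To illustrate the difference, here is a toy model. The names are hypothetical and this is not code from any CDM. With periodic checkpoints, losing the write at close costs only the tail of the session. With a single write at close, which is the only safe option when the stored copy could be rolled back to an older snapshot, losing that one write loses the entire record.

```python
from dataclasses import dataclass


@dataclass
class SessionOutcome:
    usage_seconds: int      # playback time that actually occurred
    persisted_seconds: int  # playback time that ends up in storage for a later report


def periodic_writes(usage_seconds: int, interval: int, final_write_ok: bool) -> SessionOutcome:
    """Hypothetical model: usage is checkpointed every `interval` seconds.

    Only the tail after the last checkpoint depends on the write at close.
    """
    checkpointed = (usage_seconds // interval) * interval
    persisted = usage_seconds if final_write_ok else checkpointed
    return SessionOutcome(usage_seconds, persisted)


def single_write_on_close(usage_seconds: int, final_write_ok: bool) -> SessionOutcome:
    """Hypothetical model: usage is held in memory and persisted once, at close.

    If that one write is lost (tab killed, resources reclaimed, host process
    gone first), nothing at all is recorded for the session.
    """
    persisted = usage_seconds if final_write_ok else 0
    return SessionOutcome(usage_seconds, persisted)


if __name__ == "__main__":
    # A 3650-second session whose final write is lost.
    print(periodic_writes(3650, interval=60, final_write_ok=False))
    print(single_write_on_close(3650, final_write_ok=False))
```

For a 3650-second session with 60-second checkpoints and a lost final write, the periodic variant still records 3600 seconds, while the single-write variant records none.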

A similar problem may occur when implementations or platforms reclaim
resources. For example, tabs may be killed or media resources withdrawn on
some platforms. In such cases, the ability to save such data may also be
lost.

If we agree that
a) tamper-proof secure persistent storage should not be required and
b) implementation architectures should not be constrained to ensure
write-on-close capability for CDMs
then a high frequency of (non-attack-related) missing reports is probable.
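A back-of-the-envelope sketch of why the frequency adds up. Assume, hypothetically, that each session's close-time write is lost independently with probability p. Then a user with k sessions has at least one missing report with probability 1 - (1 - p)^k. The figures below are illustrative assumptions, not measurements from any implementation.

```python
def prob_any_missing(p_lost: float, sessions: int) -> float:
    """Probability that at least one of `sessions` close-time writes is lost,
    assuming (hypothetically) each is lost independently with probability p_lost."""
    if not 0.0 <= p_lost <= 1.0:
        raise ValueError("p_lost must be in [0, 1]")
    return 1.0 - (1.0 - p_lost) ** sessions


def expected_missing(p_lost: float, sessions: int) -> float:
    """Expected number of sessions with no usage report under the same assumption."""
    return p_lost * sessions


if __name__ == "__main__":
    # Hypothetical figures: 5% of closes lose the write, 20 sessions per user.
    print(round(prob_any_missing(0.05, 20), 3))  # 0.642
```

With these hypothetical figures, roughly 64% of otherwise legitimate users would have at least one missing report, and the average user would be missing about one. A server cannot easily tell these users apart from an attacker who drops reports.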


You said that “having the CDM write data to disk at page close… is not a
concern for the other desktop browsers” and suggested that this is “a
Chrome-specific software architecture issue.”

IE and Safari rely on OS-provided DRM. I would guess that it runs in a
separate process and is largely independent of the user agent. It sounds
like Firefox has made specific accommodations for writes to occur (though
Henri says not having this complication would be simpler). Those are all
valid choices, but *they should not be the only choices.*

Other than this proposed feature, there is no reason that a CDM could not
rely on the application process to handle storage or even run in the same
process as the application. This is a valid implementation choice that
should not be excluded. Not to mention, *some CDM implementations might not
even need storage if not for this feature.* That is also a valid
implementation choice.

Yes, Chrome’s CDM is more tightly integrated with the security and privacy
features of the user agent than some other implementations, but that does
not make this “a Chrome-specific software architecture issue.” It should
not matter what a user agent’s or device’s architecture is - that’s the
beauty of the web platform. Why should this feature of EME be an exception,
setting an ill-advised precedent, especially when alternatives are already
supported?

Requiring guaranteed-write-on-close is an unnecessary constraint on
platform innovation and implementer flexibility. As is common in large
active software projects, Chrome and its subsystems are frequently
refactored to improve maintainability, performance, etc. Even if Chrome or
some other user agent supports ensuring writes today, that could change in
the future or when porting to other devices. Supporting this feature today
constrains our ability to make such changes in the future.


Finally, our concerns extend well beyond just desktop browsers. Web specs
must consider emerging and to-be-invented devices and architectures where,
for example, storage might not be available and/or there are entirely
different process models. As I previously mentioned, there is already an
implementation that does not persist application storage across browsing
sessions. With secure release, this implementation would have to *add a
Netflix-specific workaround for origin==netflix.com just to get simple
streaming support!*


David

[1] https://lists.w3.org/Archives/Public/public-html-media/2015May/0013.html


On Thu, Jun 11, 2015 at 6:08 PM, Mark Watson <watsonm@netflix.com> wrote:

> Hi David,
>
> Thanks for this detailed response. There is one significant misconception
> below, which I will address in a moment, but I believe the heart of the
> issue here is what constitutes "delayed shutdown".
>
> I had previously asked that you provide a definition of this concern. I
> didn't get one, so I proposed my own: that it was unacceptable for any
> mechanism to be based on pages reliably receiving and processing onclose,
> onbeforeclose, keymessage or similar events at page close i.e. that the
> *page* can delay shutdown. There was no dissent to this definition and
> the secure release mechanism as proposed is consistent with this.
>
> You are correct that two possible implementations of this mechanism are
> either a form of secure persistent store or having the CDM write data to
> disk at page close. The latter seems to be a concern for you but it is not
> a concern for the other desktop browsers, as far as I know. So, I suggest
> this is a Chrome-specific software architecture issue. It is certainly a
> *new* and *different* issue from the one of *pages delaying shutdown*.
> This is not to dismiss it, but we should be clear about the nature of the
> concern.
>
> You are correct that an attacker can cause secure release information to
> be not reported to the server. This is true with or without secure
> persistent store, since an attacker can simply interpose on the EME API and
> drop the key release messages. This does not impact the effectiveness of
> the mechanism, provided:
> (1) it is not possible for an attacker to generate incorrect usage reports
> (2) the frequency of (non-attack-related) missing reports is low
>
> The misconception in your note is that "*even a significant lack of
> usage data could be legitimate*". It is necessary that usage data is
> reasonably reliable in the normal case and this can be achieved by the CDM
> writing to disk on page close. It is true that there are unavoidable
> scenarios in which the information is lost: browser crash or sudden loss of
> power, but these are rare enough that the normal case is easy to distinguish
> from the suspicious case.
>
> Regarding your conclusions:
>
> "*Without a tamper-proof secure persistent storage available to every
> implementor across every web platform client, secure release is ineffective
> as a fraud detection (and prevention) mechanism, especially for general
> use, which is the purpose of web specs.*"
>
> This is incorrect: tamper-proof secure persistent store is not necessary
> and there exist several implementations which prove this.
>
> "*The alternative requires implementations to delay shutdown to ensure
> playback data is written to storage and that applications implement
> extensive and complex server analyses to prevent false positives.*"
>
> There is no requirement for delayed shutdown in the sense we have
> discussed before (*pages* delaying shutdown). The server analysis is
> relatively simple provided the frequency of (non-attack) missing reports is
> low. The existing implementations show this is achievable in practice.
>
> "*Content providers can achieve equivalent levels of detection and
> better enforcement using a relaxed renewal/heartbeat configuration without
> significant impact on server load or user experience.*"
>
> I'd be interested to hear more about this. We have not identified any such
> solutions which don't either impact user experience or involve significant
> system re-engineering of much greater complexity than the proposed client
> mechanism.
>
> "*The advantages of simpler mechanisms to users, applications, and
> implementers are clear*"
>
> I think we can agree on that ;-) The problem is that we do not agree which
> mechanism is simpler, considering the whole system. We believe that
> unnecessary real-time dependencies in distributed systems represent
> significant complexity and are always to be avoided. In this case the
> proposed dependency is disproportionate for a presently theoretical attack.
>
> ...Mark
>
> On Thu, Jun 11, 2015 at 5:19 PM, David Dorwin <ddorwin@google.com> wrote:
>
>> While investigating the latest definition of the proposed secure release
>> feature, we identified a rollback attack in the absence of CDM access to **tamper-proof
>> secure persistent storage**. The proposed alternative assumes CDM
>> storage can be written after the application is closed, which reintroduces
>> a form of delayed shutdown. While rollback attacks are a common issue
>> handled by CDMs, they can often be addressed through runtime and/or
>> server-based mechanisms. Secure release, however, is akin to an offline
>> license without the user benefits of enabling offline playback. Offline
>> licenses and playback require secure storage, which is limited to a subset
>> of clients - often with higher robustness levels - and is not assumed to be
>> widely available across all platforms or implementations.
>>
>> Therefore, we continue to believe that reliance on the server for
>> concurrent stream limitations is the most sustainable way to support a
>> breadth of clients and ensure a cohesive experience as platforms evolve.
>> Unless there is a solution that can be equally and reliably implemented
>> across the wide breadth of web platform clients, we do not believe secure
>> release has a place in EME. (Not to mention the more general concerns I
>> have mentioned elsewhere, such as [1].)
>>
>> **Impact**
>>
>> **Tamper-proof secure persistent storage** increases the complexity of
>> implementations [2] and is currently impossible for third party CDMs,
>> especially without sacrificing key security features like sandboxing. The
>> two EME implementations cited as having deployed secure release at
>> scale [3] a) use first-party OS-based DRM implementations and b) are tied
>> to specific versions of their respective desktop OS. Lack of equivalent
>> ability for third-party implementers, including smaller user agents and CDM
>> vendors, puts them at a further competitive disadvantage.
>>
>> Furthermore, reliance on tamper-proof secure persistent storage (or
>> delaying shutdown until data is persisted) is a **constraint on platform
>> innovation**. Future platforms and implementations may not have
>> traditional architectures or capabilities and will have to account for ways
>> to support this Netflix-specific functionality. Even existing
>> implementations could run into problems in the future if the user agent
>> architecture changes or internals are refactored. In addition, there is at
>> least one existing implementation that does not persist application storage
>> across browsing sessions.
>>
>> **Preventing Rollback**
>>
>> With tamper-proof secure persistent CDM storage, the CDM periodically
>> stores license usage and reports it when requested, either in the current
>> browsing session or later. Identifying suspicious users is straightforward
>> because it should be very rare for more than one or two valid usage data
>> reports to go missing over a period of time.
>>
>> However, without such secure persistent storage, an attacker may
>> trivially replace the stored usage data with an older copy, including a
>> state that had no recorded license usage. This is indistinguishable from
>> the content never being played. Identifying suspicious users in this case
>> is very complex and is highly dependent on the properties of specific client
>> implementations, which could vary between versions of the same client. It seems
>> reasonable that non-trivial numbers of playbacks might not have data
>> reported on some implementations.
>>
>> **Why not wait until playback completes to persist the data?**
>>
>> The alternatives are to a) keep usage data in memory, persisting it only
>> when the session is closed, or b) distinguish the final state from
>> transient states by writing some sort of flag when the session is closed.
>> However, this does not prevent an attacker from "rolling back" to a
>> no-usage state. Additionally, a session can be closed for many reasons,
>> including the user closing the application (e.g. tab). In those cases, CDM
>> implementations must ensure that such data is written after the application
>> is closed. For some CDM implementations, this may be simple because they
>> run as a separate process, but others, such as those that rely on the user
>> agent for storage, may require that the hosting user agent delay shutdown
>> until such writes are committed.
>>
>> The absence of usage data could mean any of a) the keys were never used,
>> b) the application was closed and the CDM could not store state, or c)
>> there is abuse. Therefore, the server needs to evaluate aggregate data and
>> look for suspicious patterns to detect potential fraud. Yet, even a
>> significant lack of usage data could be legitimate (e.g. tab closure
>> without delayed shutdown enforcement). Thus, with potentially large numbers
>> of “suspicious users,” a content provider would require another mechanism
>> to improve fraud detection and/or *enforce* concurrent stream
>> limitations for such users.
>>
>> **Conclusion**
>>
>> Without a tamper-proof secure persistent storage available to every
>> implementor across every web platform client, secure release is ineffective
>> as a fraud detection (and prevention) mechanism, especially for general
>> use, which is the purpose of web specs.
>>
>> The alternative requires implementations to delay shutdown to ensure
>> playback data is written to storage and that applications implement
>> extensive and complex server analyses to prevent false positives.
>>
>> Content providers can achieve equivalent levels of detection and
>> **better** enforcement using a relaxed renewal/heartbeat configuration
>> without significant impact on server load or user experience. (I would be
>> happy to discuss this in more detail.) Those using secure release are going
>> to need a server-based alternative for “suspected” accounts anyway.
>>
>> The advantages of simpler mechanisms to users, applications, and
>> implementers are clear. As a result, I recommend that we remove secure
>> release from the EME spec and focus our efforts on defining and documenting
>> better alternatives.
>>
>>
>> [1]
>> https://github.com/w3c/encrypted-media/issues/45#issuecomment-91743387
>> [2] We are not against managing increased complexity for the benefit of
>> users if there is data showing that this provides users with a better user
>> experience than alternatives and can be implemented on all EME-enabled
>> platforms. This remains unproven.
>> [3]
>> https://lists.w3.org/Archives/Public/public-html-media/2015Apr/0080.html
>>
>
>
Received on Tuesday, 16 June 2015 01:21:13 UTC