Re: [EME] Netflix’s secure release is unreliable without tamper-proof secure persistent storage and/or delayed shutdown from Joe Steele on 2015-06-16 (public-html-media@w3.org from June 2015)

From: Joe Steele <steele@adobe.com>
Date: Tue, 16 Jun 2015 23:31:32 +0000
To: David Dorwin <ddorwin@google.com>
CC: Mark Watson <watsonm@netflix.com>, "public-html-media@w3.org" <public-html-media@w3.org>, Henri Sivonen <hsivonen@mozilla.com>
Message-ID: <B2C50E24-0170-48E4-BD0D-C686F431E411@adobe.com>
These are the arguments I am hearing.

1) Requiring a “write-on-close” capability in the UA + CDM to implement this feature is not acceptable as not all platforms can support this.
Unless the platform you are talking about does not support storage at all, it is possible to support it. It may require changes to the UA and/or CDM to do so, but supporting it is a choice. I can understand why this argument is being made, but think it is a bit of a red herring. This is one way to implement the feature, but not the only way. As an example, the UA+CDM could implement a storage mechanism which included rollback prevention or rollback protection. Or the entire platform could be hardened against tampering. There may be other mechanisms. It *is* useful to point out the risks of a rollback attack to implementers, but I don’t think we should constrain the implementation.

2) Requiring a “tamper-evident” storage facility in the UA+CDM to implement this feature is not acceptable as not all platforms can support this.
From a practical point of view, we already have this requirement if the CDM supports storage at all. It does not seem useful to be able to cache licenses without being able to either detect or prevent tampering. Whether this is provided by a combination of storage support in the UA and tamper-detection in the CDM or something more elaborate is not relevant. Either mode requiring persistent storage is optional and easy to detect and compensate for in the player.
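
To make the distinction between points 1 and 2 concrete, here is a minimal sketch (not any real CDM's design) of why tamper detection alone does not stop a rollback. The key, the record fields and the `trusted_counter` are all hypothetical; the sketch assumes the CDM holds a secret the attacker cannot read and has some counter the attacker cannot rewind.

```python
import hashlib
import hmac
import json

# Assumption: a secret only the CDM can read (hypothetical value).
SECRET = b"hypothetical-cdm-key"


def seal(record: dict) -> bytes:
    """Serialize a usage record and prepend an HMAC tag (tamper-evident)."""
    payload = json.dumps(record, sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest().encode()
    return tag + b"." + payload


def unseal(blob: bytes) -> dict:
    """Verify the tag and return the record; raise if the blob was modified."""
    tag, payload = blob.split(b".", 1)
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("stored record was modified")
    return json.loads(payload)


def is_rollback(record: dict, trusted_counter: int) -> bool:
    """Rollback check: needs a counter kept where the attacker cannot rewind it."""
    return record["counter"] < trusted_counter


old_blob = seal({"counter": 1, "played": False})
new_blob = seal({"counter": 2, "played": True})

# Swapping new_blob for old_blob on disk passes the tamper check...
replayed = unseal(old_blob)
# ...and is only caught if a trusted counter value is available.
rolled_back = is_rollback(replayed, trusted_counter=2)
```

The MAC catches edits to a stored record, but an older record is still a validly sealed record. Catching the swap needs something outside the attacker's reach, which is where the implementation choices discussed above come in.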

3) Using a renewal/heartbeat mechanism for managing keys is more effective.
This is a matter of opinion as Mark has pointed out. Certainly the two approaches have different scaling properties. Given that the two solutions are not equivalent in all areas of concern, I don’t think we should prefer one over the other without some overwhelming evidence. We can and should support both mechanisms.

4) There will be lots of false negatives using this mechanism.
This may be true. However, the video publisher who originally asked for this feature, and has been using it for many years on other platforms, says that this is a manageable issue. I don’t feel like we can just ignore that without some counterargument from other publishers who have similar experience with this mechanism. I have not heard that argument yet - but maybe you are speaking for YouTube here?
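
As a rough illustration of what "manageable" could mean on the server side, here is a minimal sketch that flags an account only when its share of missing release reports is well above an expected background rate. The class, the function names and every threshold are hypothetical; they are not values any publisher has reported.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    licenses_issued: int   # licenses the server handed out to this account
    release_reports: int   # secure release reports received back


def missing_rate(stats: AccountStats) -> float:
    """Fraction of issued licenses for which no release report arrived."""
    if stats.licenses_issued == 0:
        return 0.0
    missing = max(stats.licenses_issued - stats.release_reports, 0)
    return missing / stats.licenses_issued


def is_suspicious(
    stats: AccountStats,
    baseline_rate: float = 0.05,  # assumed background loss (crashes, power loss)
    factor: float = 4.0,          # assumed margin above the baseline
    min_licenses: int = 20,       # assumed minimum sample before judging
) -> bool:
    """Flag accounts whose missing-report rate far exceeds the baseline."""
    if stats.licenses_issued < min_licenses:
        return False
    return missing_rate(stats) > baseline_rate * factor


normal = AccountStats(licenses_issued=100, release_reports=97)
odd = AccountStats(licenses_issued=100, release_reports=40)
new_user = AccountStats(licenses_issued=5, release_reports=0)
flags = [is_suspicious(a) for a in (normal, odd, new_user)]
```

Whether this stays simple depends on the background rate: the higher the non-attack loss on a given implementation, the less an individual account's missing reports tell you.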

Joe

> On Jun 15, 2015, at 6:20 PM, David Dorwin <ddorwin@google.com> wrote:
> 
> Let me (re)iterate our stance:
> As you have already acknowledged, this feature, as currently proposed, does not support enforcement and an alternative mechanism is required for “suspicious” users [1].
> This feature is not reliable as a fraud detection mechanism across all user agents without accepting that it also imposes a constraint (described below) on implementations’ architectures.
> A small feature of a spec that serves a narrow use case should not constrain the architecture or process model of user agents for reasons detailed below.
> The purpose of this group is to explore, discuss, and recommend the best mechanisms for the web platform, not to codify existing solutions without considering the impact or alternatives.
> 
> While this “shutdown” issue is different from the previous application shutdown issue, in general: Any feature that requires an action to be performed or data to be written when a web application is closed will be unreliable on at least some implementations. Perhaps that better covers both forms of “delayed shutdown.” If data can be periodically written during the application’s lifetime, then a missed write at the end is not significant. However, that is not the case for secure release - without tamper-proof secure storage, there can only be one write and it must succeed for the mechanism to be reliable.
> 
> A similar problem may occur when implementations or platforms reclaim resources. For example, tabs may be killed or media resources withdrawn on some platforms. In such cases, the ability to save such data may also be lost.
> 
> If we agree that
> a) tamper-proof secure persistent storage should not be required and
> b) implementation architectures should not be constrained to ensure write-on-close capability for CDMs
> Then a high frequency of (non-attack-related) missing reports is probable.
> 
> 
> You said that “having the CDM write data to disk at page close… is not a concern for the other desktop browsers” and “suggest this is a Chrome-specific software architecture issue.”
> 
> IE and Safari rely on OS-provided DRM. I would guess that runs in a separate process and is largely independent of the user agent. It sounds like Firefox has made specific accommodations for writes to occur (though Henri says not having this complication would be simpler). Those are all valid choices, but they should not be the only choices.
> 
> Other than this proposed feature, there is no reason that a CDM could not rely on the application process to handle storage or even run in the same process as the application. This is a valid implementation choice that should not be excluded. Not to mention, some CDM implementations might not even need storage if not for this feature. That is also a valid implementation choice.
> 
> Yes, Chrome’s CDM is more tightly integrated with the security and privacy features of the user agent than some other implementations, but that does not make this “a Chrome-specific software architecture issue.” It should not matter what a user agent’s or device’s architecture is - that’s the beauty of the web platform. Why should this feature of EME be an exception, setting an ill-advised precedent, especially when alternatives are already supported?
> 
> Requiring guaranteed-write-on-close is an unnecessary constraint on platform innovation and implementer flexibility. As is common in large active software projects, Chrome and its subsystems are frequently refactored to improve maintainability, performance, etc. Even if Chrome or some other user agent supports ensuring writes today, that could change in the future or when porting to other devices. Supporting this feature today constrains our ability to make such changes in the future.
> 
> 
> Finally, our concerns extend well beyond just desktop browsers. Web specs must consider emerging and to-be-invented devices and architectures where, for example, storage might not be available and/or there are entirely different process models. As I previously mentioned, there is already an implementation that does not persist application storage across browsing sessions. With secure release, this implementation would have to add a Netflix-specific workaround for origin==netflix.com just to get simple streaming support!
> 
> 
> David
> 
> [1] https://lists.w3.org/Archives/Public/public-html-media/2015May/0013.html
> 
> 
> On Thu, Jun 11, 2015 at 6:08 PM, Mark Watson <watsonm@netflix.com> wrote:
> Hi David,
> 
> Thanks for this detailed response. There is one significant misconception below, which I will address in a moment, but I believe the heart of the issue here is what constitutes "delayed shutdown".
> 
> I had previously asked that you provide a definition of this concern. I didn't get one, so I proposed my own: that it was unacceptable for any mechanism to be based on pages reliably receiving and processing onclose, onbeforeclose, keymessage or similar events at page close i.e. that the page can delay shutdown. There was no dissent to this definition and the secure release mechanism as proposed is consistent with this.
> 
> You are correct that two possible implementations of this mechanism are either a form of secure persistent store or having the CDM write data to disk at page close. The latter seems to be a concern for you but it is not a concern for the other desktop browsers, as far as I know. So, I suggest this is a Chrome-specific software architecture issue. It is certainly a new and different issue from the one of pages delaying shutdown. This is not to dismiss it, but we should be clear about the nature of the concern.
> 
> You are correct that an attacker can cause secure release information to be not reported to the server. This is true with or without secure persistent store, since an attacker can simply interpose on the EME API and drop the key release messages. This does not impact the effectiveness of the mechanism, provided:
> (1) it is not possible for an attacker to generate incorrect usage reports
> (2) the frequency of (non-attack-related) missing reports is low
> 
> The misconception in your note is that "even a significant lack of usage data could be legitimate". It is necessary that usage data is reasonably reliable in the normal case and this can be achieved by the CDM writing to disk on page close. It is true that there are unavoidable scenarios in which the information is lost (browser crash or sudden loss of power), but these are rare enough that the normal case is easy to distinguish from the suspicious case.
> 
> Regarding your conclusions:
> 
> "Without a tamper-proof secure persistent storage available to every implementor across every web platform client, secure release is ineffective as a fraud detection (and prevention) mechanism, especially for general use, which is the purpose of web specs."
> 
> This is incorrect: tamper-proof secure persistent store is not necessary and there exist several implementations which prove this.
> 
> "The alternative requires implementations to delay shutdown to ensure playback data is written to storage and that applications implement extensive and complex server analyses to prevent false positives."
> 
> There is no requirement for delayed shutdown in the sense we have discussed before (pages delaying shutdown). The server analysis is relatively simple provided the frequency of (non-attack) missing reports is low. The existing implementations show this is achievable in practice.
> 
> "Content providers can achieve equivalent levels of detection and *better* enforcement using a relaxed renewal/heartbeat configuration without significant impact on server load or user experience."
> 
> I'd be interested to hear more about this. We have not identified any such solutions which don't either impact user experience or involve significant system re-engineering of much greater complexity than the proposed client mechanism.
> 
> "The advantages of simpler mechanisms to users, applications, and implementers are clear"
> 
> I think we can agree on that ;-) The problem is that we do not agree which mechanism is simpler, considering the whole system. We believe that unnecessary real-time dependencies in distributed systems represent significant complexity and are always to be avoided. In this case the proposed dependency is disproportionate for a presently theoretical attack.
> 
> ...Mark
> 
> On Thu, Jun 11, 2015 at 5:19 PM, David Dorwin <ddorwin@google.com> wrote:
> While investigating the latest definition of the proposed secure release feature, we identified a rollback attack in the absence of CDM access to *tamper-proof secure persistent storage*. The proposed alternative assumes CDM storage can be written after the application is closed, which reintroduces a form of delayed shutdown. While rollback attacks are a common issue handled by CDMs, they can often be addressed through runtime and/or server-based mechanisms. Secure release, however, is akin to an offline license without the user benefits of enabling offline playback. Offline licenses and playback require secure storage, which is limited to a subset of clients - often with higher robustness levels - and is not assumed to be widely available across all platforms or implementations.
> 
> Therefore, we continue to believe that reliance on the server for concurrent stream limitations is the most sustainable way to support a breadth of clients and ensure a cohesive experience as platforms evolve. Unless there is a solution that can be equally and reliably implemented across the wide breadth of web platform clients, we do not believe secure release has a place in EME. (Not to mention the more general concerns I have mentioned elsewhere, such as [1].)
> 
> *Impact*
> 
> *Tamper-proof secure persistent storage* increases the complexity of implementations [2] and is currently impossible for third party CDMs, especially without sacrificing key security features like sandboxing. The two EME implementations cited as having deployed secure release at scale [3] a) use first-party OS-based DRM implementations and b) are tied to specific versions of their respective desktop OS. Lack of equivalent ability for third-party implementers, including smaller user agents and CDM vendors, puts them at a further competitive disadvantage.
> 
> Furthermore, reliance on tamper-proof secure persistent storage (or delaying shutdown until data is persisted) is a *constraint on platform innovation*. Future platforms and implementations may not have traditional architectures or capabilities and will have to account for ways to support this Netflix-specific functionality. Even existing implementations could run into problems in the future if the user agent architecture changes or internals are refactored. Furthermore, there is at least one existing implementation that does not persist application storage across browsing sessions.
> 
> *Preventing Rollback*
> 
> With tamper-proof secure persistent CDM storage, the CDM periodically stores license usage and reports it when requested, either in the current browsing session or later. Identifying suspicious users is straightforward because it should be very rare to not receive more than one or two valid usage data reports over a period of time.
> 
> However, without such secure persistent storage, an attacker may trivially replace the stored usage data with an older copy, including a state that had no recorded license usage. This is indistinguishable from the content never being played. Identifying suspicious users in this case is very complex and is highly dependent on specific client implementation properties, which could vary between versions of the same client. It seems reasonable that non-trivial numbers of playbacks might not have data reported on some implementations.
> 
> *Why not wait until playback completes to persist the data?*
> 
> The alternatives are to a) keep usage data in memory, persisting it only when the session is closed, or b) distinguish the final state from transient states by writing some sort of flag when the session is closed. However, this does not prevent an attacker from "rolling back" to a no-usage state. Additionally, a session can be closed for many reasons, including the user closing the application (e.g. tab). In those cases, CDM implementations must ensure that such data is written after the application is closed. For some CDM implementations, this may be simple because they run as a separate process, but others, such as those that rely on the user agent for storage, may require that the hosting user agent delay shutdown until such writes are committed.
> 
> The absence of usage data could mean any of a) the keys were never used, b) the application was closed and the CDM could not store state, or c) there is abuse. Therefore, the server needs to evaluate aggregate data and look for suspicious patterns to detect potential fraud. Yet, even a significant lack of usage data could be legitimate (e.g. tab closure without delayed shutdown enforcement). Thus, with potentially large numbers of “suspicious users,” a content provider would require another mechanism to improve fraud detection and/or _enforce_ concurrent stream limitations for such users.
> 
> *Conclusion*
> 
> Without a tamper-proof secure persistent storage available to every implementor across every web platform client, secure release is ineffective as a fraud detection (and prevention) mechanism, especially for general use, which is the purpose of web specs.
> 
> The alternative requires implementations to delay shutdown to ensure playback data is written to storage and that applications implement extensive and complex server analyses to prevent false positives.
> 
> Content providers can achieve equivalent levels of detection and *better* enforcement using a relaxed renewal/heartbeat configuration without significant impact on server load or user experience. (I would be happy to discuss this in more detail.) Those using secure release are going to need a server-based alternative for “suspected” accounts anyway.
> 
> The advantages of simpler mechanisms to users, applications, and implementers are clear. As a result, I recommend that we remove secure release from the EME spec and focus our efforts on defining and documenting better alternatives.
> 
> 
> [1] https://github.com/w3c/encrypted-media/issues/45#issuecomment-91743387
> [2] We are not against managing increased complexity for the benefit of users, if there is data that this provides users with a better user experience than alternatives and can be implemented on all EME-enabled platforms. This remains unproven.
> [3] https://lists.w3.org/Archives/Public/public-html-media/2015Apr/0080.html
>
Received on Tuesday, 16 June 2015 23:32:06 UTC