Re: Recurring License Renewals for Concurrency Detection and Enforcement

Note: I’ve switched from using “secure release” to “tracked” since that is
the actual proposed term and more accurately describes the feature.

On Mon, Aug 17, 2015 at 5:39 PM, Mark Watson <watsonm@netflix.com> wrote:

>
>
> On Mon, Aug 17, 2015 at 4:24 PM, David Dorwin <ddorwin@google.com> wrote:
>
>> Mark,
>>
>> Below, I have responded to your comments regarding renewal. I’ll start a
>> separate thread proposing next steps on secure release.
>>
>> We don't understand your point about only capturing large scale account
>> sharing. Specifically, there should be no forgeable "end of session"
>> message, just as there should be no forgeable secure release message. All
>> such messages should come from the CDM. Even so, it should be possible to
>> detect such a pattern of abnormal session durations even among an
>> individual user.
>>
>
> It sounds like you are assuming a secure release mechanism which works in
> the graceful close case, but with no persistence. That's new, but
> interesting.
>

No, that's not what I was saying. My point is that renewal does not rely on
any message that is forgeable. To be clear, the renewal description does
not use or rely on an “end of session” message. While a best-effort message
in graceful/application-controlled end-of-playback cases could be added (as
is the case for “tracked” sessions), we have not discussed this for
simplicity and because it is unclear how common the graceful case is and
how much value there would be in implementing such an optimization. We
could explore if there is interest.


> First, let's assume we do not have such a mechanism and consider the case
> of a single user exceeding concurrent stream limits. Perhaps they have a
> 2-stream plan but are using up to 4 streams at a time. This is achieved by
> forging an end-of-session indication each time they wish to start a 3rd or
> 4th concurrent stream. For this one user the distribution of session
> durations will be different than it would be *for the same user* were
> they obeying the stream limits but it would not be at all unusual within
> the spread of stream duration distributions that are seen across all users.
>
> Now, suppose we have a secure release mechanism (without persistence).
> This makes convincing the server that a session has ended a little harder:
> since an explicit end-of-session cannot be forged the attack must rely on
> an implicit one, which in our system is a sequence of missed heartbeats.
> This does make the attack slightly less convenient, since it takes a while.
> Again, the fake (implicit) session end indication is used only when
> necessary to free up a stream so there would not be a reliably detectable
> difference in session end type patterns that would single out this user as
> suspicious. [For example, some users may be in the habit of watching a
> show as the last thing they do each day and closing their laptop lid when
> the credits roll.]
>

Are you positing that secure release/“tracked” sessions without
persistence are ineffective? FWIW, I do think that finding a robust solution
that does not require persistence, or at least write-on-close, will
probably satisfy everyone. However, we have never pushed for “secure
release” without persistence. *License renewal is a different mechanism.*


> Furthermore, in some non-graceful close cases (e.g. browser tab close)
> where a CDM message is not available, it is possible for the application to
> send an end-of-session message to the server - possibly on a
> fire-and-forget basis.
>

I expect the success rate of such a message to be low, at least on some
browsers, and trending downwards - with the possible exception of using
navigator.sendBeacon(). That said, if this is important for a good user
experience with “tracked” sessions, it is important to get data on the
success rate to determine whether this is reliable in various browser
implementations. As the only provider known to have implemented this
solution, do you have such data?


> This may also be used to free a concurrent stream at the server and this
> is important to avoid the user being locked out of their legitimate
> concurrent streams when a close event like this occurs. Of course, such
> messages can be forged.
>

One of the advantages of renewal is that it is easy to always permit new
streaming requests while terminating and/or not-renewing existing licenses.
As a result, renewal avoids penalizing the legitimate user without relying
on a stochastic forgeable message.
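
To make that concrete, here is a rough sketch of an "always grant the newest
request" policy. The class name, the `limit` parameter, and the in-memory
bookkeeping are assumptions for illustration only; they are not part of any
DRM server API.

```python
from collections import OrderedDict


class NewestWinsPolicy:
    """Hypothetical policy: always grant new sessions, stop renewing the oldest."""

    def __init__(self, limit):
        self.limit = limit            # allowed concurrent streams (assumed plan limit)
        self.active = OrderedDict()   # session_id -> True, oldest first
        self.flagged = set()          # sessions to be denied at their next renewal

    def start_session(self, session_id):
        # A new request is never refused. Sessions beyond the limit are
        # flagged, oldest first, so that their next renewal is denied.
        self.active[session_id] = True
        while len(self.active) > self.limit:
            oldest, _ = self.active.popitem(last=False)
            self.flagged.add(oldest)
        return True

    def renew(self, session_id):
        # True extends the license; False discontinues it.
        if session_id in self.flagged:
            self.flagged.discard(session_id)
            return False
        return session_id in self.active
```

With a limit of 2, starting a third session succeeds immediately, and the
first session is simply not renewed the next time it asks.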

In the “tracked” case, what happens if the best-effort message is not
received by the server in an event like this? Is the user locked out of
their legitimate concurrent streams? We have not heard anything about
enforcement or the ability to clamp down on suspected users without
adversely impacting legitimate usage like renewal allows.


> As a result, we still don't see a way that license renewal can support our
> use-case without significant changes in system architecture to support the
> required reliability.
>

Given that your response was not related to renewals, I think it would be
inappropriate to conclude at this time that renewals do not support
Netflix's use case.

It would probably help to be explicit about the important points of
Netflix's use case. (The same goes for other authors or participants
indicating that customers are asking for such a feature.)

Below, I started a list based on my understanding of what you have said.
Please update as required.

   - Does not require using “well-known techniques for implementing highly
   available services over an unreliable network.”
   - Preferably uses only unforgeable messages.
   - May use forgeable messages.
      - (Netflix uses application-based heartbeats, but this is not
      necessarily a requirement.)
      - HOWEVER, it must be possible to detect potentially-forged messages.
   - Should “avoid the user being locked out of their legitimate concurrent
   streams.”


I assume enforcement is also in there, but it has not really been discussed
in relation to “tracked” sessions. When and how should enforcement be
applied? I assume the server needs to make some judgements as to whether
missing unforgeable release messages are likely due to legitimate reasons
or fraud. How are the edge/gray area cases handled to catch fraud while not
harming legitimate users?

I think a document, similar to the one for renewals that started this
thread, would be useful for this group and for other authors considering
using “tracked” sessions to understand how the “record of key usage times”
provided by "tracked" sessions is intended to be used. Specific topics
include what times are expected; other assumed mechanisms, such as
heartbeats and fire-and-forget end-of-session messages; how to handle
unexpected values or missing records; differentiating abuse/fraud from
legitimate use cases; enforcement and/or clamping down to prevent
abuse/fraud; and what happens in various use cases, such as non-graceful
closes.


>
>
>> *Renewals can work well with no additional server-side architecture* and
>> without the need for heuristics. As explained in the document, with no
>> additional work, most users can be in a fairly lenient bucket that still
>> offers enforcement.
>>
>> The successful use of secure release appears to depend on server-side
>> heuristics-based decisions to identify clients on which enforcement
>> mechanisms should be applied. For services willing to use heuristics (as
>> you previously indicated Netflix is), the renewal solution allows most
>> users to be put in an even more lenient bucket and thus unaffected by
>> transient server outages, preserving a strong user experience. So, I
>> disagree that there are *large* additional complexities in server-side
>> architecture required to successfully use renewals vs. secure release.
>>
>
> The difference in complexity is in relation to the reliability techniques
> you described.
>
> In the persisted secure release method, the rule is actually very
> reliable: First, if session durations according to secure release do not
> match what apparently happened at the time, this is a very strong indication
> of a problem. Second, we expect a very low number of missing secure release
> records, so we do not expect significant numbers of false positives from
> detecting suppressed secure releases. This doesn't really count as a
> "heuristic", whereas the differential application of license renewal policies
> very much does.
>

As I mentioned above, it's unclear how this works. For example, how do you
handle delayed records (assuming you eventually receive most)? How is a
non-zero number of false positives handled to avoid user impact?

If "tracked" is to be included in the eventual spec, we should also
document these facets clearly to ensure that anyone considering using or
implementing this has all the information they need.


> ...Mark
>
>
>
>>
>> The “well-known techniques” can be used a) as an alternative to increase
>> service reliability without the need for heuristics and the associated
>> buckets, which may be undesirable for some services, or b) to provide even
>> higher reliability. Such techniques also provide additional benefits for
>> services, including increased server availability for initial license
>> requests. Client-side solutions offer no additional benefit to the service
>> or user. As I said above, though, renewal can work well even without such
>> additional architecture.
>>
>> David
>>
>>
>> On Wed, Aug 12, 2015 at 4:35 PM, Mark Watson <watsonm@netflix.com> wrote:
>>
>>> David, all,
>>>
>>> The paper explains well the license renewal mechanism and provides ample
>>> justification for its inclusion in the specification. However, it doesn't
>>> provide any justification for the removal of secure release.
>>>
>>> There are two main topics to the paper: "Adaptability" and "Resilient,
>>> Independent license servers" to which my comments are as follows:
>>>
>>> *Adaptability*
>>>
>>> This presents the useful idea of adapting the "stringency" of concurrent
>>> stream controls (specifically license expiry time) based on per-user
>>> signals from the system. This is intended to restrict the UX pain caused by
>>> such stringent controls to a small subset of users. This UX pain is one of
>>> the primary reasons we at Netflix prefer not to use this mechanism, so such
>>> a mitigation would indeed be useful.
>>>
>>> However, users can only be identified through heuristics and such
>>> heuristics can capture only the case of large scale account sharing /
>>> stream limit evasion and not the case of an individual user evading stream
>>> limits.
>>>
>>> [Specifically, one attack on concurrent stream limits involves
>>> suppressing heartbeats / renewals and forging an early "end of session"
>>> message to the server. The server sees only a short session with the
>>> correct number of renewals. A large scale attack like this would be
>>> captured by a heuristic based on an abnormal distribution of session
>>> durations, but for a single user evading concurrent stream restrictions, no
>>> such abnormality would be visible.]
>>>
>>> *Resilient, Independent license servers*
>>>
>>> This section describes a number of well-known techniques for
>>> implementing highly available services over an unreliable network and
>>> details the application of those techniques to this problem. It was never
>>> in dispute that techniques exist to improve service reliability. What is in
>>> dispute is whether this additional architectural complexity is justified
>>> and proportionate for this problem. We continue to think it is not.
>>>
>>> Whilst the additional ideas in both sections are indeed useful, we
>>> continue to believe that secure release is a less complex and proportionate
>>> approach to this problem. This is very much a service-provider-specific
>>> decision and other providers may feel that the benefits of license renewal
>>> outlined in the paper justify the complexity. As a result we believe the
>>> specification should support both, allowing site authors to choose for
>>> themselves an appropriate balance between complexity, detection vs
>>> enforcement, stringency of control and platform reach.
>>>
>>> ...Mark
>>>
>>>
>>>
>>> On Fri, Jul 31, 2015 at 11:56 AM, Paul Cotton <Paul.Cotton@microsoft.com
>>> > wrote:
>>>
>>>> To ensure we have an archived version of this paper I have included
>>>> the text below and am attaching a PDF version.
>>>>
>>>>
>>>>
>>>> /paulc
>>>>
>>>>
>>>>
>>>> Paul Cotton, Microsoft Canada
>>>>
>>>> 17 Eleanor Drive, Ottawa, Ontario K2E 6A3
>>>>
>>>> Tel: (425) 705-9596 Fax: (425) 936-7329
>>>>
>>>>
>>>>
>>>> Recurring License Renewals for Concurrency Detection and Enforcement
>>>>
>>>>
>>>>
>>>> Introduction
>>>>
>>>> Streaming Over-The-Top video delivery services often need to restrict
>>>> the number of concurrent playbacks per user account. This document
>>>> describes the use of license renewals (aka “heartbeats” aka recurring
>>>> license updates) as a mechanism for detecting and enforcing such streaming
>>>> concurrency limits.
>>>>
>>>>
>>>>
>>>> Recurring License Renewals is a feature implemented server-side and
>>>> supports both continuous detection and enforcement. This solution provides
>>>> the following benefits:
>>>>
>>>>    - A great user experience as it does not impact well-behaved users
>>>>    (most users using OTT video delivery services) and allows placing
>>>>    restrictions on users displaying suspicious behaviors quickly.
>>>>    - Scales effectively across types and number of clients as it is
>>>>    independent of client architecture, does not require any type of storage in
>>>>    clients and could be made to work effectively in private browsing modes and
>>>>    on stateless devices.
>>>>    - Provides flexibility for authors to architect their service as
>>>>    threats evolve.
>>>>    - Effectively addresses detection and enforcement with a single
>>>>    solution.
>>>>    - Is consistent with the principles of the Web Platform.
>>>>
>>>>
>>>>
>>>> Below, we provide details on how recurring license renewal works,
>>>> describe the flexibility content providers have in implementing policies,
>>>> explain how server outages can be handled along with an example flow, and
>>>> highlight benefits of this solution, especially for scalable internet-based
>>>> video on demand services.
>>>>
>>>>
>>>>
>>>> How Recurring License Renewal Works
>>>>
>>>> When a media license is requested by a DRM client, the license server
>>>> responds with a limited-duration license. The license usually has a
>>>> duration on the order of a few minutes and is configured with a policy that
>>>> allows for license renewals. Well before the license expires, the DRM
>>>> client issues a license renewal request (aka heartbeat request), which may
>>>> be repeated if a valid renewal is not received after some configured
>>>> timeout. When the DRM server services the license renewal request, it may
>>>> choose to extend the license for another license renewal period or to
>>>> discontinue key usage rights - by means of a zero-duration license or
>>>> similar. When the DRM client receives the license renewal, it extends the
>>>> license by the prescribed amount, or revokes access to the keys, thus
>>>> terminating playback.
>>>>
>>>> *Detecting concurrency*
>>>>
>>>> The DRM servers can easily determine how many concurrent devices are
>>>> accessing a particular piece of content - or the service in general - by
>>>> counting the renewal requests that were received in the last renewal
>>>> period for a user and/or piece of content. If the DRM servers have not
>>>> heard from a DRM client instance within the last renewal period, they may
>>>> assume that that device is no longer playing back the content because
>>>> either the user has discontinued playback or its key usage rights have been
>>>> revoked due to license expiration.
>>>>
>>>>
>>>>
>>>> The implementation on the servers can be rather trivial. The servers
>>>> servicing license renewals can keep a queue of recorded heartbeats in which
>>>> old entries expire after a time equal to the license duration. The main DRM
>>>> server can decide whether to honor a new initial license request by
>>>> checking the license-renewal server queue to determine concurrent ongoing
>>>> playback sessions. Alternatively the main DRM servers can always service
>>>> new requests, while instructing the renewal servers to discontinue license
>>>> rights for the oldest session next time a license renewal request is
>>>> received for said session. This has the advantage of always granting the
>>>> most recent user request (vs. denying a new request because other playbacks
>>>> are believed to be in progress). Alternatively, a detection-only mode can
>>>> be implemented by having both the initial license request and renewal
>>>> servers always service license requests and just logging the request.
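
The queue described above could look roughly like the following. This is a
minimal sketch, not anyone's production design: the class and function names
are hypothetical, timestamps are plain seconds supplied by the caller, and
heartbeats are assumed to be recorded in time order.

```python
from collections import deque


class HeartbeatQueue:
    """Hypothetical record of recent renewals; entries expire after license_duration."""

    def __init__(self, license_duration):
        self.license_duration = license_duration  # seconds
        self.entries = deque()                     # (timestamp, account, session)

    def record(self, now, account, session):
        self.entries.append((now, account, session))

    def _expire(self, now):
        # Drop heartbeats older than one license duration.
        while self.entries and now - self.entries[0][0] > self.license_duration:
            self.entries.popleft()

    def active_sessions(self, now, account):
        self._expire(now)
        return {s for (_, a, s) in self.entries if a == account}


def allow_initial_license(queue, now, account, limit):
    # Deny-new variant: refuse when the account is already at its limit.
    return len(queue.active_sessions(now, account)) < limit
```

A detection-only deployment would call `active_sessions` purely for logging
and always grant the license.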
>>>>
>>>>
>>>>
>>>> *Adaptability*
>>>>
>>>> The license period can be adjusted dynamically and per
>>>> account/device/title based on any parameters the content provider chooses.
>>>> Such adjustments might include increased periods during times of server
>>>> overload, data center outage, etc.; increased periods for “well-behaved”
>>>> accounts; decreased periods, different policies, and/or additional scrutiny
>>>> logic for “suspicious” accounts or behavior; and optimistically granting a
>>>> license or renewal in abnormal situations for “well-behaved” accounts.
>>>>
>>>>
>>>>
>>>> Both the renewal request interval and the license duration are
>>>> configurable. These intervals can be varied to achieve various goals.
>>>>
>>>>
>>>>
>>>> The following example illustrates a model providing three levels of
>>>> enforcement, with all users starting in detection-only mode:
>>>>
>>>>    1. Detection-only mode:
>>>>       - renewal interval = 5 minutes, license duration = content
>>>>       duration + x minutes (for pausing, etc)
>>>>       - These settings offer no reduction in reliability vs. other
>>>>       mechanisms, such as secure release, due to license server outages.
>>>>       - If the account generates more than a certain number of new
>>>>       license requests (across multiple devices) and no renewals in a specific
>>>>       period of time, enable lightweight enforcement due to possible suppression
>>>>       of renewal requests.
>>>>    2. Lightweight enforcement mode:
>>>>       - renewal interval = 5 minutes, license duration = content
>>>>       duration / 2
>>>>       - If the user is really suppressing renewal requests, it now
>>>>       becomes impossible to watch any piece of content in one shot.
>>>>       - If the account consistently generates two or more new license
>>>>       requests for the same content on the same device, or there is a suspicious
>>>>       renewal request (time since last renewal is close to the license duration
>>>>       instead of 5 minutes) move the account to strict enforcement, as this
>>>>       suggests the user is suppressing renewal requests and simply reloading
>>>>       halfway through the content (when the license expires).
>>>>       - Accounts in this state are only slightly more impacted by
>>>>       server outages as only an extended outage halts playback.
>>>>    3. Strict enforcement mode:
>>>>       - renewal interval = 3 minutes, license period = 5 minutes.
>>>>       - The user cannot watch more than 5 minutes of content at a time
>>>>       if suppressing renewal requests.
>>>>       - In this mode the user may be affected by server outages of
>>>>       more than 2 minutes.
>>>>       - After some period of time (a week, etc), the user may be
>>>>       dropped down to more relaxed modes.
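
The three modes above can be written down as a small policy table plus an
escalation rule. This is only a sketch: `x_min` stands in for the unspecified
"x minutes", `request_threshold` for the unspecified "certain number" of
requests, and all names are hypothetical. The "suspicious renewal request"
trigger and the later relaxation step are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DETECTION_ONLY = 1
    LIGHTWEIGHT = 2
    STRICT = 3


@dataclass
class Policy:
    renewal_interval_min: float
    license_duration_min: float


def policy_for(mode, content_duration_min, x_min):
    # x_min is the caller's allowance for pausing etc. (not specified in the text).
    if mode is Mode.DETECTION_ONLY:
        return Policy(5, content_duration_min + x_min)
    if mode is Mode.LIGHTWEIGHT:
        return Policy(5, content_duration_min / 2)
    return Policy(3, 5)


def escalate(mode, new_requests, renewals, repeat_requests_same_content,
             request_threshold):
    # request_threshold is a placeholder; the text leaves the value open.
    if mode is Mode.DETECTION_ONLY:
        if new_requests > request_threshold and renewals == 0:
            return Mode.LIGHTWEIGHT
    elif mode is Mode.LIGHTWEIGHT:
        if repeat_requests_same_content >= 2:
            return Mode.STRICT
    return mode
```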
>>>>
>>>>
>>>>
>>>> Regardless of the heuristics used, the user experience for accounts
>>>> under more relaxed enforcement is not impacted, and the impact on server
>>>> load is negligible because most users fall in this category. For
>>>> non-abusing users, being in a more restrictive mode only negatively affects
>>>> the user if there is a major license server outage.
>>>>
>>>>
>>>>
>>>> *Resilient, Independent License Servers*
>>>>
>>>> While a simple implementation of license renewals may require that
>>>> servers and clients have shared knowledge of session keys used to
>>>> authenticate messages, it is possible to instead compute such keys such
>>>> that both the license server servicing an initial license request and the
>>>> server servicing a license renewal request can derive the same session keys
>>>> and authenticate and sign messages without having to exchange any
>>>> information between them. Thus, even if a license server is unable to
>>>> access the infrastructure required to service an initial license request,
>>>> it can still service license renewal requests. In addition, the license
>>>> servers can be designed to always renew licenses in an emergency, such as
>>>> when the backend infrastructure becomes unavailable, rendering them unable
>>>> to check concurrency.
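
The paper does not say how such keys would be computed. One common approach,
shown here purely as an assumption, is to derive each session key with HMAC
from a master secret provisioned to every license server plus a session
identifier carried in the messages. The function names and the
`"session-key|"` label are hypothetical.

```python
import hashlib
import hmac


def derive_session_key(master_secret: bytes, session_id: bytes) -> bytes:
    # Any server holding master_secret derives the same key with no lookup.
    return hmac.new(master_secret, b"session-key|" + session_id,
                    hashlib.sha256).digest()


def sign(session_key: bytes, message: bytes) -> bytes:
    return hmac.new(session_key, message, hashlib.sha256).digest()


def verify(session_key: bytes, message: bytes, tag: bytes) -> bool:
    # Constant-time comparison to avoid leaking tag bytes through timing.
    return hmac.compare_digest(sign(session_key, message), tag)
```

Under this assumption, the server that issued the initial license and the
server that handles a renewal never need to exchange per-session state.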
>>>>
>>>>
>>>>
>>>> In such an implementation, as long as the license servers are running
>>>> and reachable, users’ existing playbacks will not be interrupted.
>>>> Resiliency can be further increased by adding license server redundancy and
>>>> including multiple URLs in the application for handling license renewal
>>>> messages.
>>>>
>>>>
>>>>
>>>> Reliability can be further increased by deploying renewal servers
>>>> across geographies and infrastructures. This maximizes the chances that
>>>> license renewal requests will be serviced should central servers or the
>>>> original data center become unreachable.
>>>>
>>>>
>>>>
>>>> License renewals are usually “lighter” than initial licenses, as they
>>>> only contain some policy data indicating to extend or revoke usage of the
>>>> media keys, and not the actual media keys. This means that renewals require
>>>> fewer resources to service than initial licenses. Renewal servers can also
>>>> be detached from the key server/store as well as, for example, the main
>>>> user account database. Therefore the license renewal servers may be
>>>> different from those servicing the initial license request.
>>>>
>>>>
>>>>
>>>> *When a server goes down…*
>>>>
>>>> The following is an example of how a content provider may choose to
>>>> implement a very reliable concurrency-enforcing system using distributed
>>>> redundancy. It is relevant mostly for users who are in strict enforcement
>>>> modes, as users in detection-only or lightweight enforcement modes are not
>>>> adversely affected by server outages. Other designs are also possible,
>>>> including less complex designs or ones that only detect concurrency.
>>>>
>>>>
>>>>
>>>> When the license server servicing an initial license request becomes
>>>> unavailable:
>>>>
>>>>    - Initial license requests from the application will fail (e.g. 503
>>>>    or some other error).
>>>>    - Seeing a network failure for a “license-request” message, the
>>>>    application requests the license using the URL for a fallback license
>>>>    server.
>>>>       - The application can have any number of fallback URLs,
>>>>       including ones for different clusters or a data center across the country
>>>>       or in another country or continent.
>>>>    - The application keeps trying additional fallback URLs until one
>>>>    succeeds.
>>>>
>>>>
>>>>
>>>> When the license server servicing a renewal becomes unavailable:
>>>>
>>>>    - Renewal requests from the application will fail (e.g. 503 or some
>>>>    other error).
>>>>    - Playback will continue for the duration of the license, which is
>>>>    likely a few more minutes.
>>>>    - Seeing the network failure for a "license-renewal" message, the
>>>>    application requests a renewal using the URL for a fallback license server.
>>>>       - The application can have any number of fallback URLs,
>>>>       including ones for different clusters or a data center across the country
>>>>       or in another country or continent.
>>>>    - The application keeps trying additional fallback URLs until one
>>>>    succeeds.
>>>>       - If any URL is reachable, the application has more than enough
>>>>       time to attempt to connect to a number of URLs before the license expires.
>>>>
>>>>
>>>>
>>>> The license servers employ the following logic when servicing initial
>>>> license requests:
>>>>
>>>>    - The license server accesses the user database and content key
>>>>    store to fulfill the license request.
>>>>       - If the license server backend is unavailable, then the server
>>>>       rejects the license request (e.g. 500, 503, or some other error).
>>>>    - The license server checks the record of heartbeats or concurrent
>>>>    usage.
>>>>       - If the concurrency infrastructure is not available, the server
>>>>       may choose to service the request, or fail (e.g. 500, 503, or some other
>>>>       error).
>>>>    - Based on the concurrency information, the license server may
>>>>    reject or service the license request. Alternatively it may always service
>>>>    new license requests, while flagging an older license for discontinuation.
>>>>
>>>>
>>>>
>>>> The license servers employ the following logic when servicing license
>>>> renewal requests. Note: Any license server can service any renewal request.
>>>>
>>>>    - The license server checks some record of heartbeats or concurrent
>>>>    usage.
>>>>       - If that record is inaccessible for some reason (e.g. that
>>>>       server is down), the server determines whether it should enter an emergency
>>>>       renewal mode.
>>>>          - For example, repeated attempts to reach a backend component
>>>>          fail or some other “circuit breaker” is tripped.
>>>>          - Since this is a rare event, such detection would be logged,
>>>>          trigger alerts, etc.
>>>>       - If in emergency renewal mode, the renewal server will renew
>>>>       all valid licenses.
>>>>    - If the session is flagged for discontinuation, the server denies
>>>>    the license renewal request.
>>>>    - If desired (for example in emergency renewal mode or due to
>>>>    server overload), the renewal specifies an increased license duration.
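
The renewal logic above might be sketched as follows. The `store` object, its
`is_flagged`/`record_heartbeat` methods, the `state` dictionary, and the
failure threshold are all hypothetical. The text does not say what happens
when the record is unreachable but emergency mode has not yet been entered;
this sketch assumes the request fails so that the client retries elsewhere.

```python
class ConcurrencyStoreDown(Exception):
    """Raised by the (hypothetical) concurrency store when it is unreachable."""


def handle_renewal(session_id, store, state, normal_duration,
                   emergency_duration, failure_threshold=3):
    """Return the new license duration, or 0 to discontinue key usage."""
    try:
        flagged = store.is_flagged(session_id)
        if not flagged:
            store.record_heartbeat(session_id)
        state["failures"] = 0
        state["emergency"] = False
    except ConcurrencyStoreDown:
        state["failures"] = state.get("failures", 0) + 1
        # Simple "circuit breaker": repeated failures trip emergency mode.
        if state["failures"] >= failure_threshold:
            state["emergency"] = True
        if state.get("emergency"):
            return emergency_duration  # renew, with an increased duration
        raise  # assumption: fail so the client tries a fallback server
    return 0 if flagged else normal_duration
```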
>>>>
>>>>
>>>>
>>>> *Benefits of License Renewal*
>>>>
>>>> The following are some of the benefits of license renewal.
>>>>
>>>>    - Scales effectively across any number and type of clients.
>>>>       - Can be supported in any implementation that supports the
>>>>       simple streaming case. That is, all user agent implementations that support
>>>>       EME.
>>>>       - Does not constrain client implementation or device
>>>>       architecture. If a client can request and handle licenses, it can support
>>>>       license renewals!
>>>>       - Does not require any type of persistent storage, even JS
>>>>       storage, on the client.
>>>>       - Can work in private browsing modes (assuming other privacy
>>>>       issues are addressed by the implementation) and on stateless devices.
>>>>    - The lack of a renewal request before expiration is the “proof”
>>>>    that the content is no longer being played. Unlike, for example, secure
>>>>    release, there is no chance of lost proofs due to laptop lid closure,
>>>>    crashes, etc.
>>>>    - License renewals place the complexity of the solution on the DRM
>>>>    servers, where the complexity (little as it is) needs to be addressed just
>>>>    once.
>>>>    - Does not require tracking session history across days or weeks to
>>>>    correlate delayed or lost release messages.
>>>>    - Flexibility for authors: As with the rest of the web platform, a
>>>>    server-based solution can be updated, tweaked, experimented with, and
>>>>    adapted to emerging use cases or threats without having to update every
>>>>    client implementation.
>>>>    - Simplicity for authors: A single solution for enforcement of
>>>>    concurrent license limits.
>>>>       - Use of secure release still requires such a solution for
>>>>       enforcement for suspicious users.
>>>>       - License renewal is a single solution that can be tweaked per
>>>>       account, if desired, and provides direct feedback about the impact of those
>>>>       tweaks.
>>>>       - Avoids the risks of having a second rarely-used path or
>>>>       switching users from one to the other.
>>>>    - Provides near real-time usage metrics and enforcement.
>>>>    - Requires no explicit changes to or text in the EME spec, though
>>>>    we don’t oppose explicitly covering it in the spec if desired by the group.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* David Dorwin [mailto:ddorwin@google.com]
>>>> *Sent:* Friday, July 31, 2015 1:41 PM
>>>> *To:* public-html-media@w3.org
>>>> *Subject:* Recurring License Renewals for Concurrency Detection and
>>>> Enforcement
>>>>
>>>>
>>>>
>>>> As promised, here is a description of how renewal can be reliably used
>>>> for detection and enforcement of concurrency limits:
>>>>
>>>>
>>>> https://docs.google.com/document/d/148gkH34jcz5mPa9d2g6DVBG_eFAsMHh7zSmakO5VDMY/edit
>>>>
>>>>
>>>>
>>>> David
>>>>
>>>
>>>
>>
>

Received on Friday, 21 August 2015 22:41:11 UTC