Re: [EME] Switching decoders when the key system is specified from Mark Watson on 2012-10-05 (public-html-media@w3.org from October 2012)

From: Mark Watson <watsonm@netflix.com>
Date: Fri, 5 Oct 2012 00:34:55 +0000
To: David Dorwin <ddorwin@google.com>
CC: Aaron Colwell <acolwell@google.com>, Steven Robertson <strobe@google.com>, "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <56018905-A847-4312-B54F-B8E2370CF045@netflix.com>
On Oct 4, 2012, at 4:57 PM, David Dorwin wrote:

I don't think MSE Media Segments are particularly relevant to the problem we need to solve. As Mark mentioned, we need to support transitions from unencrypted to encrypted within Media Segments, which is essentially the same as supporting transitions in streams when not using MSE. However, if the key system was set during an unencrypted Media Segment, this wouldn't be an issue when we switched to a potentially encrypted Media Segment because the MSE segment initialization would select the proper codec and CDM. (This is what option #2 would give us, but that is far too limiting.)


The use case we need to address is as follows. Within a single source/stream/Media Segment (likely one that is detected as "potentially encrypted"):

  1.  No key system has been previously selected.
  2.  The first block/frame(s) are unencrypted.
  3.  Decoding and/or playback starts.
  4.  The key system is selected.
  5.  An encrypted block/frame is encountered.

To accomplish uninterrupted playback, the media stack must seamlessly transition to the selected key system's decoder by #5. The question is, when should it do so and should/how can we guarantee it can do so smoothly?

If the frame at #5 depends on previous frames, this will likely be difficult. If the "first encrypted frame" (#5) does not depend on prior frames, we can transition decoders when the first encrypted frame is encountered after the key system is specified. However, seeking means that any encrypted frame in the stream might be the "first encrypted frame", so no encrypted frames may depend on prior unencrypted frames.
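
The constraint in the paragraph above can be checked mechanically. The following is a minimal sketch in Python, assuming a hypothetical `Frame` record (decode-order index, encrypted flag, and the indices of the frames it references); none of these names come from EME or MSE. It lists every encrypted frame that directly references an unencrypted frame, which is the case where swapping decoders at that frame would require migrating decode state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    """Hypothetical frame record for illustration; not a type from EME or MSE."""
    index: int                      # position in decode order
    encrypted: bool
    depends_on: List[int] = field(default_factory=list)  # indices of reference frames


def frames_blocking_decoder_switch(frames: List[Frame]) -> List[int]:
    """Return indices of encrypted frames that reference an unencrypted frame."""
    encrypted = {f.index: f.encrypted for f in frames}
    offenders = []
    for f in frames:
        if f.encrypted and any(not encrypted[d] for d in f.depends_on):
            offenders.append(f.index)
    return offenders


# Clear I-frame and P-frame, then an encrypted GOP that starts with its own I-frame.
ok_stream = [
    Frame(0, False),
    Frame(1, False, [0]),
    Frame(2, True),
    Frame(3, True, [2]),
]

# Encryption starts mid-GOP: frame 2 is encrypted but references clear frame 1.
bad_stream = [
    Frame(0, False),
    Frame(1, False, [0]),
    Frame(2, True, [1]),
    Frame(3, True, [2]),
]

print(frames_blocking_decoder_switch(ok_stream))   # no offending frames
print(frames_blocking_decoder_switch(bad_stream))  # frame 2 is flagged
```

Checking direct references is enough to decide whether a stream complies: any chain of dependencies leading from an encrypted frame back to a clear frame must contain at least one encrypted frame that references a clear frame directly.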

We might make a more general statement like "When a key system is specified, it may not become active until the next GOP." The "may" gives some wiggle room, allowing CDMs that only decrypt to become active immediately and for some implementations to switch mid-GOP. However, it (possibly along with some non-normative guidelines) tells applications and encoders what they can expect as a minimum capability. They can then use this information in the application design and/or stream layout. Implementations will still need to decide what to do if a key system is specified during the current GOP and an encrypted frame is encountered, but at least this should be rare.
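
To make the "next GOP" wording concrete, here is a small sketch, assuming frames are described only by a keyframe flag in decode order and that a GOP begins at each keyframe. The function name and the list representation are illustrative and not part of any specification. It returns the earliest frame index at which a newly specified key system would be guaranteed active under the proposed minimum.

```python
from typing import List, Optional


def guaranteed_active_index(is_keyframe: List[bool], specified_at: int) -> Optional[int]:
    """Index of the first keyframe strictly after `specified_at`, or None.

    Models only the minimum guarantee "active by the next GOP"; an
    implementation that switches mid-GOP, or a decrypt-only CDM, could
    become active sooner.
    """
    for i in range(specified_at + 1, len(is_keyframe)):
        if is_keyframe[i]:
            return i
    return None


# Two GOPs of four frames each; keyframes at indices 0 and 4.
stream = [True, False, False, False, True, False, False, False]

print(guaranteed_active_index(stream, 1))  # specified during the first GOP
print(guaranteed_active_index(stream, 5))  # specified during the last GOP
```

An application or encoder could use a calculation like this to place the first encrypted frame at or after the returned index, so that it never relies on a mid-GOP switch.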

Sounds reasonable to me.


On Wed, Oct 3, 2012 at 6:23 PM, Mark Watson <watsonm@netflix.com> wrote:
Do we want to require that playback of an ISO BMFF Segment should begin with the unencrypted frames at the start, *before the keysystem is selected*, with the keysystem then being selected and the key exchange completing before the first encrypted frame is encountered - all within the same Segment (i.e. the Segment is marked encrypted, but the UA code outside the CDM knows Common Encryption and knows it can handle the unencrypted frames at the start)?

I'm not sure how we could require or enforce that. There will always be some chance that encrypted frames will be encountered before the key system is selected.

Sorry - I just meant that we should (or perhaps already do) require that playback of unencrypted frames will begin even if the keysystem is not yet selected.

…Mark


It's certainly possible, but it seems there is no harm in selecting the keysystem before playback begins in this case: the CDM can still play the unencrypted frames before the license arrives.

…Mark

On Oct 3, 2012, at 6:01 PM, Aaron Colwell wrote:

> Hi Mark,
>
> I'm sorry I wasn't clear. I was talking about unencrypted <->
> encrypted segment transitions that would cause the UA to have to
> change its internal decoder instance. When a presentation starts with
> an init segment that doesn't signal encryption the UA may choose to
> instantiate its standard decoder instead of the CDM
> decrypt-and-decode implementation. If an encrypted init segment is
> signalled later the UA would have to switch to the CDM
> decrypt-and-decode implementation whether there were unencrypted frames
> at the beginning or not. I was just trying to point out that it would
> be safe to swap out the decoder implementations at the segment
> boundary because the encrypted segment always starts at a random
> access point. The UA wouldn't have to worry about migrating decode
> state between the two implementations.
>
> I understand that, within a segment, ISO allows encryption to start at
> any point. I agree with you that it makes sense to only require that
> unencrypted frames have no dependencies on encrypted frames and not
> make the hard requirement that encryption changes can only happen on
> I-Frames. If a content author gets that wrong AND they have slow key
> servers, they deserve to receive nasty customer feedback about stalled
> playback. ;)
>
> Aaron
>
> On Wed, Oct 3, 2012 at 5:28 PM, Mark Watson <watsonm@netflix.com> wrote:
>> Aaron,
>>
>> I don't think it's true that a switch between unencrypted and encrypted content can only occur at a Media Segment boundary with MSE.
>>
>> If you are using ISO Base Media File Format and the Init segment indicates that the video track is encrypted, it is still possible that individual samples are marked as unencrypted. Playback of those samples is possible without receiving the key and indeed players should play those unencrypted samples that are at the start. As noted below this is likely a good way of shortening stream start time.
>>
>> I think it's reasonable to require that unencrypted samples before the first encrypted sample (in either decode or presentation order) must not depend on the first encrypted sample or any subsequent samples (in either decode or presentation order). This is weaker than saying that the transition may only take place at an I-Frame.
>>
>> …Mark
>>
>> On Oct 3, 2012, at 4:53 PM, Aaron Colwell wrote:
>>
>>> These all seem like quality of implementation issues. I'm sure CE
>>> devices implement a subset of a variety of W3C specs. I believe MSE
>>> and EME shouldn't be exceptions. I would hate to see us restrict what
>>> is possible with MSE & EME just because CE hardware can't handle the
>>> full specs yet. I don't believe anything I have suggested would be
>>> impossible on a desktop machine and I'd like to avoid imposing
>>> restrictions on how people decide to mix and match content with MSE &
>>> EME.
>>>
>>> I know this may not match the current models that large content
>>> companies have to deal with right now, but I do believe this will
>>> change in the future and I want to make sure that these specs allow
>>> experimentation with a mixture of protected and clear content.
>>>
>>> Aaron
>>>
>>> On Wed, Oct 3, 2012 at 4:09 PM, Steven Robertson <strobe@google.com> wrote:
>>>> CIL. The gist of it is that I like the conceptual purity of making sure
>>>> applications have as few requirements to worry about as possible with regard
>>>> to changing key systems, but I feel that the tradeoff in terms of ubiquity
>>>> and compatibility of implementations is too large to allow this.
>>>>
>>>> EME with this restriction (along with MSE) is attainable for CE systems
>>>> integrators in this and the next few generations because it's mostly a
>>>> browser-level change for current devices. Doing EME to spec without any of the
>>>> proposed restrictions would require end-to-end integration for device
>>>> makers, which for them is four or five suppliers deep.
>>>>
>>>> On Wed, Oct 3, 2012 at 1:09 PM, Aaron Colwell <acolwell@google.com> wrote:
>>>>>
>>>>> I don't think a switch in the middle of a GOP would be a problem. In
>>>>> the case of MSE at least, an encryption change requires an init segment
>>>>> and then a new media segment to get appended. Since media segments are
>>>>> required to start at a random access point you'll always have a
>>>>> keyframe so initializing a new decoder shouldn't be a problem.
>>>>
>>>>
>>>> I think this might be a little optimistic. On CE devices, a key system
>>>> switch can entail things like switching the media processor used (yep,
>>>> "processor": some implementations run trusted media decode on a separate
>>>> core from normal media decode, which may be on a separate package from a
>>>> different supplier), renegotiating display connection for HDCP and disabling
>>>> analog outputs, disabling audio mixing so that audio can be forwarded with
>>>> encryption, and tearing down the read-write graphics stack and replacing it
>>>> with one that treats all operations as write-only.
>>>>
>>>> Most devices have only one CDM, but those CDMs usually have a more limited
>>>> format support and impose restrictions on the platform when in use, so
>>>> running media through the key system all the time would have a high cost
>>>> too.
>>>>
>>>>>
>>>>>>> Some options:
>>>>>>>
>>>>>>> 1. Require the key system to be specified before loading and/or
>>>>>>> decoding starts. If it is not specified by this time, it cannot be
>>>>>>> set later, meaning decryption would not be possible. This would likely
>>>>>>> reduce the utility of the needkey event.
>>>>>
>>>>> I think this is too restrictive and I don't really support it.
>>>>
>>>>
>>>> This restriction, or 5. or 6., is likely a needed compromise to earn broad
>>>> adoption on TVs in this upcoming product cycle.
>>>>
>>>>>
>>>>>>> 5. Switch immediately and drop frames if necessary.
>>>>>
>>>>> I don't really understand what this is proposing, but it feels like a
>>>>> quality of implementation issue.
>>>>>
>>>>>>> 6. Suggest the above to applications and make it a quality of
>>>>>>> implementation issue for applications.
>>>>>
>>>>> I think switches between encrypted & non-encrypted content should be
>>>>> allowed. How well different scenarios work can be a quality of
>>>>> implementation issue. For example I think the two following situations
>>>>> should be allowed:
>>>>>
>>>>> Scenario 1: Early notification of encryption
>>>>> 1. append init segment that signals a key system
>>>>> 2. append init segment that signals unencrypted content
>>>>> 3. append media segments with unencrypted content
>>>>> 4. append init segment that signals a key system
>>>>> 5. append media segments with encrypted content
>>>>>
>>>>> Scenario 2: Just-in-time notification of encryption
>>>>> 1. append init segment that signals unencrypted content
>>>>> 2. append media segments with unencrypted content
>>>>> 3. append init segment that signals a key system
>>>>> 4. append media segments with encrypted content
>>>>>
>>>>> I could see some implementations supporting Scenario 1 slightly better
>>>>> than Scenario 2  because it allows the UA to start the "needkey" dance
>>>>> earlier and, depending on the UA's media pipeline implementation,
>>>>> could avoid a decoder reinitialization. I don't think this means that
>>>>> we should prevent Scenario 2 though.
>>>>
>>>>
>>>> YouTube will actually require support for clear-start with Media Source, in
>>>> order to cover up key exchange latency. However, we're guaranteeing that our
>>>> app will provide an indication of the selected key system before the
>>>> transition to NETWORK_LOADING.
>>>>
>>>>> If an application is generating some sort of dynamic playlist, it may
>>>>> not know whether encryption will eventually be part of the
>>>>> presentation at the start. It may be appending content far enough
>>>>> ahead of the current playback position though so there will be plenty
>>>>> of time for the "needkey" handshake to happen before the content is
>>>>> actually played. If this isn't done quickly enough then playback should
>>>>> stall until the UA has the keys it needs to continue. This behavior
>>>>> should be incentive enough for the application to notify the UA of
>>>>> encryption initialization segments as early as possible.
>>>>
>>>>
>>>> Not sure how much of a use-case there is for third-party remixing of
>>>> protected content; seems to me that those concepts are mostly orthogonal due
>>>> to licensing restrictions. If a content provider has the licenses to enable
>>>> remixing of protected content, they can probably distribute that content
>>>> with support for the same key system.
>>>>
>>>>>>> 7. Leave the behavior undefined, making it a quality of implementation
>>>>>>> issue for user agents.
>>>>>
>>>>> I don't think this should be left completely undefined since that
>>>>> would likely cause interoperability problems. I think a reasonable
>>>>> compromise is to allow switches between unencrypted/encrypted content,
>>>>> but limit it to a single key system.
>>>>
>>>>
>>>> Agreed heartily that this shouldn't be left undefined.
>>>>
>>>> Thanks,
>>>> Steve
>>>>
>>>
>>>
>>
>
Received on Friday, 5 October 2012 00:35:26 UTC