Re: [EME] Switching decoders when the key system is specified from Mark Watson on 2012-10-04 (public-html-media@w3.org from October 2012)

From: Mark Watson <watsonm@netflix.com>
Date: Thu, 4 Oct 2012 00:28:48 +0000
To: Aaron Colwell <acolwell@google.com>
CC: Steven Robertson <strobe@google.com>, David Dorwin <ddorwin@google.com>, "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <8A69C603-CF2F-464C-9905-E31663127509@netflix.com>
Aaron,

I don't think it's true that a switch between unencrypted and encrypted content can only occur at a Media Segment boundary with MSE.

If you are using the ISO Base Media File Format and the Init segment indicates that the video track is encrypted, it is still possible for individual samples to be marked as unencrypted. Those samples can be played without receiving the key, and indeed players should play any unencrypted samples that are at the start. As noted below, this is likely a good way of shortening stream start time.
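
To make the clear-start idea concrete, here is a minimal sketch of how a player might pick out the samples it can play before any key arrives. The `Sample` class and `playable_without_key` function are hypothetical names invented for illustration; they are not part of MSE, EME, or any ISO BMFF parser, and the `encrypted` flag simply stands in for whatever per-sample protection signal the container provides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One media sample (hypothetical model, not a real parser type)."""
    index: int        # position in decode order
    encrypted: bool   # stands in for the per-sample protection flag


def playable_without_key(samples: List[Sample]) -> List[Sample]:
    """Return the leading run of clear samples, in decode order."""
    playable = []
    for sample in samples:
        if sample.encrypted:
            break  # this sample and everything after it waits for the key
        playable.append(sample)
    return playable


# The track is signalled as encrypted, but the first two samples are clear.
track = [Sample(0, False), Sample(1, False), Sample(2, True), Sample(3, False)]
print([s.index for s in playable_without_key(track)])  # [0, 1]
```

The sketch deliberately stops at the first encrypted sample, so the clear sample at index 3 is not treated as playable before the key is available.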

I think it's reasonable to require that unencrypted samples before the first encrypted sample (in either decode or presentation order) must not depend on the first encrypted sample or any subsequent samples (in either decode or presentation order). This is weaker than saying that the transition may only take place at an I-Frame.
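
The requirement above could be checked mechanically along the following lines. This is only a sketch under one possible reading of the rule: it applies the check once in decode order and once in presentation order, and reports a violation whenever a clear sample ahead of the first encrypted sample references a sample at or after it in that ordering. The `Sample` fields and the `violations` function are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sample:
    """Hypothetical sample model used only for this illustration."""
    decode_pos: int    # position in decode order
    present_pos: int   # position in presentation order
    encrypted: bool
    depends_on: List[int] = field(default_factory=list)  # decode_pos of references


def violations(samples: List[Sample]) -> List[Tuple[str, int, int]]:
    """List (ordering, sample decode_pos, reference decode_pos) for each breach."""
    by_decode = {s.decode_pos: s for s in samples}
    orderings = (
        ("decode", lambda s: s.decode_pos),
        ("presentation", lambda s: s.present_pos),
    )
    found = []
    for name, key in orderings:
        encrypted_positions = [key(s) for s in samples if s.encrypted]
        if not encrypted_positions:
            continue  # nothing encrypted, so nothing to check
        boundary = min(encrypted_positions)  # first encrypted sample in this ordering
        for s in samples:
            if key(s) >= boundary:
                continue  # only clear samples ahead of the boundary are constrained
            for ref in s.depends_on:
                if key(by_decode[ref]) >= boundary:
                    found.append((name, s.decode_pos, ref))
    return found


# The first encrypted sample (2) is not an I-frame: it depends on sample 1.
ok = [Sample(0, 0, False), Sample(1, 1, False, [0]), Sample(2, 2, True, [1])]
# Here a leading clear sample depends on the first encrypted sample.
bad = [Sample(0, 0, False), Sample(1, 1, False, [2]), Sample(2, 2, True)]

print(violations(ok))   # []
print(violations(bad))  # [('decode', 1, 2), ('presentation', 1, 2)]
```

The `ok` stream shows why this is weaker than an I-frame-only rule: the transition happens at a sample that itself has dependencies, yet none of the clear samples ahead of it need the key.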

…Mark

On Oct 3, 2012, at 4:53 PM, Aaron Colwell wrote:

> These all seem like quality of implementation issues. I'm sure CE
> devices implement a subset of a variety of W3C specs. I believe MSE
> and EME shouldn't be exceptions. I would hate to see us restrict what
> is possible with MSE & EME just because CE hardware can't handle the
> full specs yet. I don't believe anything I have suggested would be
> impossible on a desktop machine and I'd like to avoid imposing
> restrictions on how people decide to mix and match content with MSE &
> EME.
> 
> I know this may not match the current models that large content
> companies have to deal with right now, but I do believe this will
> change in the future and I want to make sure that these specs allow
> experimentation with a mixture of protected and clear content.
> 
> Aaron
> 
> On Wed, Oct 3, 2012 at 4:09 PM, Steven Robertson <strobe@google.com> wrote:
>> CIL. The gist of it is that I like the conceptual purity of making sure
>> applications have as few requirements to worry about as possible with regard
>> to changing key systems, but I feel that the tradeoff in terms of ubiquity
>> and compatibility of implementations is too large to allow this.
>> 
>> EME with this restriction (along with MSE) is attainable for CE systems
>> integrators in this and the next few generations because it's mostly a
>> browser-level change for current devices. Doing EME to spec without any of the
>> proposed restrictions would require end-to-end integration for device
>> makers, which for them is four or five suppliers deep.
>> 
>> On Wed, Oct 3, 2012 at 1:09 PM, Aaron Colwell <acolwell@google.com> wrote:
>>> 
>>> I don't think a switch in the middle of a GOP would be a problem. In
>>> the case of MSE at least, an encryption change requires an init segment
>>> and then a new media segment to get appended. Since media segments are
>>> required to start at a random access point you'll always have a
>>> keyframe, so initializing a new decoder shouldn't be a problem.
>> 
>> 
>> I think this might be a little optimistic. On CE devices, a key system
>> switch can entail things like switching the media processor used (yep,
>> "processor": some implementations run trusted media decode on a separate
>> core from normal media decode, which may be on a separate package from a
>> different supplier), renegotiating display connection for HDCP and disabling
>> analog outputs, disabling audio mixing so that audio can be forwarded with
>> encryption, and tearing down the read-write graphics stack and replacing it
>> with one that treats all operations as write-only.
>> 
>> Most devices have only one CDM, but those CDMs usually have a more limited
>> format support and impose restrictions on the platform when in use, so
>> running media through the key system all the time would have a high cost
>> too.
>> 
>>> 
>>>>> Some options:
>>>>> 
>>>>> 1. Require the key system to be specified before loading and/or
>>>>> decoding starts. If it is not specified by this time, it cannot be
>>>>> set later, meaning decryption would not be possible. This would
>>>>> likely reduce the utility of the needkey event.
>>> 
>>> I think this is too restrictive and I don't really support it.
>> 
>> 
>> This restriction, or 5. or 6., is likely a needed compromise to earn broad
>> adoption on TVs in this upcoming product cycle.
>> 
>>> 
>>>>> 5. Switch immediately and drop frames if necessary.
>>> 
>>> I don't really understand what this is proposing, but it feels like a
>>> quality of implementation issue.
>>> 
>>>>> 6. Suggest the above to applications and make it a quality of
>>>>> implementation issue for applications.
>>> 
>>> I think switches between encrypted & non-encrypted content should be
>>> allowed. How well different scenarios work can be a quality of
>>> implementation issue. For example I think the two following situations
>>> should be allowed:
>>> 
>>> Scenario 1: Early notification of encryption
>>> 1. append init segment that signals a key system
>>> 2. append init segment that signals unencrypted content
>>> 3. append media segments with unencrypted content
>>> 4. append init segment that signals a key system
>>> 5. append media segments with encrypted content
>>> 
>>> Scenario 2: Just-in-time notification of encryption
>>> 1. append init segment that signals unencrypted content
>>> 2. append media segments with unencrypted content
>>> 3. append init segment that signals a key system
>>> 4. append media segments with encrypted content
>>> 
>>> I could see some implementations supporting Scenario 1 slightly better
>>> than Scenario 2 because it allows the UA to start the "needkey" dance
>>> earlier and, depending on the UA's media pipeline implementation,
>>> could avoid a decoder reinitialization. I don't think this means that
>>> we should prevent Scenario 2 though.
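
The two append sequences quoted above can be modelled as plain data to show how much head start each gives the UA for the "needkey" exchange. This is a rough sketch, not MSE or EME code: the tuple encoding, the `com.example.keysystem` string, and `key_request_lead` are all made up for illustration, and the sketch assumes a media segment counts as encrypted whenever the most recent init segment signalled a key system.

```python
from typing import List, Optional, Tuple

# ("init", key system or None) or ("media", None); a hypothetical encoding.
Append = Tuple[str, Optional[str]]

KEY_SYSTEM = "com.example.keysystem"  # hypothetical key system name

# Scenario 1: early notification of encryption.
SCENARIO_1: List[Append] = [
    ("init", KEY_SYSTEM),
    ("init", None),
    ("media", None),
    ("init", KEY_SYSTEM),
    ("media", None),
]

# Scenario 2: just-in-time notification of encryption.
SCENARIO_2: List[Append] = [
    ("init", None),
    ("media", None),
    ("init", KEY_SYSTEM),
    ("media", None),
]


def key_request_lead(appends: List[Append]) -> Optional[int]:
    """Appends between the first key-system signal and the first encrypted media."""
    first_signal = None  # index of the first init segment naming a key system
    current = None       # key system signalled by the most recent init segment
    for position, (kind, key_system) in enumerate(appends):
        if kind == "init":
            current = key_system
            if key_system is not None and first_signal is None:
                first_signal = position
        elif kind == "media" and current is not None:
            return position - first_signal
    return None  # no encrypted media segment was appended


print(key_request_lead(SCENARIO_1))  # 4
print(key_request_lead(SCENARIO_2))  # 1
```

Under these assumptions Scenario 1 leaves four appends of slack for the key exchange and Scenario 2 leaves one, which is the difference in head start described in the quoted paragraph.
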
>> 
>> 
>> YouTube will actually require support for clear-start with Media Source, in
>> order to cover up key exchange latency. However, we're guaranteeing that our
>> app will provide an indication of the selected key system before the
>> transition to NETWORK_LOADING.
>> 
>>> If an application is generating some sort of dynamic playlist, it may
>>> not know whether encryption will eventually be part of the
>>> presentation at the start. It may be appending content far enough
>>> ahead of the current playback position though so there will be plenty
>>> of time for the "needkey" handshake to happen before the content is
>>> actually played. If this isn't done quickly enough then playback should
>>> stall until the UA has the keys it needs to continue. This behavior
>>> should be incentive enough for the application to notify the UA of
>>> encryption initialization segments as early as possible.
>> 
>> 
>> Not sure how much of a use-case there is for third-party remixing of
>> protected content; seems to me that those concepts are mostly orthogonal due
>> to licensing restrictions. If a content provider has the licenses to enable
>> remixing of protected content, they can probably distribute that content
>> with support for the same key system.
>> 
>>>>> 7. Leave the behavior undefined, making it a quality of
>>>>> implementation issue for user agents.
>>> 
>>> I don't think this should be left completely undefined since that
>>> would likely cause interoperability problems. I think a reasonable
>>> compromise is to allow switches between unencrypted/encrypted content,
>>> but limit it to a single key system.
>> 
>> 
>> Agreed heartily that this shouldn't be left undefined.
>> 
>> Thanks,
>> Steve
>> 
> 
>
Received on Thursday, 4 October 2012 00:29:18 UTC