Re: [EME] Switching decoders when the key system is specified from Aaron Colwell on 2012-10-03 (public-html-media@w3.org from October 2012)

From: Aaron Colwell <acolwell@google.com>
Date: Wed, 3 Oct 2012 13:09:56 -0700
To: David Dorwin <ddorwin@google.com>
Cc: "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <CAA0c1bAGyM_85KxhPY5oD2W68jgOCJ9N0b-+wo0TDXh591ZyRA@mail.gmail.com>
Comments inline...

On Tue, Oct 2, 2012 at 8:58 AM, David Dorwin <ddorwin@google.com> wrote:
> As discussed during the teleconference, there are some implicit assumptions
> below.
>
> The first is that once you select a key system (and thus decoder), you
> cannot change it. In other words, you cannot use multiple key systems during
> the life (between loads) of a media element.
>

I'm fine with only allowing a single key system to be used in the
presentation for now. At some point it would be nice to be able to
mash up content from multiple sources that don't necessarily use the
same key system, but I'm fine with deferring that to a v2 feature.

> Also, to be clear, unencrypted content is supported even after selecting a
> key system. This means there is a requirement that CDM decoders can also
> decode clear/unencrypted content.
>

Yes. I believe this is a "must have" since ads inserted into a
presentation with MSE will likely not be encrypted.

> On Sun, Sep 30, 2012 at 10:00 PM, David Dorwin <ddorwin@google.com> wrote:
>>
>> I filed the following as
>> https://www.w3.org/Bugs/Public/show_bug.cgi?id=19156. I'd like to start a
>> discussion on the issue, possibly continuing during the teleconference
>> Tuesday.
>>
>>
>> Both v0.1 and the current version of the OO API allow the key system to be
>> specified at any point after loading has started. They even require that the
>> key system not be chosen or associated, respectively, until after loading
>> has started. As a result, it is very likely that the key system will be set
>> AFTER some frames have been decoded. For key systems/CDMs that include a
>> decoder (or entire video pipeline), this means that the decoder may change
>> when the key system is selected. This is primarily (only?) a problem if
>> there is some clear content being decoded/played before the encrypted
>> content so decoding could start before a key system was specified.
>>
>> Supporting decoder changes may be very difficult to implement, especially
>> without restrictions on the content. For example, the switch might occur in
>> the middle of a group of frames that depend on previous frames to decode
>> subsequent frames. While this could be worked around, it might be better to
>> avoid this situation completely.
>>

I don't think a switch in the middle of a GOP would be a problem. In
the case of MSE at least, a change in encryption requires appending an
init segment followed by a new media segment. Since media segments are
required to start at a random access point, you'll always have a
keyframe, so initializing a new decoder shouldn't be a problem.
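To make the segment-boundary argument concrete, here is a minimal sketch that models a source buffer's appended data as a list of init and media segments and checks that every encryption change is followed by a media segment starting at a keyframe. The `Segment` type and `switchesAreSafe` function are hypothetical illustrations, not part of the MSE API.

```typescript
// Hypothetical model of data appended to one MSE source buffer; not a real API.
type Segment =
  | { kind: "init"; encrypted: boolean }
  | { kind: "media"; startsWithKeyframe: boolean };

// Returns true if every change in encryption state (signaled by an init
// segment) is followed by a media segment that begins at a random access
// point, so a UA could initialize a fresh decoder there without earlier frames.
function switchesAreSafe(segments: Segment[]): boolean {
  let current: boolean | undefined = undefined;
  let pendingSwitch = false;
  for (const seg of segments) {
    if (seg.kind === "init") {
      if (current !== undefined && seg.encrypted !== current) {
        pendingSwitch = true; // decoder may need to change at the next media segment
      }
      current = seg.encrypted;
    } else {
      if (current === undefined) return false; // media before any init segment
      if (pendingSwitch && !seg.startsWithKeyframe) return false;
      pendingSwitch = false;
    }
  }
  return true;
}
```

Because MSE already requires media segments to start at a random access point, the failing case in this model corresponds to data MSE would not accept in the first place.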

>>
>> Some options:
>>
>> 1. Require the key system to be specified before loading and/or decoding
>> starts. If it is not specified by this time, it cannot be set later, meaning
>> decryption would not be possible. This would likely reduce the utility of
>> the needkey event.

I think this is too restrictive and I don't really support it.

>> 2. Allow switching decoders whenever Media Source Extensions allow changing
>> codecs/decoders.

MSE doesn't allow the codec to change within a single source buffer so
this would effectively mean seamless transitions would not be allowed
between encrypted and non-encrypted content. This is basically
equivalent to option 1 at this point, so I'm not a fan.

>> 3. Switch decoders when the first encrypted frame is encountered, possibly
>> requiring the next item.

I think when to switch decoders is a quality of implementation issue.
This should be transparent to the application. I think each UA will
have to decide when/if it needs to switch the underlying decoder.

>> 4. Establish limitations on which frames can be encrypted. For example, P- and
>> B-frames may not be encrypted if the I-frame is not. Due to seeking, this
>> would apply throughout a stream and not just to the beginning.

I don't think this is a real problem for MSE at least. Encrypted to
non-encrypted transitions and vice-versa can only happen at media
segment boundaries, which are random access points.

>> 5. Switch immediately and drop frames if necessary.

I don't really understand what this is proposing, but it feels like a
quality of implementation issue.

>> 6. Suggest the above to applications and make it a quality of implementation
>> issue for applications.

I think switches between encrypted & non-encrypted content should be
allowed. How well different scenarios work can be a quality of
implementation issue. For example, I think the following two situations
should be allowed:

Scenario 1: Early notification of encryption
1. append init segment that signals a key system
2. append init segment that signals unencrypted content
3. append media segments with unencrypted content
4. append init segment that signals a key system
5. append media segments with encrypted content

Scenario 2: Just-in-time notification of encryption
1. append init segment that signals unencrypted content
2. append media segments with unencrypted content
3. append init segment that signals a key system
4. append media segments with encrypted content

I could see some implementations supporting Scenario 1 slightly better
than Scenario 2 because it allows the UA to start the "needkey" dance
earlier and, depending on the UA's media pipeline implementation,
could avoid a decoder reinitialization. I don't think this means that
we should prevent Scenario 2 though.
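A minimal sketch of why Scenario 1 can be friendlier to some pipelines: it replays an append sequence through one hypothetical UA that fires "needkey" at the first init segment signaling a key system and, from then on, prefers the CDM decoder (which must also handle clear content). The `Append` type, `simulate` function, and the key system name `com.example.drm` are illustrative assumptions; real UAs may behave differently, which is exactly the quality of implementation point.

```typescript
// Hypothetical model of one UA's media pipeline; names are illustrative and
// not part of the EME or MSE specifications.
type Append =
  | { kind: "init"; keySystem: string | null } // null = unencrypted content
  | { kind: "media" };

interface PipelineTrace {
  needKeyAtStep: number | null; // 1-based append step at which "needkey" fired
  decoderReinits: number;       // decoder switches after the first decoder was chosen
}

function simulate(appends: Append[]): PipelineTrace {
  let keySystemSeen = false;
  let needKeyAtStep: number | null = null;
  let activeDecoder: "clear" | "cdm" | null = null;
  let decoderReinits = 0;

  appends.forEach((a, i) => {
    if (a.kind === "init") {
      if (a.keySystem !== null && !keySystemSeen) {
        keySystemSeen = true;
        needKeyAtStep = i + 1; // start the key exchange as early as possible
      }
      return;
    }
    // Assumption: once any key system has been signaled, this UA decodes
    // everything (clear or encrypted) with the CDM decoder.
    const wanted = keySystemSeen ? "cdm" : "clear";
    if (activeDecoder !== null && activeDecoder !== wanted) decoderReinits++;
    activeDecoder = wanted;
  });
  return { needKeyAtStep, decoderReinits };
}
```

Under this assumed pipeline, Scenario 1 fires "needkey" at step 1 with no decoder reinitialization, while Scenario 2 fires it at step 3 and reinitializes the decoder once. Both sequences still play.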

If an application is generating some sort of dynamic playlist, it may
not know whether encryption will eventually be part of the
presentation at the start. It may be appending content far enough
ahead of the current playback position though so there will be plenty
of time for the "needkey" handshake to happen before the content is
actually played. If this isn't done quickly enough, then playback should
stall until the UA has the keys it needs to continue. This behavior
should be incentive enough for the application to notify the UA of
encryption initialization segments as early as possible.
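The incentive to signal encryption early can be put in rough numbers. This sketch assumes playback starts at wall-clock 0, runs at 1x, and has no other stalls, so media time equals wall-clock time; the function name and parameters are hypothetical.

```typescript
// Hypothetical timing model: returns how long playback stalls waiting for
// keys when it reaches the first encrypted media. Assumes playback began at
// wall-clock 0 at 1x speed with no earlier stalls.
function stallSeconds(
  encryptedStart: number,   // media time where the first encrypted segment begins
  needKeyFiredAt: number,   // wall-clock time at which the UA fires "needkey"
  licenseRoundTrip: number, // time for the application to obtain and supply the key
): number {
  const keyReadyAt = needKeyFiredAt + licenseRoundTrip;
  return Math.max(0, keyReadyAt - encryptedStart);
}
```

For example, with encrypted content starting at 30 s and a 2 s license round trip, appending the encrypted init segment at 5 s gives no stall, while appending it at 29.5 s stalls playback for 1.5 s.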

>> 7. Leave the behavior undefined, making it a quality of implementation issue
>> for user agents.

I don't think this should be left completely undefined since that
would likely cause interoperability problems. I think a reasonable
compromise is to allow switches between unencrypted/encrypted content,
but limit it to a single key system.
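The proposed compromise can be sketched as a small rule: between loads, a media element accepts clear content at any time, locks onto the first key system it sees, and rejects any different key system until the next load. The `KeySystemLock` class and the key system names are hypothetical; how a UA would actually report the rejection is left open here.

```typescript
// Hypothetical sketch of the proposed rule: a media element (between loads)
// may mix clear and encrypted content, but only with a single key system.
class KeySystemLock {
  private keySystem: string | null = null;

  // Called for each init segment; null means the content is unencrypted.
  // Returns false if the segment would introduce a second key system.
  accept(segmentKeySystem: string | null): boolean {
    if (segmentKeySystem === null) return true; // clear content is always allowed
    if (this.keySystem === null) {
      this.keySystem = segmentKeySystem; // the first key system seen is kept
      return true;
    }
    return this.keySystem === segmentKeySystem;
  }

  // A new load resets the element, so a different key system may be used.
  reset(): void {
    this.keySystem = null;
  }
}
```

This relies on the requirement discussed above that CDM decoders can also decode clear content; otherwise accepting clear segments after the lock would force a decoder switch anyway.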


Aaron
Received on Wednesday, 3 October 2012 20:10:25 UTC