Re: [EME] reuse of session from David Dorwin on 2014-06-17 (public-html-media@w3.org from June 2014)

From: David Dorwin <ddorwin@google.com>
Date: Mon, 16 Jun 2014 17:57:24 -0700
To: Mark Watson <watsonm@netflix.com>
Cc: "Maruyama, Shinya" <Shinya.Maruyama@jp.sony.com>, "public-html-media@w3.org" <public-html-media@w3.org>
Message-ID: <CAHD2rsjJjQTxFLV0cYnARmmFJD2+iacw-rRiUo8+T_1a4=FdLA@mail.gmail.com>
On Mon, Jun 16, 2014 at 5:23 PM, Mark Watson <watsonm@netflix.com> wrote:

>
> On Mon, Jun 16, 2014 at 5:17 PM, Maruyama, Shinya <
> Shinya.Maruyama@jp.sony.com> wrote:
>
>>    *From:* Mark Watson [mailto:watsonm@netflix.com]
>> *Sent:* Tuesday, June 17, 2014 8:59 AM
>> *To:* Maruyama, Shinya
>> *Cc:* David Dorwin; public-html-media@w3.org
>> *Subject:* Re: [EME] reuse of session
>>
>> On Mon, Jun 16, 2014 at 4:46 PM, Maruyama, Shinya <
>> Shinya.Maruyama@jp.sony.com> wrote:
>>
>> I think you are right if EME specifies this best practice normatively,
>> i.e. specifies the requirements for manifests, media segments and DRM so
>> that an application conforming to those requirements can safely ignore
>> the needkey event (though “ignore needkey” looks like an ad-hoc solution).
>>
>> I think, however, that would be overkill for the spec. If we just specify
>> it informatively, a generic application should still work as specified by
>> EME, that is, the application should call createSession whenever it
>> receives the needkey event. This is the nature of the current model.
>> (Although in practice an application may know enough to ignore the events
>> safely, that just relies on developer optimization.)
>>
>
That is one possible design, but not the only one. Nothing in the spec says
you *must* call createSession(). The figure
<https://dvcs.w3.org/hg/html-media/raw-file/default/encrypted-media/encrypted-media.html#introduction>
shows that needkey is optional and the first example
<https://dvcs.w3.org/hg/html-media/raw-file/default/encrypted-media/encrypted-media.html#example-source-and-key-known>
does not use it.

>
>>
>> I don't understand why we would need to specify that. The application is
>> provided by a service provider who is also the one providing the manifest.
>> They know when they write the application that it will not rely on needkey.
>>
>
+1

>
>>
>> Personally, I’m fine with it. My concern is just how to specify it. At
>> the least, the best practice should be unambiguous so that people who are
>> not engaged in this discussion can also be aware of it.
>>
>> I am also not insisting on adding KID-granular comparison. I’m happy with
>> whatever solution is specified clearly (if there is a good reason to
>> choose it).
>>
>> Perhaps someone writing a general-purpose library that will be used in a
>> number of different contexts needs to think about what modes they will
>> support or how they will discover what model a given service provider uses.
>> It's up to the library developer to decide which modes they support and to
>> give the library user the choice. I wouldn't have thought indications from
>> EME would be reliable enough to drive that, unless we now have an
>> enumeration of the various usage models.
>>
>> I thought one of our goals was to have an interoperable application that
>> covers a wide range of media presentations and their usage.
>>
>> (I do not think it’s easy, though…)
>>
>
> No, I don't recall that being a goal. The web platform provides site
> developers with tools that they can use to develop sites / applications.
> Sometimes those are low-level tools and so people create higher-level
> libraries, but there is no more need for a single "universal media
> player" application than there is for a "universal web site"
> application.
>

Correct. The goal is to have an interoperable application that works across
user agents, not necessarily for any media stream. Applications may work
better if they have some knowledge of the file packaging or have influenced
the packaging.
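As a rough illustration of the library-level choice discussed above, the sketch below lets the library user pick how needkey is handled. The mode names, the `ParsedNeedKey` shape and `decideNeedKeyAction` are hypothetical and not part of EME; it also assumes the application has already extracted key IDs (as hex strings) from the event's initData whenever it can.

```typescript
// Hypothetical needkey policy a general-purpose player library might expose.
// None of these names come from EME; they only illustrate letting the library
// user choose the usage model that their service follows.
type NeedKeyMode =
  | "alwaysCreateSession" // generic model: call createSession() on every needkey
  | "ignore" // initData is taken from the manifest; needkey events are unused
  | "filterByKeyId"; // app parses key IDs and skips ones it already requested

interface ParsedNeedKey {
  initDataType: string; // e.g. "cenc" or "webm"
  keyIds: string[]; // hex key IDs the app could extract; empty if parsing failed
}

type NeedKeyAction = "createSession" | "ignore";

function decideNeedKeyAction(
  mode: NeedKeyMode,
  event: ParsedNeedKey,
  requestedKeyIds: ReadonlySet<string>,
): NeedKeyAction {
  switch (mode) {
    case "alwaysCreateSession":
      return "createSession";
    case "ignore":
      return "ignore";
    case "filterByKeyId":
      // Without parsed key IDs the app cannot tell, so it requests a license.
      if (event.keyIds.length === 0) return "createSession";
      return event.keyIds.every((kid) => requestedKeyIds.has(kid))
        ? "ignore"
        : "createSession";
  }
}
```

Keeping the decision outside the EME calls means a service that controls its own packaging can pick "ignore", while a generic player can stay with "alwaysCreateSession".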

>
>>
>> This does seem to me an example where we need to focus back on concrete
>> usage models. Where we have a detailed model in front of us, we can decide
>> whether that model is to be supported in this first version and if so
>> exactly how, but without more requirements-level specification of the model
>> it's very hard to tell what support is needed.
>>
>
+1

>
>>
>> Agreed. Better to discuss after all the models are lined up in front of
>> us.
>>
>
Note that this is already somewhat covered by
https://www.w3.org/wiki/HTML/Media_Task_Force/EME_Use_Cases#1._Simple_Streaming_of_a_Specific_Title:
"The license request is generated based on Initialization Data that the
application obtains from one of the following:"

>
>>
>> By the way, I have another question. Is ‘ignore needkey’ necessary for
>> WebM?
>>
>> If “no”, it might be better to seek consistent and container-independent
>> behavior.
>>
>
The same issues should apply. You could create a session for multiple keys
from a manifest using the "keyids" Initialization Data Type. Then, each
needkey event would *not* match the original initData. (The types don't
even need to match!) An application could choose to parse the various
values, just as it could for CENC, but it's not required.
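A minimal sketch of the application-side parsing mentioned above, for WebM: it builds "keyids" initData for a manifest-driven session and checks whether a later WebM needkey (whose initData is the key ID itself) is already covered. It assumes the JSON layout used by the EME "keyids" format, an object whose `kids` member is an array of unpadded base64url key IDs; verify that against the spec revision you target. All function names are hypothetical.

```typescript
// Sketch of app-side handling of "keyids" initData, ASSUMING the JSON layout
// {"kids": ["<unpadded base64url key ID>", ...]}.
const B64URL =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

function base64UrlEncode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const n =
      (bytes[i] << 16) |
      ((i + 1 < bytes.length ? bytes[i + 1] : 0) << 8) |
      (i + 2 < bytes.length ? bytes[i + 2] : 0);
    out += B64URL[(n >> 18) & 63] + B64URL[(n >> 12) & 63];
    if (i + 1 < bytes.length) out += B64URL[(n >> 6) & 63];
    if (i + 2 < bytes.length) out += B64URL[n & 63];
  }
  return out; // no "=" padding
}

function base64UrlDecode(s: string): Uint8Array {
  const out: number[] = [];
  let acc = 0;
  let bits = 0;
  for (let i = 0; i < s.length; i++) {
    const v = B64URL.indexOf(s[i]);
    if (v < 0) throw new Error(`invalid base64url character: ${s[i]}`);
    acc = (acc << 6) | v;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((acc >> bits) & 0xff);
      acc &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return new Uint8Array(out);
}

function asciiBytes(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i);
  return out;
}

function buildKeyIdsInitData(keyIds: Uint8Array[]): Uint8Array {
  // The JSON here is pure ASCII, so one byte per character is safe.
  return asciiBytes(JSON.stringify({ kids: keyIds.map((k) => base64UrlEncode(k)) }));
}

function parseKeyIdsInitData(initData: Uint8Array): Uint8Array[] {
  let json = "";
  for (let i = 0; i < initData.length; i++) json += String.fromCharCode(initData[i]);
  const kids: unknown = (JSON.parse(json) as { kids?: unknown }).kids;
  if (!Array.isArray(kids)) throw new Error("keyids initData has no kids array");
  return kids.map((k: unknown) => base64UrlDecode(String(k)));
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) return false;
  return true;
}

// For WebM, the needkey initData is the key ID itself.
function webmKeyIdCovered(webmInitData: Uint8Array, keyIdsInitData: Uint8Array): boolean {
  return parseKeyIdsInitData(keyIdsInitData).some((kid) => sameBytes(kid, webmInitData));
}
```

As the thread notes, this parsing is optional: an application that already knows its manifest lists every key can simply not act on needkey.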

Note that ensuring different types, including WebM and CENC, can work is an
important part of the spec. The differences in these formats provide a good
check to make sure the spec does not over-rely on one particular format.

>
>>
>> Thanks,
>>
>> Shinya
>>
>> ...Mark
>>
>> *From:* Mark Watson [mailto:watsonm@netflix.com]
>> *Sent:* Tuesday, June 17, 2014 8:08 AM
>> *To:* Maruyama, Shinya
>> *Cc:* David Dorwin; public-html-media@w3.org
>> *Subject:* Re: [EME] reuse of session
>>
>> In this case, isn't the application aware that it is getting the
>> necessary initData from the MPD and so can safely ignore the needkey
>> events?
>>
>> ...Mark
>>
>> On Mon, Jun 16, 2014 at 4:02 PM, Maruyama, Shinya <
>> Shinya.Maruyama@jp.sony.com> wrote:
>>
>> I should have included the needkey event steps.
>>
>> Those steps are triggered because the application receives needkey.
>>
>> For example, steps 1 to 8 cover VOD content playback, and steps 1 to 16
>> cover live streaming that requires key rotation.
>>
>> Currently, steps 5, 8, 12, 13 and 16 cause extra sessions to be created,
>> resulting in duplicate licenses being acquired, because raw initData
>> comparison cannot detect that a subset of the KIDs has already been
>> supplied.
>>
>> 1) Fetch MPD1
>>
>> 2) createSession(KIDv1, KIDa1 in MPD1) -> Resolved with session1, and
>> KIDv1 and KIDa1 are stored in the UA  // this session is created
>> proactively without receiving needkey
>>
>> 3) Fetch video1 (KIDv1) segment
>>
>> 4) needkey(KIDv1 in moof) is fired  // In the case of MSE, KIDv1 is
>> typically delivered by an initialization segment containing a pssh in moov
>>
>> 5) createSession(KIDv1) -> Resolved with null  // because KIDv1 is already
>> included in the active session list
>>
>> 6) Fetch audio1 segment
>>
>> 7) needkey(KIDa1 in moof) is fired  // In the case of MSE, KIDa1 is
>> typically delivered by an initialization segment containing a pssh in moov
>>
>> 8) createSession(KIDa1 in moof) -> Resolved with null
>>
>> -------------- if key rotation happens --------------
>>
>> 9) Fetch MPD2
>>
>> 10) createSession(KIDv2, KIDa2 in MPD2) -> Resolved with session2  // this
>> session is created proactively without receiving needkey
>>
>> 11) Fetch video2 segment
>>
>> 12) needkey(KIDv2 in moof) is fired  // In the case of MSE, KIDv2 is
>> typically delivered by an initialization segment containing a pssh in moov
>>
>> 13) createSession(KIDv2 in moof) -> Resolved with null
>>
>> 14) Fetch audio2 segment
>>
>> 15) needkey(KIDa2 in moof) is fired  // In the case of MSE, KIDa2 is
>> typically delivered by an initialization segment containing a pssh in moov
>>
>> 16) createSession(KIDa2 in moof) -> Resolved with null
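To make the counting concrete, here is a toy model of the two user-agent de-duplication rules being debated: byte-identical initData versus "every KID already listed in an active session". It replays only the createSession() calls from the trace above (steps 2, 5, 8, 10, 13 and 16). The `raw` strings merely stand in for PSSH bytes, and the class and function names are hypothetical; a real createSession() returns a promise, which this synchronous model omits.

```typescript
// Toy model of user-agent de-duplication in createSession(). "raw" stands in
// for the exact initData bytes; keyIds are the KIDs that initData lists.
interface InitData {
  raw: string;
  keyIds: string[];
}

type DedupStrategy = "rawInitData" | "keyIdSubset";

class UserAgentModel {
  private readonly active: InitData[] = [];
  private readonly strategy: DedupStrategy;

  constructor(strategy: DedupStrategy) {
    this.strategy = strategy;
  }

  // Returns a session name, or null when the request is treated as a
  // duplicate of an active session.
  createSession(initData: InitData): string | null {
    if (this.isDuplicate(initData)) return null;
    this.active.push(initData);
    return `session${this.active.length}`;
  }

  private isDuplicate(initData: InitData): boolean {
    if (this.strategy === "rawInitData") {
      return this.active.some((s) => s.raw === initData.raw);
    }
    const known = new Set<string>();
    for (const s of this.active) for (const kid of s.keyIds) known.add(kid);
    return initData.keyIds.length > 0 && initData.keyIds.every((kid) => known.has(kid));
  }
}

// createSession() calls from steps 2, 5, 8, 10, 13 and 16 of the trace.
const traceCalls: InitData[] = [
  { raw: "pssh[KIDv1,KIDa1] from MPD1", keyIds: ["KIDv1", "KIDa1"] },
  { raw: "pssh[KIDv1] from video1", keyIds: ["KIDv1"] },
  { raw: "pssh[KIDa1] from audio1", keyIds: ["KIDa1"] },
  { raw: "pssh[KIDv2,KIDa2] from MPD2", keyIds: ["KIDv2", "KIDa2"] },
  { raw: "pssh[KIDv2] from video2", keyIds: ["KIDv2"] },
  { raw: "pssh[KIDa2] from audio2", keyIds: ["KIDa2"] },
];

function runTrace(strategy: DedupStrategy): (string | null)[] {
  const ua = new UserAgentModel(strategy);
  return traceCalls.map((call) => ua.createSession(call));
}
```

With `runTrace("rawInitData")` every call creates a session (six in total); with `runTrace("keyIdSubset")` the result is `["session1", null, null, "session2", null, null]`. The model deliberately ignores two points raised elsewhere in the thread: keys the CDM obtained beyond those listed in the initData, and preceding license requests that are still pending or have failed.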
>>
>> *From:* Mark Watson [mailto:watsonm@netflix.com]
>> *Sent:* Monday, June 16, 2014 11:57 PM
>> *To:* Maruyama, Shinya
>> *Cc:* David Dorwin; public-html-media@w3.org
>> *Subject:* Re: [EME] reuse of session
>>
>> I'm afraid I have not been following this whole thread, but why do you
>> have steps 4, 6, 10 and 12 at all below?
>>
>> ...Mark
>>
>> On Jun 15, 2014, at 7:33 PM, "Maruyama, Shinya" <
>> Shinya.Maruyama@jp.sony.com> wrote:
>>
>>  Please see replies inline.
>>
>> *From:* David Dorwin [mailto:ddorwin@google.com]
>> *Sent:* Saturday, June 14, 2014 3:31 AM
>> *To:* Maruyama, Shinya
>> *Cc:* public-html-media@w3.org
>> *Subject:* Re: [EME] reuse of session
>>
>> On Thu, Jun 12, 2014 at 8:34 PM, Maruyama, Shinya <
>> Shinya.Maruyama@jp.sony.com> wrote:
>>
>>     Does the KID array really help? The user agent would still need to
>> (asynchronously) ask the CDM if it has each key ID. Even if it did, this
>> only addresses a subset of content in one particular format.
>>
>> I’m not sure why you think the user agent needs to ask the CDM.
>>
>> The current model does not consider which set of KIDs the “active session
>> Initialization Data” delivers to the CDM, nor does it ensure that the
>> preceding session has completed successfully. It simply relies on the
>> same initData resulting in the same license and ignores the result of a
>> pending license request (createSession is resolved with null even though
>> the preceding session may fail to acquire the license).
>>
>> KID-granular comparison is basically the same. The initData is expected
>> to result in a license delivering all the KIDs contained in the initData
>> (possibly with extra KIDs). It only ensures that the KIDs listed in the UA
>> will be, or have been, made available to the CDM. This is the same
>> assumption the current model relies on. If a preceding license delivers
>> extra KIDs, an unnecessary session may be created. However, that is no
>> worse than the current model either.
>>
>>  I don't see how it is *better* and thus why we should add special
>> behavior for one format (CENC) or a dependency on CENC second edition.
>> (Actually, WebM would work fine because the initData *is* a KID.)
>>
>> If, for example, audio and video streams are encrypted with different
>> keys, they will have different PSSH boxes with different KID values. *If* the
>> first session results in a license for *both* keys, the application and
>> user agent will not know this. Only the CDM knows that it already has keys
>> for both KIDs. Thus, the user agent can't do anything with knowledge of the
>> KIDs in the initData. It would still create multiple sessions because the
>> KIDs are different just as the entire initData is different.
>>
>> The case above is a bad practice, in that we cannot de-dup licenses.
>>
>> What specifically is a bad practice? That all seems pretty standard.
>>
>> I just compared it to the best practice below.
>>
>> The one case where this might not be true is if there was fake initData
>> (i.e. from a manifest) that contained both KIDs. Maybe this is what you are
>> referring to below. However, this can probably be addressed by using a real
>> PSSH box from one of the streams. If the license server is capable of
>> returning a license for all KIDs based on a PSSH box containing just one of
>> them, there is no reason to include all the KIDs in the manifest (if you
>> are concerned about duplicate sessions in the key rotation case).
>>
>> The best practice I mentioned below is the case where KID comparison in
>> user agents helps a lot.
>>
>> In fact, DASH-IF and Common Encryption 2nd edition are addressing the
>> delivery of all the KIDs in the manifest.
>>
>> The biggest advantage of introducing KID-granular comparison is that it
>> helps realize a best practice. For example, if the manifest delivers a
>> pssh containing all KIDs for the presentation, the application can first
>> call createSession with that pssh. Subsequent media segments then do not
>> cause unnecessary sessions, even though a pssh is contained in the moov or
>> moof box.
>>
>> Raw initData comparison cannot achieve this because the pssh differs
>> among the manifest, moov and moof.
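The byte-level difference can be seen by building version 1 'pssh' boxes, which carry an explicit KID list (a Common Encryption second edition feature, hence the dependency mentioned in this thread). The sketch builds a manifest-style box listing both KIDs and a moof-style box listing one, then compares them raw and by KID. The system ID and key IDs in the test are placeholders, the helper names are hypothetical, and the parser does no bounds checking beyond what DataView itself enforces.

```typescript
// Build a version-1 'pssh' box: size, "pssh", version/flags, SystemID,
// KID_count, KIDs, DataSize, Data (all integers big-endian).
function buildPsshV1(systemId: Uint8Array, keyIds: Uint8Array[], data: Uint8Array): Uint8Array {
  const size = 32 + 16 * keyIds.length + 4 + data.length; // 32 = header + SystemID + KID_count
  const box = new Uint8Array(size);
  const view = new DataView(box.buffer);
  view.setUint32(0, size); // box size
  box.set([0x70, 0x73, 0x73, 0x68], 4); // "pssh"
  view.setUint32(8, 0x01000000); // version 1, flags 0
  box.set(systemId, 12); // 16-byte SystemID
  view.setUint32(28, keyIds.length); // KID_count
  keyIds.forEach((kid, i) => box.set(kid, 32 + 16 * i));
  const dataOffset = 32 + 16 * keyIds.length;
  view.setUint32(dataOffset, data.length); // DataSize
  box.set(data, dataOffset + 4);
  return box;
}

function toHex(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += ("0" + bytes[i].toString(16)).slice(-2);
  return s;
}

// Returns the KIDs listed in a pssh box; version 0 boxes list none outside Data.
function psshKeyIds(box: Uint8Array): string[] {
  const view = new DataView(box.buffer, box.byteOffset, box.byteLength);
  if (String.fromCharCode(box[4], box[5], box[6], box[7]) !== "pssh") {
    throw new Error("not a pssh box");
  }
  if (box[8] === 0) return [];
  const count = view.getUint32(28);
  const kids: string[] = [];
  for (let i = 0; i < count; i++) kids.push(toHex(box.subarray(32 + 16 * i, 48 + 16 * i)));
  return kids;
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) return false;
  return true;
}

// KID-granular check: every KID in the candidate is listed in the existing box.
function keyIdsCovered(candidate: Uint8Array, existing: Uint8Array): boolean {
  const known = new Set(psshKeyIds(existing));
  const kids = psshKeyIds(candidate);
  return kids.length > 0 && kids.every((kid) => known.has(kid));
}
```

A user agent doing this would need format-specific parsing like `psshKeyIds`, and a version 0 box yields no KIDs at all, which is part of the objection to special-casing CENC.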
>>
>>  That's interesting - why are they different?
>>
>> Is it because, as you said above, the PSSH box is from the manifest?
>> What does the PSSH box in the moov contain? Why does the moov need a PSSH
>> box?
>>
>> Yes, the first pssh, extracted from the manifest, contains all the KIDs
>> for the presentation because the manifest covers all of the streams.
>>
>> Subsequent pssh boxes may come from either the moov or the moof to support
>> random access or trick play. Because a media segment typically contains a
>> single track, those pssh boxes are not the same unless a particular
>> constraint is specified and applied, such as HbbTV restricting the
>> Initialization Segment to be common among all representations.
>>
>> Are you still talking about a key rotation scenario? When you say "the
>> manifest contains all the KID for the presentation", are you referring to
>> all rotation periods or just for all streams in the current period? In the
>> former case, you can ignore the needkey events. In the latter, you are
>> still going to have problems for subsequent periods (see below).
>>
>> It’s not limited to the key rotation scenario. This sort of best practice
>> would be generally useful for single-track media segments encrypted with
>> different keys.
>>
>> As to "the manifest contains all the KID for the presentation", I was
>> referring to the case where a manifest contains KIDs for the audio and
>> video streams (whether VoD or live streaming).
>>
>> The PSSH box in the moof only contains key ID(s) for that specific track
>> (e.g. video), right? If so, you'll have the same problem of different KIDs
>> in the needkey events in a future rotation period.
>>
>> In the case of DASH live streaming, the MPD update mechanism can be used
>> to create a new session using the updated MPD before key rotation happens.
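The MPD-update approach can be sketched as an application-side tracker that, on each manifest fetch or update, returns only the key IDs not yet requested, so the application can call createSession() for the next key period ahead of the rotation. `ManifestKeyTracker` is hypothetical and assumes the application already extracts KIDs from each MPD; how those KIDs are packaged into initData is left out.

```typescript
// Hypothetical app-side tracker for DASH live: on every MPD fetch or update,
// return only the KIDs not yet requested, so a session for the next key
// period can be created before the rotation boundary is reached.
class ManifestKeyTracker {
  private readonly requested = new Set<string>();

  onManifest(keyIds: readonly string[]): string[] {
    // Skip KIDs already requested and duplicates within this MPD.
    const fresh = keyIds.filter(
      (kid, i) => !this.requested.has(kid) && keyIds.indexOf(kid) === i,
    );
    for (const kid of fresh) this.requested.add(kid);
    return fresh; // empty: no new createSession() call needed
  }
}
```

For example, an updated MPD that lists both the current (KIDv1, KIDa1) and the upcoming (KIDv2, KIDa2) keys yields only KIDv2 and KIDa2, matching the "MPD2" session in the trace earlier in the thread.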
>>
>> I must be missing something. In addition to answering the questions
>> above, it might help to provide an explicit example - what KID(s) are in
>> the manifest, each PSSH box, each license, etc.
>>
>> 1) Fetch MPD1
>>
>> 2) createSession(KIDv1, KIDa1 in MPD1) -> Resolved with session1
>>
>> 3) Fetch video1 segment
>>
>> 4) createSession(KIDv1 in moof) -> Resolved with null
>>
>> 5) Fetch audio1 segment
>>
>> 6) createSession(KIDa1 in moof) -> Resolved with null
>>
>> 7) Fetch MPD2
>>
>> 8) createSession(KIDv2, KIDa2 in MPD2) -> Resolved with session2
>>
>> 9) Fetch video2 segment
>>
>> 10) createSession(KIDv2 in moof) -> Resolved with null
>>
>> 11) Fetch audio2 segment
>>
>> 12) createSession(KIDa2 in moof) -> Resolved with null
>>
>> …
>>
Received on Tuesday, 17 June 2014 00:58:13 UTC