[EME] Separating key session creation from the media element (using a MediaKeySession constructor)

In today's teleconference [1], we decided to create a session object [2]
with video.generateKeyRequest() rather than video.createKeySession() or
something similar. While working on the change proposal, I realized that
there are good reasons to consider separating object creation from the
media element (something we had not discussed). I think the potential
benefits are compelling enough to revisit this issue. Hopefully we can have
a quick discussion in email and move forward with one of these solutions.
If you are interested in the API design, please provide feedback and/or
state your preference soon. In parallel, I'll continue working on the
change proposal (ETA is now tomorrow).

The new option is to create session objects using
"new MediaKeySession(keySystem, mimeType)". For example:
  function handleNeedKey(event) {
    var session = new MediaKeySession(keySystem, mimeType);
    if (session) {
      session.onkeymessage = handleKeyMessage;
      session.onkeyerror = handleKeyError;
      session.generateKeyRequest(initData);
      var video = event.target;
      video.addKeySession(session);
    }
  }

Another variant would be to add initData to the MediaKeySession constructor
and specify that the constructor generates a key request. This would
eliminate the need for generateKeyRequest() and have most of the benefits
of the video.generateKeyRequest() solution discussed in the teleconference.
For example:
  function handleNeedKey(event) {
    var session = new MediaKeySession(keySystem, mimeType, initData);
    if (session) {
      session.onkeymessage = handleKeyMessage;
      session.onkeyerror = handleKeyError;
      var video = event.target;
      video.addKeySession(session);
    }
  }

*Advantages of this approach*

   1. This might make more sense if we eventually decide to support sharing
   sessions between media elements (
   https://www.w3.org/Bugs/Public/show_bug.cgi?id=16615 and
   https://www.w3.org/Bugs/Public/show_bug.cgi?id=17202).
      1. Any of the "session = video.foo()" solutions result in an implicit
      relationship.
   2. It might enable initiating key exchange *before* the media element
   starts loading. [3]
   3. The resulting object could be used for Key Release (
   https://www.w3.org/Bugs/Public/show_bug.cgi?id=17199) without needing to
   create a "dummy" media element (
   http://lists.w3.org/Archives/Public/public-html-media/2012Jun/0107.html)
   or defining a separate object just for key release (
   http://dvcs.w3.org/hg/html-media/raw-file/tip/encrypted-media/encrypted-media.html#key-release-manager
   ).
   4. MediaSource is using this model ("new MediaSource";
   http://lists.w3.org/Archives/Public/public-html-media/2012Jun/0071.html),
   and it would be nice to have consistency.
   5. In the first variant only, explicit separation of creation from
   actions.
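
To illustrate advantages 2 and 3, here is a rough sketch of a session that
generates a key request before any media element exists and is attached
later. DetachedKeySession and the plain laterVideo object are hypothetical
stand-ins, not part of any proposal text, and the key message is delivered
synchronously only to keep the sketch short.

```javascript
// Hypothetical stand-in for the proposed session object; not a real API.
class DetachedKeySession {
  constructor(keySystem, mimeType) {
    this.keySystem = keySystem;
    this.mimeType = mimeType;
    this.onkeymessage = null;
  }

  generateKeyRequest(initData) {
    // Pretend CDM: echo the initData back as the key message.
    // Simplification: a real implementation would fire this asynchronously.
    if (this.onkeymessage) {
      this.onkeymessage({ target: this, message: initData });
    }
  }
}

const messages = [];
const earlySession = new DetachedKeySession("com.example.keysystem", "video/mp4");
earlySession.onkeymessage = (event) => messages.push(event.message);

// No media element exists yet; the key exchange can still begin.
earlySession.generateKeyRequest(new Uint8Array([0x01, 0x02]));

// Later, once an element is available (a plain object stands in for it here).
const laterVideo = {
  keySessions: [],
  addKeySession(session) {
    this.keySessions.push(session);
  },
};
laterVideo.addKeySession(earlySession);
```

In the Key Release case, the same kind of object would simply never be
passed to addKeySession().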


*Effects of this approach*
The primary effect is that the session object is not *implicitly* associated
with a media element.

Because initData should contain all the information (i.e. the appropriate
ISO CENC PSSH) necessary to obtain a license and verify that the specified
key system is supported, the session object should not need to be
associated with a specific element (or source file/stream) until its keys
are needed to decrypt content.

However, the separation does require the following:

   - The MIME type must be explicitly specified to the session object.
      - This is similar to the MediaSource constructor, which takes a type
      string.
      - For the case where the object is created in response to a needkey
      event, we could even provide the current MIME type as an attribute of the
      event.
      - Since the MIME type is specified separately, the media element
      would need to fail addKeySession() or only use a session if the types
      were compatible.
      - (This would be required anyway if we wanted to address advantage #2
      above. [3])
   - Session objects must be implicitly or explicitly associated with media
   element(s).
      - This is similar to MediaSource, which provides a URL to video.src.
      - Implicit example: All media elements may use all sessions.
      - Explicit example: video.addKeySession(session);


Compared to the video.generateKeyRequest() solution:

   - The second variant (constructor generates a key request) is very
   similar.
   - The first variant (constructor does NOT generate a key request):
      - Requires one or two more lines of application code during
      initialization.
      - Requires that an object be created first for each key system in
      the use case where generateKeyRequest(keySystem, initData) can be
      called repeatedly until a supported combination is found.
         - This seems acceptable. [4]
      - No longer implicitly enforces that generateKeyRequest() is called
      before addKey().
         - We could just choose not to enforce this but instead require
         that all key systems support generateKeyRequest() by returning a
         keymessage so that applications can always follow this pattern.
         That was the main goal of this requirement anyway.
      - No longer implicitly enforces that a session object represents
      exactly one initData value.
         - Instead, subsequent generateKeyRequest() calls should explicitly
         fail, at least if a previous call was successful.
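
The last point could be enforced explicitly along these lines. This is a
minimal sketch: OneShotKeySession is a hypothetical name, and treating
empty initData as the failure case is an assumption made only so that
there is a failure path to show.

```javascript
// Hypothetical stand-in for the proposed session object; not a real API.
class OneShotKeySession {
  constructor(keySystem, mimeType) {
    this.keySystem = keySystem;
    this.mimeType = mimeType;
    this.initData = null; // set only after a successful key request
  }

  generateKeyRequest(initData) {
    if (this.initData !== null) {
      // A previous call succeeded, so this session already represents
      // exactly one initData value.
      throw new Error("generateKeyRequest() already succeeded for this session");
    }
    // Assumption: empty initData stands in for any failed request.
    if (initData.length === 0) {
      throw new Error("Invalid initData");
    }
    this.initData = initData;
  }
}

const session = new OneShotKeySession("com.example.keysystem", "video/mp4");

let firstCallFailed = false;
try {
  session.generateKeyRequest(new Uint8Array([])); // fails, session stays unused
} catch (e) {
  firstCallFailed = true;
}

session.generateKeyRequest(new Uint8Array([1, 2, 3])); // succeeds

let secondCallError = null;
try {
  session.generateKeyRequest(new Uint8Array([4, 5, 6])); // must fail
} catch (e) {
  secondCallError = e;
}
```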



[1] http://www.w3.org/2012/06/26-html-media-minutes.html

[2] See
http://lists.w3.org/Archives/Public/public-html-media/2012Jun/0054.html and
https://www.w3.org/Bugs/Public/show_bug.cgi?id=16613

[3] Currently, the media element must have started loading before
generateKeyRequest() can be called. WebKit, at least, does not create the
underlying media engine until loading starts. Even if we could work around
this, the media element would need to somehow know the MIME type it
*will* be provided so that the user agent or the CDM can parse the
initData. If we separate the CDM from the media element, implementations
should be able to create the underlying objects immediately regardless of
the media implementation.

[4] If the combinations passed to generateKeyRequest() fail:

   - Synchronously:
      - Creating an object to call this function doesn't seem too bad for
      the application. It should just be an extra line of code.
      - Note: Failing synchronously in any design may require the user
      agent to be able to parse at least part of the initData and determine
      whether it contains data for one of the key systems it supports.
      - Assuming that creation of the session object causes the related CDM
      to spin up, CDMs may be spun up unnecessarily in the case that
      the generateKeyRequest() call fails. Fortunately, this would happen
      asynchronously with respect to the application and should not affect
      performance.
   - Asynchronously:
      - Creating an object is likely to be a very minor part of the logic
      required to support this.
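
For the synchronous case, the application logic might look roughly like
this. It is only a sketch: ProbeKeySession, SUPPORTED_KEY_SYSTEMS, and
createFirstSupportedSession() are hypothetical names, and the key system
strings are made up.

```javascript
// Hypothetical set of key systems this pretend user agent supports.
const SUPPORTED_KEY_SYSTEMS = new Set(["com.example.second"]);

// Hypothetical stand-in for the proposed session object; not a real API.
class ProbeKeySession {
  constructor(keySystem, mimeType) {
    this.keySystem = keySystem;
    this.mimeType = mimeType;
    this.initData = null;
  }

  // Fails synchronously when the key system is not supported.
  generateKeyRequest(initData) {
    if (!SUPPORTED_KEY_SYSTEMS.has(this.keySystem)) {
      throw new Error("Unsupported key system: " + this.keySystem);
    }
    this.initData = initData;
  }
}

// Try each key system in order until one accepts the request.
function createFirstSupportedSession(keySystems, mimeType, initData) {
  for (const keySystem of keySystems) {
    // This is the extra line compared to video.generateKeyRequest().
    const session = new ProbeKeySession(keySystem, mimeType);
    try {
      session.generateKeyRequest(initData);
      return session;
    } catch (e) {
      // Not supported; fall through to the next key system.
    }
  }
  return null;
}

const chosen = createFirstSupportedSession(
  ["com.example.first", "com.example.second"],
  "video/mp4",
  new Uint8Array([1, 2, 3])
);
```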

Received on Wednesday, 27 June 2012 01:30:32 UTC