RE: Additional EME tests from Paul Cotton on 2016-06-24 (public-html-media@w3.org from June 2016)

From: Paul Cotton <Paul.Cotton@microsoft.com>
Date: Fri, 24 Jun 2016 14:39:15 +0000
To: David Dorwin <ddorwin@google.com>, Greg Rutz <G.Rutz@cablelabs.com>, "Mark Watson" <watsonm@netflix.com>
CC: "'public-html-media@w3.org'" <public-html-media@w3.org>, Francois Daoust <fd@w3.org>, Philippe Le Hégaret <plh@w3.org>, "John Rummell" <jrummell@google.com>, Ralph Brown <r.brown@cablelabs.com>, "John Simmons" <johnsim@microsoft.com>, Iraj Sodagar <irajs@microsoft.com>
Message-ID: <CY1PR03MB142309A501C048B52D6E3253EA2E0@CY1PR03MB1423.namprd03.prod.outlook.com>
I am moving this EME Editor-only discussion to public-html-media@w3.org so that all HME WG members are aware of the discussions about an EME test suite.  Please continue the discussion on this thread.

FTR the original discussion can be found starting at:
https://lists.w3.org/Archives/Public/public-hme-editors/2016Jun/0100.html


/paulc
HME WG Chair


From: David Dorwin [mailto:ddorwin@google.com]
Sent: Thursday, June 23, 2016 1:46 PM
To: Greg Rutz <G.Rutz@cablelabs.com>
Cc: Mark Watson <watsonm@netflix.com>; public-hme-editors@w3.org; Francois Daoust <fd@w3.org>; Philippe Le Hégaret <plh@w3.org>; John Rummell <jrummell@google.com>; Ralph Brown <r.brown@cablelabs.com>
Subject: Re: Additional EME tests

We have tools to generate WebM files, which are DRM-independent. There are two included in the Google directory: test-encrypted.webm and test-encrypted-different-av-keys.webm. The key IDs and keys are in encrypted-media-playback-two-videos.html and encrypted-media-playback-multiple-sessions.html, respectively. For simplicity, I suggest using these for the encryption of CENC files. (We also have some CENC files encrypted with these keys, but I believe they only have initData for the common format and Widevine.) We should move the key IDs and keys to a common location, though. (The initData values in encrypted-media-utils.js appear to contain different dummy values.)

Regarding Mark's comments about combinations, I don't think there is much allowance/expectation for variance in what combinations are supported, mainly because this would break the usability of the API. For example, if two containers are supported but only one supported a session type, a configuration containing the combination would be rejected. initData types within media are also constrained, but implementations are required to support generateRequest() of all supported initDataTypes regardless of the actual media.

I agree that we need to detect what is supported, at least for the simple spec test case, using some utility function. The Blink tests use getSupportedInitDataType(), etc., though there is probably room for improvement. See also my comments inline in Francois's reply, which I've copied here to unfork the thread.

On Thu, Jun 23, 2016 at 1:48 AM, Francois Daoust <fd@w3.org> wrote:
Hi David,

I've been wondering about the same things for the MSE test suite. Some comments inline.


Le 23/06/2016 09:01, David Dorwin a écrit :
For Blink, we tried to follow our understanding of the WPT style, which
was that each test case be a separate file. In some cases, especially
the syntax tests, there are multiple categories of tests together. I
think readability is also important, even if it means duplication. (Of
course, API changes or refactorings can be monotonous when they have to
be applied to many files, but that should be rarer now.) As to which
approach we take for new tests, I defer to the WPT experts.

I don't qualify as a WPT expert, but my understanding is that it is somewhat up to the people who write and review the tests. In the MSE test suite, a given test file often checks a particular algorithm and contains multiple test cases to check the different steps. I personally find that approach useful and readable as well.

I think we probably do want individual tests for various media types,
etc. For example, downstream users (i.e. user agent vendors) should be
able to say "I know I don't support foo, so all the '-foo.html' tests
are expected to fail." For tests that aren't specifically about a type
(or key system), the tests should select a supported one and execute the
tests.

I quickly glanced at the HTML test suite for media elements to see how tests were written there:
https://github.com/w3c/web-platform-tests/tree/master/html/semantics/embedded-content/media-elements


Most test files seem to pick up a supported MIME type, using common functions defined in:
https://github.com/w3c/web-platform-tests/blob/master/common/media.js


There are exceptions to the rule, such as tests on the "canPlayType" method that contain test cases explicitly marked as "(optional)":
http://w3c-test.org/html/semantics/embedded-content/media-elements/mime-types/canPlayType.html


For MSE, most tests can be written without having to impose a particular MIME type (with a few exceptions as well, e.g. to test the "generate timestamps flag"), and it seems a good idea to keep the number of MIME-type specific tests minimal to improve the readability of the implementation report. Whenever possible, we need the MIME-agnostic version of the tests to assess the "at least two PASS" condition in the report.

Ideally, it would be possible to force such tests to run all supported
variants. For example, Chrome might want to run the tests with both MP4
and WebM. encrypted-media-syntax.html, for example, tries both WebM
and/or CENC types based on whether they are supported, requires all
supported to pass, and ensures that at least one was run. This has the
advantage of testing both paths when supported, though it's not
verifiable anywhere that both ran. I don't know whether it would be
useful to be able to say run all the tests with WebM then repeat with CENC.

I've been wondering about that as well for MSE tests. Passing a test for a given MIME type does not necessarily imply that the test also passes if another supported MIME type gets used. It would make tests harder to write though (more error-prone, harder to debug, and slightly harder for user agent vendors to tell what failed in practice). It's often easier to create one test case per variant.

After I sent this, I realized that example (encrypted-media-syntax.html) won't scale to larger tests, such as playback. It might be that this was the easiest way for us to add coverage without deciding on some larger infrastructure for running multiple variants.

In the end, what could perhaps work is to create a "createGenericAndVariantTests" method which takes a list of variants as input, replaces the usual calls to "test" or "async_test", and generates a generic test case that picks up the first supported variant together with a set of variant test cases marked as optional that test the same thing for each and every variant.

I agree that some way to run the tests with variants is probably the ideal mechanism for general tests. (I'd still like to have -keyids.html, etc. tests where we are specifically testing those capabilities.) I also agree that the tests should be easy to write and maintain. Another option would be to make it possible for the "runner" to override the types that are automatically selected by the utility function that picks that type to test (e.g. getSupportedInitDataType()).
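A minimal sketch of the runner-override idea, assuming a selection helper that takes an optional override. The name selectTypeToTest, its parameters, and the way the runner passes the override are all hypothetical; this is not the signature of getSupportedInitDataType() in the Blink tests.

```javascript
// Hypothetical selection helper. `candidates` is an ordered list of type
// names, `isSupported` is a probe supplied by the caller, and `override`
// is an optional type name supplied by the runner (for example parsed from
// the test URL's query string; that convention is an assumption).
function selectTypeToTest(candidates, isSupported, override) {
  if (override !== undefined && override !== null) {
    // The runner asked for a specific type: fail loudly rather than
    // silently falling back to a different one.
    if (!candidates.includes(override)) {
      throw new Error('unknown override: ' + override);
    }
    if (!isSupported(override)) {
      throw new Error('override not supported: ' + override);
    }
    return override;
  }
  // Default behaviour: pick the first supported candidate.
  const first = candidates.find(isSupported);
  if (first === undefined) {
    throw new Error('none of the candidate types is supported');
  }
  return first;
}
```

With no override the helper behaves like an automatic picker; with one, a vendor could repeat the same general test once per type without editing the test file.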

The generic test case would give the result needed for the implementation report. The additional optional test cases could help user agent vendors detect additional issues with a particular variant and such tests should be easy to filter out from the implementation report as needed if they are consistently flagged with "(optional)".
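The generic-plus-optional idea could be sketched roughly as follows. This is only an illustration: createGenericAndVariantTests does not exist in testharness.js, and the variant shape ({ name, isSupported }) and the register callback are assumptions made for the sketch.

```javascript
// Hypothetical helper; not part of testharness.js or any existing suite.
// `variants` is a list of { name, isSupported } objects, `body` is the test
// body that receives the chosen variant, and `register(name, fn)` stands in
// for the harness call. With testharness.js it could be an adapter such as
// (name, fn) => test(fn, name).
function createGenericAndVariantTests(title, variants, body, register) {
  const supported = variants.filter((v) => v.isSupported());

  // Generic case: runs the body with the first supported variant and fails
  // if nothing is supported. This is the result the implementation report
  // would use.
  register(title, () => {
    if (supported.length === 0) {
      throw new Error('no supported variant for: ' + title);
    }
    return body(supported[0]);
  });

  // One case per variant, flagged "(optional)" in its name so that it can
  // be filtered out of the implementation report.
  for (const v of variants) {
    register(title + ' [' + v.name + '] (optional)', () => {
      if (!v.isSupported()) {
        throw new Error(v.name + ' is not supported');
      }
      return body(v);
    });
  }
}
```

In this sketch an unsupported variant shows up as a failing optional case rather than being skipped, which keeps the set of reported test names the same across user agents.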

Francois.

On Thu, Jun 23, 2016 at 10:08 AM, Greg Rutz <G.Rutz@cablelabs.com> wrote:
I have a toolchain that can generate MP4 CENC content with multiple DRMs using the CastLabs DRMToday (http://drmtoday.com/) service.  With these tools I can select my own key/keyID, encrypt the content and ingest the key into the DRMToday license server (as many of you know, CastLabs has graciously agreed to provide an account to facilitate the W3C EME testing platform).  In addition, I have a very simple proxy server (required by the DRMToday architecture to sign license requests on behalf of the account owner) which can assign "rights" to each key to provide a customized license.  The system is quite flexible and we would be able to customize the rights for each test with only a single piece of content.  This may be valuable if we need to test key expiration or other rights-related operations that would be exposed to applications through the EME APIs.

Please note that my toolchain has the following limitations and will require some development if we require more features:

  *   Only generates CENC initData (not DRM-specific variants).
  *   No WebM support
  *   ClearKey, PlayReady, and Widevine DRMs only.  Notably missing: Adobe Primetime and Apple FairPlay (both indicated as supported by CastLabs).

CableLabs has volunteered my time to support the integration/use of these tools if we think they will be valuable.

G

On 6/23/16, 10:35 AM, "Mark Watson" <watsonm@netflix.com> wrote:



On Thu, Jun 23, 2016 at 12:01 AM, David Dorwin <ddorwin@google.com> wrote:
For Blink, we tried to follow our understanding of the WPT style, which was that each test case be a separate file. In some cases, especially the syntax tests, there are multiple categories of tests together. I think readability is also important, even if it means duplication. (Of course, API changes or refactorings can be monotonous when they have to be applied to many files, but that should be rarer now.) As to which approach we take for new tests, I defer to the WPT experts.

I think we probably do want individual tests for various media types, etc. For example, downstream users (i.e. user agent vendors) should be able to say "I know I don't support foo, so all the '-foo.html' tests are expected to fail." For tests that aren't specifically about a type (or key system), the tests should select a supported one and execute the tests.

Certainly, there need to be individual tests, but a single file can contain several tests. The test page reports for each file how many of the tests within passed and how many failed. In WebCrypto we have a file with 20,000 tests :-)

However, I do like that the Blink tests are small and easy to read. Another reason to keep files small is that the WPT framework has a 60s timeout for any given file. Since it takes a few seconds to start and verify playback, we can't have too many tests in one file unless we can adjust this timeout.

Ideally, we need to generalize on at least 5 axes, either by generalizing the tests as they are, or by creating new files with the different versions of each test:
- test all the media types the browser claims to support
- test all the initData types the browser claims to support
- test all the session types the browser claims to support
- test all the key systems the browser claims to support
- for cenc, test both keysystem-specific and common format initData

We do not need to test every possible combination of the above and we don't need to run every one of the existing Blink tests for each of these combinations. But it is not straightforward to work out which combinations we do need and which tests need to run on multiple combinations.

We perhaps need a utility function which calculates which combinations of the above a browser claims to support (as a subset of the combinations the test framework supports). There would then be one test which looks at the supported combinations and checks it is non-empty :-)

The list of supported combinations would then be an input to at least some of the other tests, which would then test each combination individually.
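A rough sketch of such a utility, under stated assumptions: the names cartesian, getSupportedCombinations and emeProbe are hypothetical, and the axis names are chosen for illustration. The probe is injected so the enumeration can be exercised without a browser. emeProbe shows one way it might be backed by navigator.requestMediaKeySystemAccess(), and it only probes video capabilities.

```javascript
// Hypothetical utility. `axes` lists what the test framework supports on
// each axis, for example:
//   { keySystem: [...], initDataType: [...], sessionType: [...], contentType: [...] }

// Expands the axes into every combination the framework could exercise.
function cartesian(axes) {
  return Object.keys(axes).reduce(
    (acc, key) =>
      acc.flatMap((partial) =>
        axes[key].map((value) => ({ ...partial, [key]: value }))),
    [{}]);
}

// Keeps only the combinations for which `probe` resolves to true.
async function getSupportedCombinations(axes, probe) {
  const supported = [];
  for (const combination of cartesian(axes)) {
    if (await probe(combination)) {
      supported.push(combination);
    }
  }
  return supported;
}

// Browser-only probe (an assumption about how the check might be done):
// asks the user agent whether it accepts a configuration containing exactly
// this combination. A rejected promise is treated as "not supported".
function emeProbe({ keySystem, initDataType, sessionType, contentType }) {
  const configurations = [{
    initDataTypes: [initDataType],
    sessionTypes: [sessionType],
    videoCapabilities: [{ contentType }],
  }];
  return navigator.requestMediaKeySystemAccess(keySystem, configurations)
    .then(() => true, () => false);
}
```

The one test that checks the list is non-empty would then assert that getSupportedCombinations(axes, emeProbe) resolves to at least one entry. Other tests could take that list as input and prune it to the combinations they actually need.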


Ideally, it would be possible to force such tests to run all supported variants. For example, Chrome might want to run the tests with both MP4 and WebM. encrypted-media-syntax.html, for example, tries both WebM and/or CENC types based on whether they are supported, requires all supported to pass, and ensures that at least one was run. This has the advantage of testing both paths when supported, though it's not verifiable anywhere that both ran. I don't know whether it would be useful to be able to say run all the tests with WebM then repeat with CENC.

Regarding the test content, it would be nice to use a common set of keys across all the tests and formats. This will simplify utility functions, license servers, debugging, etc. Also, we may want to keep the test files small.

For our part, we don't have a workflow to easily package content with a specific key / key id. There is test mp4 content, cropped to ~10 seconds, in the branch linked below. Do you have a way to create a WebM file with the same key / key id? I guess we could then hard code all the Clear Key messages.

...Mark



David

On Tue, Jun 21, 2016 at 9:16 PM, Mark Watson <watsonm@netflix.com> wrote:
All,

I have uploaded some additional EME test cases here: https://github.com/mwatson2/web-platform-tests/tree/clearkey-success/encrypted-media


I have not created a pull request, because there is overlap with the Blink tests.

I have taken a slightly different approach, which is to define one function, eme_success, which can execute a variety of different test cases based on a config object passed in. There are currently only four: temporary / persistent-usage-record with different ordering of setMediaKeys and setting video.src, but it is easy to add more with different initData approaches, different media formats and different keysystems.
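To illustrate the config-driven shape (not the actual eme_success code in the branch), the four cases could be expressed as data and interpreted by one shared function. The field names sessionType and srcFirst, the helper names, and the step labels are all assumptions made for this sketch.

```javascript
// Hypothetical config objects; the field names (sessionType, srcFirst) are
// assumptions and are not taken from the eme_success function in the branch.
const successConfigs = [
  { sessionType: 'temporary', srcFirst: false },
  { sessionType: 'temporary', srcFirst: true },
  { sessionType: 'persistent-usage-record', srcFirst: false },
  { sessionType: 'persistent-usage-record', srcFirst: true },
];

// Returns the ordered list of step labels a single shared test function
// would carry out for one config. Only the relative order of the first two
// steps depends on the config; the remaining labels are illustrative.
function planSuccessTest(config) {
  const setup = config.srcFirst
    ? ['set video.src', 'setMediaKeys']
    : ['setMediaKeys', 'set video.src'];
  return setup.concat([
    'createSession(' + config.sessionType + ')',
    'generateRequest',
    'update',
    'verify playback',
  ]);
}

// Builds a distinct, readable test name for each config.
function testName(config) {
  return 'success, ' + config.sessionType + ', ' +
      (config.srcFirst ? 'src before setMediaKeys' : 'setMediaKeys before src');
}
```

Adding an initData approach, media format or key system would then mean adding a field and more entries to the list rather than a new file, at the cost of the shared function being harder to follow.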

What approach do we want to take? The Blink approach of a different file for every individual case will bloat as we add different session types, initData types, media formats and keysystems.

On the other hand, each of the Blink test cases is very straightforward to follow, whereas the combined one is less so.

My branch also includes some mp4 test content, the key for which is in the clearkeysuccess.html file.

...Mark
Received on Friday, 24 June 2016 14:39:50 UTC