
Minutes from W3C MEIG monthly call 3 November 2020: CMAF, MSE, and Media Capabilities API

From: Chris Needham <chris.needham@bbc.co.uk>
Date: Tue, 3 Nov 2020 17:34:26 +0000
To: "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>
CC: "public-media-wg@w3.org" <public-media-wg@w3.org>, "dpctf@standards.cta.tech" <dpctf@standards.cta.tech>
Message-ID: <590FCC451AE69B47BFB798A89474BB367CCAFA74@bgb01xud1006>
Dear all,

The minutes from today's Media & Entertainment IG call on CMAF, MSE, and Media Capabilities API are available [1] and copied below.

Thank you to John Simmons, Matt Wolenetz and Chris Cunningham for leading the discussion, and to all for joining.

The main action items for Media Capabilities API were summarised at the end of the call:

* Chris Cunningham will prepare an explainer showing how to use the Media Capabilities API and CSS Media Queries and Screen object together.

* Using the explainer, media companies (e.g., via the WAVE project) can look at the CMAF media profiles and see how those map to API calls (and possibly produce a polyfill library).

* Once that's done we can schedule a follow up call to discuss the outcomes, e.g., any possible API gaps. This could be a MEIG meeting or a CTA WAVE meeting.

Thanks,

Chris (Co-chair, W3C Media & Entertainment Interest Group)

[1] https://www.w3.org/2020/11/03-me-minutes.html


---

W3C
- DRAFT -
Media & Entertainment IG
03 Nov 2020
Agenda

Attendees

Present
    Andreas_Tai, Barbara_Hochgesang, Chris_Cunningham, Chris_Needham, Franco_Ghilardi, Francois_Daoust, John_Riviello, John_Simmons, Kasar_Masood, Kaz_Ashimura, Louay_Bassbouss, Pierre_Lemieux, Rob_Smith, Takio_Yamaoka, Tatsuya_Igarashi, Will_Law, Xabier_Rodríguez_Calvar, Yasser_Syed, Matt_Wolenetz

Regrets

Chair
  Chris

Scribe
  cpn

Contents

Topics
Summary of Action Items
Summary of Resolutions

<kaz> Agenda: https://lists.w3.org/Archives/Public/public-web-and-tv/2020Oct/0010.html

<scribe> scribenick: cpn

JohnS: Two issues are important. CMAF Byte Stream format, developed by WAVE, restrictions that should be respected for CMAF profiles not captured in ISO BMFF
.... When we started on Media Capabilities, we discussed CMAF profiles, decided to create a polyfill library to ask for profile support, translated into MCAPI calls
.... Looks like this approach may not be possible - certain capabilities can't be queried about, e.g., in-stream codec parameters
.... or because the profile parameter, which relates to the container, is not supported in MCAPI
.... Had some discussion in the last week, decided it would make sense to have a WAVE / W3C joint meeting, to bring in the media experts involved from WAVE
.... Yesterday, we discussed why MPEG chose to identify media profiles to begin with: why use 4 character codes for media profiles?
.... I've done an analysis of the byte stream format, to identify what the byte stream format requires of the UA, that's not in the ISOBMFF byte stream format
.... Need to do a deep dive on that
.... WAVE has produced a spec, not public yet, on how to do media capability reporting for CMAF media profiles. Need to discuss that.
.... There needs to be a more regular dialog between the people working in WAVE, and the MEIG and Media WG
.... More regular dialog could help solve these issues
.... Summary: How to properly handle CMAF? It's becoming dominant for commercial content. How to handle with MCAPI? How to engage with WAVE and MEIG on more regular basis?
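
For context, an MCAPI query for a CMAF-style fMP4 stream might be built as sketched below. This is an illustrative sketch, not from the minutes: the codec string and stream parameters are made up, and the point is that the codecs parameter rides in the MIME string while there is no field for a CMAF media-profile 4CC or other container-level constraints, which is the gap John describes.

```typescript
// Sketch: building a MediaDecodingConfiguration for a CMAF-style
// fMP4/AVC stream. Note there is nowhere to express a CMAF media
// profile or other container-level restrictions.
interface VideoConfig {
  contentType: string;
  width: number;
  height: number;
  bitrate: number;
  framerate: number;
}

function cmafVideoConfig(codec: string, width: number, height: number,
                         bitrate: number, framerate: number) {
  return {
    type: "media-source" as const, // MSE-based playback
    video: {
      contentType: `video/mp4; codecs="${codec}"`,
      width, height, bitrate, framerate,
    } as VideoConfig,
  };
}

const config = cmafVideoConfig("avc1.640028", 1920, 1080, 8_000_000, 30);
// In a browser this would be passed to:
//   navigator.mediaCapabilities.decodingInfo(config)
// which resolves to { supported, smooth, powerEfficient }.
console.log(config.video.contentType);
```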

Matt: Thanks for organizing the meeting yesterday and doing the analysis between ISO BMFF spec and CMAF
.... You've done a good job to summarise the outcomes and next steps.
.... We want to make the web platform usable for CMAF, but also don't want to restrict it to where the support for CMAF is so specific that MSE implementations would say they don't support CMAF related queries
.... The ways of identifying the right set of things to have good support for CMAF.
.... There's granularity: query using 4 character codes, return a yes/no answer vs very granular queries, some of which may be too much for an app to deal with
.... We're not sure about the 4 char codes, need more detail on that and where they came from

JohnS: What's the right level of granularity? Most desktop playback players, you just try to play the content however you can. If you have to say no, even though you support most aspects, would be unwelcome
.... Issue of enhanced audio codecs that need in-stream codec parameters - would be a yes/no question
.... So some tests are yes/no, some are better. For 608 captioning, an app would want to know if it's handled, to use an alternate method instead
.... I'm not the expert on how the CMAF profiles came about, Dave Singer and Kilroy Hughes, or Cyril or Thomas Stockhammer could explain that
.... One idea from Chris C yesterday: the media profiles are useful on the encoding side, and we could test if you're compliant with the profile
.... For network efficiency, don't want a combination of media segments. Could be less useful on the decode side, to create an efficient set of catalogs
.... That seemed it could be true to me. From a playback perspective, does the 4 character code tell you something useful about playback. Any input on that?

Will: The profile sets maximum constraints. If you support the maximum, you support everything in it. The 4CC codes are essentially a label for characteristics, expressed as bounds
.... The content may not require the maximum itself
.... I want to solve the problem of a player inspecting the environment to see if it can play sets of content
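
Will's point that a profile is a set of maximum constraints can be sketched as a simple bounds check. The limits below are invented placeholders, not real profile values: content conforms if it stays within every bound, without having to use the maximum itself.

```typescript
// Sketch: a CMAF media profile treated as an upper bound on stream
// characteristics. The bound values here are made-up placeholders.
interface ProfileBounds { maxWidth: number; maxHeight: number;
                          maxFramerate: number; maxBitrate: number; }
interface StreamParams  { width: number; height: number;
                          framerate: number; bitrate: number; }

// A stream fits a profile if it stays within every bound.
function withinProfile(s: StreamParams, p: ProfileBounds): boolean {
  return s.width <= p.maxWidth && s.height <= p.maxHeight &&
         s.framerate <= p.maxFramerate && s.bitrate <= p.maxBitrate;
}

// Hypothetical "HD" profile bounds:
const hd: ProfileBounds = { maxWidth: 1920, maxHeight: 1080,
                            maxFramerate: 60, maxBitrate: 20_000_000 };

console.log(withinProfile({ width: 1280, height: 720, framerate: 30,
                            bitrate: 5_000_000 }, hd)); // true
```

A UA that supports a profile's maxima supports everything below them, which is why a single 4CC yes/no query can stand in for many granular questions.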

JohnS: It has the color primaries, transfer characteristics

Will: But those aren't in the playlist. The DASH/HLS manifest should be sufficient to decide if it can play a piece of content

ChrisC: Thinking about it as a bound, this guided us towards the polyfill solution. You could query using that bound to get a yes/no answer
.... The issue that came up yesterday is that there are some aspects of the container that MCAPI has no way to support, e.g., this quirk of CMAF
.... We could add to MCAPI, but is it a good idea? We have a stable fixed definition in MSE so far, not had lots of issues come up for fMP4
.... For the sake of CMAF to be friendly to users and UA, you'd want that model for CMAF also.

JohnS: One of the big issues is that if you look at a CMAF media profile (e.g., a vanilla profile, AVC, and one that's more esoteric, e.g., enhanced audio) - what traits are specific to the CMAF media profile that would be required to be supported for the UA to render the content as intended?
.... For example, there's a bitrate range or resolution range specified, the app would want to know if the upper and lower bounds are supported
.... They want to know which format to switch to, depending on the traits
.... Are there container-specific traits, i.e., ways CMAF requires fMP4 to be constructed that go beyond ISO BMFF?

Matt: Placement of emsg boxes can vary, but for MSE they need to be collocated in the media segment, to get deterministic parsing, timestamp handling
.... The handling depends on where the box is in relation to the media segment. The details need to be locked down a bit more

JohnS: The TPAC discussion on where emsg boxes could reside - at top level in media segment. One of the things I mentioned is that emsg is defined by DASH, not ISO BMFF. It's now in CMAF, but it's never been a base level box in part 10.
.... Can you put it in the init segment, or can it be at the chunk level, clearly needs to be normalised
.... Media profiles are orthogonal to emsg boxes. It could be in any track, not just the video

Will: It is being used. Want to avoid parsing in JS, can be inserted in any MOOV atom

Matt: Should the content of the emsg be better represented out of band?
.... MSE inherently allows relocation in time of segments. The emsg could be meant for 10 seconds in the future, so at what time should the emsg fire?

Will: There are two modes: trigger now, or inform of impending change in the future. This is a good case for being out of band. There are good reasons for putting them inband, it's one less request a client needs to make
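
For reference on the in-band case, a top-level emsg box can be located with a simple size/type walk over the segment bytes. This is a minimal sketch: it handles only the common 32-bit size form, while a real parser must also handle size == 1 (64-bit largesize) and size == 0 (box extends to end of file).

```typescript
// Sketch: scan a buffer of top-level ISOBMFF boxes and return the
// byte offsets of any 'emsg' boxes. Each box starts with a 4-byte
// big-endian size followed by a 4-character type code.
function findEmsgOffsets(data: Uint8Array): number[] {
  const offsets: number[] = [];
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  let pos = 0;
  while (pos + 8 <= data.byteLength) {
    const size = view.getUint32(pos); // big-endian box size
    const type = String.fromCharCode(data[pos + 4], data[pos + 5],
                                     data[pos + 6], data[pos + 7]);
    if (type === "emsg") offsets.push(pos);
    if (size < 8) break; // malformed, or a size form this sketch skips
    pos += size;
  }
  return offsets;
}

// A tiny hand-built buffer: an 8-byte 'free' box then an 8-byte 'emsg'.
const buf = new Uint8Array([
  0, 0, 0, 8, 0x66, 0x72, 0x65, 0x65, // size=8, type 'free'
  0, 0, 0, 8, 0x65, 0x6d, 0x73, 0x67, // size=8, type 'emsg'
]);
console.log(findEmsgOffsets(buf)); // [ 8 ]
```

This kind of JS-side scan is exactly what Will wants to avoid needing, and why deterministic UA-side handling (or out-of-band delivery) is attractive.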

Pierre: On the question of profiles, they're for the sender and receiver to agree on capabilities that need to be supported. It seems the profiles aren't adequate to communicate that
.... There's no agreement in the industry on what's a reasonable set of capabilities for players. It could be hopeless to try to parameterize all the options

JohnS: It used to be that you'd say: the client device must have the following capabilities, closed ecosystem. We're moving to a world with feature capability detection to figure out what the device can do.
.... We don't have a lot of experience with the feature detection approach. It feels like an intractable problem, because you don't want too much granularity, so you have to ask a huge checklist of questions
.... vs having a 4CC which you can query about
.... We haven't found the right level of granularity to ask those questions, which is what we're trying to sort out
.... We want to come up with the set of 6 or so questions that are yes/no

ChrisN: We filed an issue for Chrome and Firefox to support emsg years ago

Pierre: emsg is just one aspect, e.g., ICtCp, v2, HLG, what's the target peak luminance? Similar questions for audio. How do we parameterise it?
.... Or pick some constrained profiles

JohnS: That was part of the reason why CMAF media profiles were created, to be able to ask these questions

Pierre: Yes, they're for exactly that. The profiles in SC?? were either too constrained or not constrained enough. Need to get the industry together again to figure it out

ChrisC: You mentioned it might be folly to break the profile into its constituents, as there could be too many. But if we don't do that, and if MCAPI understands 4CC, could still be folly
.... Under the hood, you still need to understand all the variants. You want maximum support everywhere, one flavour of CMAF to accomplish that
.... There's been some exploration of what profiles would look like, would love to see the polyfill, a flat list of the profiles and what they need
.... From what I've seen, it may not be constrained enough

Yasser: Do manifests and playlists play a role, or is it just about the CMAF content?

Will: If it's not exposed at the playlist/manifest level, it would have to load an init segment or some content and expose it to the browser to ask if it can be played. Wastes time
.... Packaging process should expose the information at the playlist level, and that should be sufficient, to be most convenient. We shouldn't have to actually load the content to see if we can play it

JohnS: Agree with Will. That was sort of the intent, use the manifest data to query the UA
.... So how to do that? What should be in the manifest for the UA to determine if it can play?
.... It's not just about asking if you can play the codec. There are other essences that need to be asked about. What experience do people have with the adequacy of what's in the manifest? Is there a problem there, color space for example?
.... Has DASH-IF looked at the MCAPI?

Will: We looked at it over a year ago, there's been additions since then, not sure if a gap analysis has been done

JohnS: I'm hoping the WAVE MC spec can be shared soon, so we can have a call where we go through the analysis - is it adequate for the polyfill?
.... In the CTA WAVE spec, we've identified the primary media profiles, a formal process to approve what gets in the spec. There are maybe 12 video profiles and 8 audio profiles
.... It summarizes the characteristics of each profile in a table
.... For a follow on meeting, between WAVE, MEIG, Media WG, we want to have Apple, MS, Mozilla participating. Would be useful to put together an agenda for discussion
.... Should include reviewing MCAPI, the CMAF byte stream spec
.... The objective should be: what does the media stack need to change to support CMAF content? What are the unanswered questions?
.... Also to have more regular conversation between the groups
.... I'll work at WAVE to get the Media Capability document released. The review draft of the CMAF bytestream spec may have changes incoming (Zach Cava), send to this group, and set up a meeting to discuss them soon, before the holiday season

Matt: I agree with that. Is a text-based document enough to ask what the platform can play? May want to include some optional capabilities. DataCue may be unsupportable on some platforms, where stream would otherwise be playable, let the app decide if the platform is sufficient

John: I don't think we have a document that states what's optional and required for the CMAF profile. It's not just limited to CMAF, so would be a good thing to bring up at DASH-IF, also with Roger Pantos at Apple for HLS

ChrisN: Also constraints on the output channel, e.g., HDMI

ChrisC: The separation of responsibilities is that MCAPI is for what the UA understands separately from display capabilities. Those are handled by CSS, CSSOM View, we've landed some spec changes for those cases

JohnS: We could create some test examples, e.g., this display resolution, with this codec, HDCP 2.2 - can you handle that specific use case?
.... If we have those questions, we can test against MCAPI, to see how we might change the API if it's not yet supported

ChrisC: MCAPI only has an editor's draft so far. That draft has those additions. The design of those was hashed out between MS, Apple, Google, Netflix, lots of back and forth.
.... Should meet your needs for questions on HDR, but interested to learn about any gaps and address them
.... For questions about the display, it's not handled by the MCAPI, as it's about the screen not the decoding. We're working to surface properties there, also media queries for resolution
.... We've added dynamic range with 'standard' and 'high' values, in the CSS spec, not implemented yet
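
The separation ChrisC describes, decoding questions via MCAPI and display questions via CSS, might combine in app logic roughly as below. This is a sketch: the `(dynamic-range: high)` media-query string is the CSS feature mentioned above, but the selection policy and names are invented, and both answers are passed in as plain values so the policy is testable outside a browser.

```typescript
// Sketch: choosing an HDR vs SDR variant by combining a decoding
// answer (from Media Capabilities) with a display answer (from a
// CSS media query). The inputs are injected so the policy itself
// runs anywhere.
interface DecodingSupport { supported: boolean; smooth: boolean; }

function pickVariant(hdrDecoding: DecodingSupport,
                     displayIsHdr: boolean): "hdr" | "sdr" {
  // Use the HDR variant only when the UA can decode it smoothly
  // AND the screen can actually show it.
  return hdrDecoding.supported && hdrDecoding.smooth && displayIsHdr
      ? "hdr" : "sdr";
}

// In a browser the inputs would come from:
//   await navigator.mediaCapabilities.decodingInfo(/* HDR config */)
//   window.matchMedia("(dynamic-range: high)").matches
console.log(pickVariant({ supported: true, smooth: true }, false)); // "sdr"
```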

JohnS: How does someone know which specs to look at?

<kaz> https://w3c.github.io/media-capabilities/ Media Capabilities draft

ChrisC: It's not written down yet

JohnS: Would be great to have such a document
.... It also would become a useful explainer to help do capability querying

Matt: Would have to be a living document. EME has a proposal for key status in HDCP status to know if HDCP 1.0 is needed. That's in EME, but needed for a playback scenario

ChrisC: I'd be happy to produce an explainer document for the HDR pieces
.... The intent for MCAPI is to have parity with EME capabilities
.... Joey Parrish owns the EME spec
.... HDR capabilities can't be answered with requestMediaKeySystemAccess. Should we continually add features to both specs? Doesn't make sense, so let's advance that spec; for HDR properties there's no plan to add to EME, do it via MCAPI

JohnS: I can query, do you support this encryption mode, this codec, ask in combination?

ChrisC: Yes, that's a feature of MCAPI, implemented in Chrome. Need to know if in EME context, as the decoders can vary widely
.... EME will stay as is, for playback after the query. Don't want to revisit that spec
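
The combined codec-plus-encryption question John asks about uses the keySystemConfiguration member of the MCAPI query. A sketch, with an illustrative key system and robustness value (the actual strings depend on the platform's CDM):

```typescript
// Sketch: an MCAPI query that asks about codec and encryption
// together. keySystemConfiguration lets one decodingInfo() call
// answer "can you decode this codec in this EME context?"
function encryptedQuery(codec: string, keySystem: string,
                        robustness: string) {
  return {
    type: "media-source" as const,
    video: {
      contentType: `video/mp4; codecs="${codec}"`,
      width: 1920, height: 1080,
      bitrate: 8_000_000, framerate: 30,
    },
    keySystemConfiguration: {
      keySystem,               // e.g. a CDM identifier string
      video: { robustness },   // platform-specific robustness level
    },
  };
}

const q = encryptedQuery("avc1.640028", "com.widevine.alpha",
                         "SW_SECURE_DECODE");
// In a browser: navigator.mediaCapabilities.decodingInfo(q)
console.log(q.keySystemConfiguration.keySystem);
```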

JohnS: What we can do as WAVE is create the list of queries we want to do, and compare with the explainer

Kaz: I agree with John's proposal to look at use cases and gap analysis for those scenarios
.... FYI, the WoT group is working on 3 specs: Thing description, scripting API, profiles of device description. Looking at these approaches could be useful for this, e.g., Thing Description might be useful for the discussion on possible extension for the manifest

<kaz> https://w3c.github.io/wot-thing-description/ Thing Description to describe the server's capability based on JSON-LD

<kaz> https://w3c.github.io/wot-scripting-api/ Scripting API to handle the description

<kaz> https://w3c.github.io/wot-profile/ Profiles of the server description for interoperability

ChrisN: So next steps: Chris C will prepare an explainer, showing how to use MCAPI and CSS Media Queries and Screen object together. In the WAVE project, we can look at the media profiles and see how those map to MCAPI calls, possibly produce a polyfill library. Once that's done we can schedule a follow up call to discuss the outcomes, e.g., any possible API gaps. Could be a MEIG meeting or a WAVE meeting

JohnS: I'll follow up with you on that.

[adjourned]


Received on Tuesday, 3 November 2020 17:35:15 UTC
