Minutes from W3C M&E IG call: Media Playback Quality, Autoplay Policy Detection, Bullet Chatting

Dear all,

The minutes from the Media & Entertainment Interest Group call on Tuesday 6th August, where we discussed Media Playback Quality, Autoplay Policy Detection, and Danmaku / Bullet Chatting are available [1], and copied below.

Many thanks to Chris, Becca, and Song for presenting.

A recording of the conversation is also available [2].

Kind regards,

Chris (Co-chair, W3C Media & Entertainment Interest Group)


[1] https://www.w3.org/2019/08/06-me-minutes.html
[2] https://media.w3.org/2019/08/meig-2019-08-06.mp4


W3C
- DRAFT -
Media and Entertainment IG
06 Aug 2019

Agenda
Attendees

Present
    Kaz_Ashimura, Yajun_Chen, Chris_Needham, Will_Law, John_Luther, Larry_Zhao, Xabier_Rodriquez_Calvar, Barbara_Hochgesang, Greg_Freedman, Chris_Cunningham, Becca_Hughes, Tatsuya_Igarashi, Mounir_Lamouri, Jeff_Jaffe, Mark_Watson, Song_Xu, John_Riviello

Regrets

Chair
    Chris

Scribe
    kaz

Contents

    Topics
        Agenda
        Media Playback Quality
        Autoplay Policy Detection API
        Bullet curtain
        TPAC agenda
        Next call
    Summary of Action Items
    Summary of Resolutions


# Agenda

ChrisN: A few topics today; we've been doing a series of calls looking at the APIs in scope of the Media WG.
.... The two main ones not covered so far are Media Playback Quality and Autoplay Policy Detection.
.... We can also mention planning for the IG F2F meeting at TPAC. Anything else?

Song: Can I present the bullet curtain proposal? Can wait until the next call, though.

ChrisN: Yes

Jeff: I wanted to mention that Mark Vickers would like to step down as co-Chair after TPAC.
.... I've had good dialog with current chairs about new co-chairs, have some good candidates. If anyone here also would like to suggest someone, please reach out to me.

<Barbara_H> Should we review Media Capture in this group at some point?

# Media Playback Quality

ChrisC: I've recently taken a closer look at this API. Google has been talking about how to do better reactive playback quality signals for a while, and has now scheduled some time to look at it.
.... The API is very simple, the most useful property is the number of dropped frames relative to the number of decoded frames.
.... Sites like YouTube, Netflix, and others use this to adapt between bitrate quality levels to ensure not too many frames are dropped, for a good experience.
.... The API is shipped in most major UAs, with the notable exception of Chrome.
.... This is about prioritisation, and some concerns about the spec. The API isn't perfect, the definitions in the spec are coarsely written, e.g.,
.... what is a dropped frame, exactly? We can drop frames for different reasons, are some more important than others?
.... That said, everybody's shipped it, and there's a prefixed dropped frame attribute and total frames attribute in Chrome.
.... Websites largely treat these as interoperable, but there's some evidence that they're not. We should make this true, if it's not currently true.
.... In the next quarter and beyond, our priority is to get this shipped, it's unpleasant that it isn't.
.... A prefixed version is shipped in Chrome, something we'd like to eliminate.
.... I'm planning to spend some time on the spec, clarify the definitions, and work with other UAs to make sure it's interoperable.
.... For dropped frames, it's about performance. It wouldn't be right to count a frame as dropped if the machine had it ready in advance, but the frame rate is higher than the monitor refresh rate, so you never saw the frame.
.... Would be a poor choice for the site to adapt to that condition.

Kaz: Any resource on the API?

<chcunningham> https://w3c.github.io/media-playback-quality/
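
For reference, a minimal sketch of reading the draft API from script, with a fallback to the Chrome-prefixed attributes mentioned later in the discussion; property names follow the draft spec, and the webkit-prefixed names are Chrome-specific.

```js
// Sketch: read dropped vs. decoded frame counts from a <video> element.
// getVideoPlaybackQuality() is the draft spec API; the webkit-prefixed
// attributes are the Chrome-specific fallback mentioned below.
function getFrameCounts(video) {
  if (video.getVideoPlaybackQuality) {
    const q = video.getVideoPlaybackQuality();
    return { dropped: q.droppedVideoFrames, total: q.totalVideoFrames };
  }
  return {
    dropped: video.webkitDroppedFrameCount,  // prefixed fallback (Chrome)
    total: video.webkitDecodedFrameCount
  };
}

const video = document.querySelector('video');
const { dropped, total } = getFrameCounts(video);
console.log(`Dropped ${dropped} of ${total} decoded frames`);
```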

ChrisC: Another property we want to clarify, or remove altogether, is corrupted frames.
.... It's not something we've been able to implement in Chrome.
.... If the video is corrupted, you'll generally get a decode error. If a single frame is corrupted, you may never know about it.
.... The end user may see artifacts, but the next I-frame will come along and fix the corruption, or there may be a decode error that terminates the playback.
.... So there isn't really a situation in Chrome where we could accumulate a count of corrupt frames.
.... Not clear what that would mean to app developers.
.... I'll nominate to cut this from the spec.
.... We'd also like to make some additions and some ergonomics changes, mostly backwards compatible.
.... This API is poll based. I did a quick survey in the video-dev Slack group about what players are using, what polling interval.
.... Seems to be largely on the order of 1 second, but we don't have a lot of data. Polling is unpleasant.
.... We're open to discussing adding an event when the dropped frame count increases, which hopefully happens less than once per second.
.... Alternatively, something like the performance observer API where you could subscribe to changes you're interested in.
.... We can't raise an event on every change, as the decoded frames total is constantly changing.
.... Safari added a display composited frames field, which counts frames rendered using the hardware overlay path.
.... It's not in the spec, people have asked for it in Chrome, seems useful and straightforward to add.
.... Other things from FOMS last year: signalling the frame rate, active bit rate, active codec.
.... This would be helpful for sites to adapt. Would be nice for the API to surface at a clean boundary when the stack has reflected the adaptation.
.... Codec is tricky. Chrome is increasingly strict about requiring valid codec strings with canPlayType and Media Capabilities, to give unambiguous answers.
.... It's hard to produce a codec string in the reverse direction, a valid codec string with profile and level; we know large parts of that.
.... For MP4, we could do that reliably, but it's tricky for every container everywhere.
.... Not unsolvable, it's something we need to think about. The changing of codec at this point is less important than those other things.
.... Another thing is decoder health signals. In the HTMLMediaElement spec, there's readyState: HAVE_CURRENT_DATA, HAVE_FUTURE_DATA, HAVE_ENOUGH_DATA.
.... Not clear to me what these mean in terms of being about network, or if my computer has decoded enough frames to pre-roll and display the data.
.... Seems to be mostly about network. It's described in the spec under the resource fetch algorithm.
.... This means there aren't clear signals for when the decoder is not keeping up.
.... In Chrome, HAVE_CURRENT_DATA is signalled either in the network condition or in the decoder condition, so it's a conflation of things.
.... It would be interesting to have a separate signal to say if your computer is not able to decode enough frames and the video is frozen on the screen for a moment, this is an underflow caused by the system. It may or may not be reflected in an increment to droppedFrames, but would be an interesting signal for adaptation.
.... Another is time to decode. If your frame rate is 30 frames/second and you're decoding slower than that, that's another interesting signal to consider for adapting.
.... Those are my high level plans. Any questions or proposals for things to add to the API?
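
As an illustration of the gap described above, a sketch of the coarse stall detection sites can do today with existing HTMLMediaElement signals, which cannot distinguish network starvation from decoder underflow.

```js
// Sketch: today's coarse stall detection using existing signals only.
// 'waiting' fires when playback stops because the next frame isn't
// available; readyState doesn't say whether the cause was the network
// or the decoder, which is the conflation discussed above.
const video = document.querySelector('video');

video.addEventListener('waiting', () => {
  if (video.readyState < HTMLMediaElement.HAVE_FUTURE_DATA) {
    console.log('Playback stalled (network or decoder underflow)');
  }
});

video.addEventListener('playing', () => {
  console.log('Playback resumed');
});
```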

ChrisN: I have a question about dropped frames, does this become a proxy for CPU usage on the device, if it can't keep up with decoding and rendering?

ChrisC: Yes it is, sort of. There are some cases where the CPU may be quite high and you're not dropping frames, so not a perfect proxy, but it shows that the system is under strain.

ChrisN: Thinking generally about QoE measures, inability to fetch the resource, or buffer underruns, things that stall the playback.
.... There are other APIs we can use to get that kind of information, so not everything needs to come from Media Playback Quality?

ChrisC: This is true. If you're using MSE, you have a full picture of how the network looks, regarding fetching the chunks.
.... There's the readyState on the HTML video element. I believe this to be more a resource fetch and network thing,
.... although in Chrome these are mixed up, not sure what other UAs have done.

ChrisN: Anyone else have thoughts or things you want to see from this API?

(none)

ChrisN: Looking back, there's a WHATWG page about video metrics

<cpn> https://wiki.whatwg.org/wiki/Video_Metrics

ChrisN: This has a survey of metrics available from different media players at the time.
.... This links to a couple of bugs, which were referred to the web performance group,
.... as more general network type of issues.
.... Not sure to what extent this list is still relevant today, but may be worth looking at again.

ChrisC: That's definitely true. It has some things we have today, some things we don't.
.... I see some definitions of bitrate here, start up time.
.... Some of them could be hard to do, like latency, where the app knows better than the UA.
.... But this is a good resource, still relevant.

ChrisN: My next question is about codecs. There's a negotiation between the web app and the UA,
.... using Media Capabilities API to understand what codecs are available,
.... and what's compatible with the content I want to fetch.
.... What role do you see with Media Playback Quality, e.g., is it where the UA makes adaptation decisions?

ChrisC: We see them working in tandem. With Media Capabilities, you'll get a clear answer about what codecs are supported.
.... But many codecs are decoded in software, and new codecs like AV1 push the limits of what the system can do;
.... decoders are still being optimised, 4K is proliferating, etc.
.... Media Capabilities is not a guarantee. We say the codec is supported and will be smooth.
.... But the smooth part of that claim is a prediction. It's possible the user will begin playback of something that has historically been smooth,
.... but then also begin to play a resource-intensive video game, and this would affect video playback.
.... We encourage people to use the Media Capabilities API to know up front what the limitations are, what codecs might be power efficient on the device, so they can do the optimal thing for the user.
.... Then we encourage players to listen for these reactive metrics to see if anything about the predictions has changed.
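
A rough sketch of the two APIs working in tandem as described: an up-front Media Capabilities query, then a reactive check on dropped frames. The codec configuration and the 10% adaptation threshold below are illustrative assumptions, not recommendations.

```js
// Sketch: up-front prediction via Media Capabilities, then reactive
// monitoring via Media Playback Quality. Values here are illustrative.
async function pickAndMonitor(video) {
  const info = await navigator.mediaCapabilities.decodingInfo({
    type: 'media-source',
    video: {
      contentType: 'video/webm; codecs="vp09.00.10.08"',
      width: 1920,
      height: 1080,
      bitrate: 2500000,
      framerate: 30
    }
  });
  console.log('supported:', info.supported,
              'smooth:', info.smooth,
              'powerEfficient:', info.powerEfficient);

  // The "smooth" claim is a prediction; keep watching dropped frames
  // in case conditions change (e.g. the system comes under load).
  setInterval(() => {
    const q = video.getVideoPlaybackQuality();
    if (q.totalVideoFrames > 0 &&
        q.droppedVideoFrames / q.totalVideoFrames > 0.1) {
      console.log('Dropping frames; consider adapting down a quality level');
    }
  }, 1000);
}
```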

ChrisN: I understand that this was part of MSE, then dropped because of lack of interoperable implementations needed for a W3C Rec.

ChrisC: That's right. It's good that it was dropped, as it's something that people should be aware of regardless of Media Source Extensions.
.... It does feel at home on the media element.

ChrisN: And so it could be applied to other video resources, e.g., getUserMedia. Is it hooked up there as well?

ChrisC: Absolutely. I think all the same notions apply, you could be dropping frames. Also the new thing from Apple could be applied.

ChrisN: Any other comments/questions?

Barbara: The relationship with Media Capabilities, any thoughts on how those two would work together?

ChrisC: Those two specs are totally separate. Internally in Chrome, we use droppedFrames as a historical signal for what your capabilities would be.
.... The Media Capabilities spec doesn't require that, it allows implementations to use whatever heuristics they like. Firefox and Safari probably do something different than Chrome.
.... There could be an opportunity for the specs to reference each other, for certain definitions. For example, the Media Playback Quality may surface when a decode is hardware accelerated, something Media Capabilities already does.

Barbara: From a hardware vendor's perspective, we're thinking about changing the configuration, based on information either from the playback or from Media Capabilities, regarding which hardware decoder to use.

ChrisC: Media Capabilities has a isPowerEfficient property, which is not quite the same as being hardware decoded, but is meant to be effectively the same thing, for developers who want to save battery.
.... If you're just concerned about performance, then use smoothness.
.... The reason it's not exactly hardware decoded, and described as power efficient, is that at lower resolutions and lower frame rates all video is more or less the same in terms of efficiency, whether it goes through hardware or software.
.... Media pipelines below a certain resolution cutoff will opt to decode in software, even if they have the hardware resources.
.... It could be interesting to define some of these terms in one of the specs, and cross reference.

Barbara: I'm glad to see that both specs are done within one Working Group.

Will: The Media Playback Quality interface has absolute counts of frames. That means anyone who wants to monitor in terms of frame rate or dropped frame rate has to implement some sort of division by time or a clock.
.... It would be useful if the API could provide dropped frame rate, and the UA keeps track of the frame rate. Often it's the frame rate that would trigger a switch, rather than the absolute count.

ChrisC: That's a great point. We'd have to figure out a good boundary for a window for the running average, assuming you don't want it windowed over the entire playback.

Will: Correct. A reasonable one would be 1 second, and we could get feedback from player developers?
.... More importantly, how that correlates. Typically you want that as a quick predictor to be able to react,
.... so I'd prefer a shorter time window that has more variance as you play through, but you can use it as a switching metric, which I believe is the intent.

ChrisC: I like it.
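
In the absence of a built-in rate, a sketch of what Will describes done in script today: poll once a second and derive a windowed dropped-frame rate from successive counts. The 1-second window is the assumption discussed above.

```js
// Sketch: derive a windowed dropped-frame rate by diffing successive
// polls of the absolute counts. windowMs is the assumed 1-second window.
function monitorDroppedFrameRate(video, onRate, windowMs = 1000) {
  let prev = video.getVideoPlaybackQuality();
  return setInterval(() => {
    const now = video.getVideoPlaybackQuality();
    const droppedPerSecond =
      (now.droppedVideoFrames - prev.droppedVideoFrames) * 1000 / windowMs;
    prev = now;
    onRate(droppedPerSecond);  // e.g. trigger a downward bitrate switch
  }, windowMs);
}
```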

ChrisN: What's the best way to send feedback, as issues in GitHub?

ChrisC: Yes, or you can contact me personally.

# Autoplay Policy Detection API

ChrisN: Would you like to give an introduction to the background, and the problem this API intends to address?

Becca: I'm a software engineer at Google, been working on autoplay for a while.
.... Right now, there's no way for a website to detect whether it can autoplay before it calls play() on the media element.
.... We want to give sites a way to determine whether a video is going to play or not.
.... This is so they can pick different content or change the experience. For example, they might want to mute the video, or do other things.
.... The API is relatively simple. There's something at the document level that determines the autoplay policy.
.... This can either be: autoplay is allowed, muted autoplay is allowed - in that case the site may want to change the source or change the UI - or autoplay is disallowed and it will be blocked.
.... There's another check at the media element level, which returns a boolean.
.... This is designed so that you can set the source on the media element, then check whether this will autoplay.
.... This will look at the document autoplay policy, and the metadata on the source, e.g., is there an audio track? Then it will return whether it can autoplay, before you call play().
.... We're still working out the ergonomics of the API. I'd be happy to take questions or suggestions.
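
To make the shape concrete, a sketch of how a site might use the API as described. Since the ergonomics are still being worked out, the names document.autoplayPolicy and canAutoplay() below are illustrative assumptions, not a settled interface.

```js
// Sketch only: attribute and method names are illustrative, since the
// API shape was still under discussion at the time of this call.
const video = document.querySelector('video');

switch (document.autoplayPolicy) {   // hypothetical document-level attribute
  case 'allowed':
    video.play();
    break;
  case 'allowed-muted':
    video.muted = true;              // or swap in content suited to muted playback
    video.play();
    break;
  case 'disallowed':
    console.log('Show a click-to-play UI instead');
    break;
}

// Element-level check, taking the attached source's metadata into account.
// Whether this is synchronous or asynchronous is still an open question.
if (video.canAutoplay()) {           // hypothetical element-level method
  video.play();
}
```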

ChrisN: Could you please expand on the difference between the document level and media element level?

Becca: The document level gives you the auto play policy: allowed or disallowed. At the media element level, it's taking into account the source.
.... Some browsers have some autoplay restrictions at the media element level, so this will take those into account.

ChrisN: And so the purpose is to allow the web app to make decisions on what content to present based on the policy that's applied.

Becca: A common use case is if autoplay is allowed but only for muted content, some sites may want to switch the source and play different content.
.... Or update the UI and show something to the user, to show autoplay is blocked.

ChrisN: I remember when the autoplay restrictions were implemented a couple of years ago, it was controversial,
.... e.g., in the Web Audio community, where they were programmatically generating audio content.
.... This doesn't apply just to media coming from a server, but also to synthesized content from Web Audio.

Becca: Yes, the document level autoplay also applies to Web Audio. We don't have a plan to add a method to AudioContext,
.... but you can use the document level API to check before creating an AudioContext.
.... Any questions?

Mark: You mentioned two cases: a media source with no audio track, and a site attempting to play video with an audio track but the media element is muted.
.... Are those cases considered differently for autoplay policy, or are they equivarant?

Becca: They should be essentially equivalent.

Mounir: Not all browsers actually check for the audio track. So if you want to be cross-browser compatible you may just want to mute instead of removing the audio track.

ChrisN: And the circumstances where autoplay is allowed will vary between UAs, as they can be using different criteria.

Becca: That's correct, and why we recommend sites use this API rather than trying to predict what the policy is.

ChrisN: As a developer, is there a way that I can test how my site behaves with each different policy level applied to it?

Becca: Right now there is no way to test that, but it's something we can consider adding, I think.

ChrisN: We had difficulty with our player component, which was in an iframe, where the user interaction was in the containing page.
.... We wanted to send a message to the iframe to tell it to play, and this was blocked by autoplay.

Becca: For Chrome, there's an autoplay policy command line switch that you can use to set different autoplay policies.
.... For example, one of these is "no user gesture required", which would allow anything to autoplay.

Greg: How is the media element level API different to calling play() and having it reject, and then reacting to that?

Becca: It's kind of similar. Autoplay allows you to check before calling play(), so you can change the source or make a change in the UI.

Greg: I could call play(), then if it rejects I could mute then call play() again. Or, if it rejects, we'd leave up a still image.
.... Also, IIUC, this API requires you to have metadata, so you have to do the work to get ready to play, which seems similar to calling play() and seeing what happens.
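
For comparison, a sketch of the pattern Greg describes, using only the existing play() promise: try to play, and on rejection fall back to muted playback or leave the poster image up.

```js
// Sketch: the play()-and-react pattern, with a muted retry as fallback.
async function tryAutoplay(video) {
  try {
    await video.play();
  } catch (e) {
    try {
      video.muted = true;            // retry muted
      await video.play();
    } catch (e2) {
      // Still blocked (or failed for another reason): leave the poster up
      // and wait for a user interaction.
      console.log('Autoplay blocked; showing still image');
    }
  }
}
```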

Will: Using errors in place of feature detection is never a robust architecture. While you can call play() and see that it doesn't play, there are other reasons it may not play than autoplay.
.... So I think it's a cleaner implementation if there's an explicit API to figure out the autoplay policy, versus relying on error conditions on a play request.

Greg: Does play reject for any other reason?

Will: I don't know for sure, but it might in future, so your assumption today becomes brittle whereas an API explicitly about autoplay gives a clearer picture.

Mounir: One reason is specific to Safari on iOS. They have a rule that autoplay is per-element, so we want to be mindful of that, and have a per-element API.
.... What a lot of websites do is create an element with no audible track, or no source, and try to play using it. If it rejects, they know they can't autoplay. Then they create the real element after that.
.... That can be done, and would work as well as the media element API, but on Safari on iOS it's not a good way of doing things, as the second media element you create would not be allowed to play, because the user gesture applies per element.
.... The document level API solves other problems, but the media element level API is there to solve this Safari iOS issue.

ChrisN: What is the current status? There isn't a draft specification on GitHub, is this being worked on at the moment?

Becca: We're still working on it, there's discussion on the shape of the API, and we'll discuss at TPAC and hopefully resolve.

ChrisN: I saw some discussion on whether the API should be synchronous or asynchronous, don't need to go into that now.

Mark: There is an explainer document in a branch.

Becca: Yes, it's in a pull request

<cpn> https://github.com/w3c/autoplay/blob/beccahughes-explainer/explainer.md

Mark: If the document level API says that autoplay or muted autoplay is allowed, is that a guarantee it's allowed, or do I also need to check the media element API?

Becca: If the document level API says it's allowed, you should not have to check at the media element level.
.... We're thinking of adding a fourth state at the document level, which is unknown, so please check the media element level.

Greg: Would the media element API detect that it's in a user gesture?

Becca: Yes, it should be able to.

ChrisN: Any further questions?

(none)

ChrisN: Thank you Becca, this was really helpful. We can send feedback on the GitHub repo?

Becca: Any issues are welcome

https://github.com/w3c/autoplay autoplay repo

ChrisN: Please join the Media WG as well.

# Bullet curtain

<Song> https://w3c.github.io/danmaku/index_en.html

ChrisN: Can you give an introduction, as people might not be familiar?

Song: There's a description in the introduction. The name comes from the Japanese word "Dan-maku".
.... After discussion, we chose "Bullet Chatting" as the official name.
.... It's for dynamic comments floating over a video or static image, at a specific point in time in the video.
.... It brings an interesting and unexpected experience to the viewer.
.... There's a picture...

Kaz: (suggests Song share his screen on WebEx)

Song: Figure 1 shows a typical bullet chatting screen. There's text floating over the video, which makes watching the video more fun.
.... We did some research across several solution providers, to compare the attributes and properties they use: appearance, time duration, font size, colour, timeline, and container.
.... (1.2) The characteristics are independence of space, deterministic rendering, and uniformity of modes.
.... The important feature of bullet chatting is that it's quite flexible compared to traditional comment presentations.
.... (1.3) There are four modes of bullet chatting: rolling, reverse, top, and bottom, basically four directions for the text.
.... For example, rolling mode is the most used mode, which scrolls the text from the right to the left.
.... (1.4) Regarding the commercial operation of bullet chatting, here are some figures. For example, iQiyi is the top video service in China, and they have 575 million monthly active users.
.... For Niconico in Japan, the usage is also high. Every service provider listed here provides bullet chatting in their players.
.... The functionality can reach a large number of subscribers to the movie content.
.... (2) WebVTT and TTML are relevant to the implementation of bullet chatting.
.... (3) As background for the use case, Figure 2 shows the typical chat room, the comments scroll on the right hand side at a fixed speed.
.... Figure 3 shows bullet chatting, where we present the comments over the video on screen.
.... The advantage of displaying the video with bullet chatting is that in a typical chat room, the messages scroll quickly.
.... With bullet chatting, there's a higher density of information because it's presented over the full screen video, so there's a wider display area, which gives a better user experience reading individual messages.
.... In a normal chat room, every message scrolls up at the same speed, so it's hard to do any specific handling, but in bullet chatting mode each message moves separately, with its own path and update frequency, so it's possible for users to read the comments.
.... (Figure 6) In a typical chat room, if you're watching the video content in the player, it's difficult to concentrate on both the video area and the comment area.
.... In the bullet chatting mode, you don't need to move your eye from the left to the right.
.... There's also some advantage for reading habits: with bullet chatting the text moves from right to left, so for people who have a habit of reading left to right it can be more convenient to read and understand the whole message.
.... In common understanding, having text floating over the video can be distracting for the user. But another perspective from social psychology is that bullet chatting gives someone watching a video alone a feeling of joining a group activity, and placing the comments over the video makes the user more cheerful.
.... Bullet chatting can show the text and video together in a comfortable way, with a sense of participation in a group activity for the user.
.... Without moving their sight from the video content, the user is able to read others' comments on the specific scene or upcoming clip. This increases everyone's social presence, particularly for millennials.
.... (4.1) The comment content can be used for on-demand video interaction, or live streaming interaction.
.... (4.2) The chatting can be a direct interaction between the anchor and the viewers in a live stream.
.... (4.3) Regarding the use of bullet chatting data: if there are lots of comments at a specific point in the video, this indicates a lot of users are interested in that specific point, so it can be used for data analysis of consumer behaviour.
.... (4.4) It can also increase the effect of the video content.
.... (4.5) Another usage is for interaction within a web page. For example, with WebVTT you watch the comments over the video, but bullet chatting can be applied simply to any web page.
.... (4.6) With an interactive wall, the host can present comments from attendees on a wall. In this case there's no relationship between the comments and the video. From our understanding, this is a case that WebVTT can't cover.
.... (4.7) Masking is used to avoid the comments conflicting with the video content: the comments will avoid overlapping the people on the screen.
.... (4.8) Another example is non-text bullet chatting, for example emojis.
.... (5) There's a recommended API. I won't go into the details. Anyone who's interested can read the details in GitHub.
.... (6) A gap analysis of bullet chatting and WebVTT. WebVTT is a file format for marking up external text track resources, which is a fairly fixed format. Bullet chatting is more flexible in providing text over the video content.
.... Considering bullet chatting as a subset of WebVTT, there are difficulties, such as the interaction and tracking. The WebVTT formatting is done in a fixed way, so we think bullet chatting could be a separate design rather than a subset of WebVTT. That's the main result from this proposal.
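
For readers unfamiliar with the rendering model, a purely illustrative sketch (not the proposed API) of the rolling mode described above: a comment element overlaid on the video container and animated from right to left.

```js
// Illustrative sketch only, not the proposed API: render one rolling-mode
// comment by animating an absolutely positioned element right to left.
// Assumes the container overlays the video (position: relative, overflow: hidden).
function postBulletComment(container, text, durationMs = 8000) {
  const el = document.createElement('div');
  el.textContent = text;
  el.style.position = 'absolute';
  el.style.whiteSpace = 'nowrap';
  el.style.left = '100%';
  el.style.top = Math.floor(Math.random() * 80) + '%';  // naive lane choice
  el.style.color = 'white';
  container.appendChild(el);

  const distance = container.clientWidth + el.clientWidth;
  const animation = el.animate(
    [{ transform: 'translateX(0)' }, { transform: `translateX(-${distance}px)` }],
    { duration: durationMs, easing: 'linear' }
  );
  animation.onfinish = () => el.remove();  // clean up when the comment scrolls off
}
```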

ChrisN: Thank you. This has really progressed since the last time we spoke about this.

Song: There are some other companies involved in this proposal, the video service members of W3C.

ChrisN: What's the next step? Do you want feedback from other IG members?

Song: If other people are interested, we plan to collect issues and use cases in GitHub.
.... We can make a proposal based on the use cases, completed by other members.

ChrisN: I would suggest more detailed discussion with TTWG members, as they develop WebVTT, so that's a good place to raise those things.

Song: I agree.

ChrisN: Thank you for sharing this, good to see the progress.

<cpn> scribenick: cpn

Kaz: Is there any issue with the rights to the pictures used (figure 10, 11, etc.)? We should use rights-free pictures.
.... When Angel Li from Alibaba gave a presentation on this at the AC meeting in Quebec in April, I mentioned Niconico's work from Japan, and also EmotionML for additional emotion-related information.
.... This is a mixture of a timed text format, with positional and emotion information. It's a kind of extension to WebVTT.
.... I agree with Chris, to talk to TTWG, but also look at EmotionML, JSON-LD and other semantic notation.

<kaz> scribenick: kaz

https://www.w3.org/TR/emotionml/ EmotionML spec

Song: I understand your feedback. We'll speak with the TTWG members, to see if we can make it a subset of WebVTT or a new API.
.... We'll clarify the copyrights. Thanks for the reminder.

ChrisN: Thank you. I look forward to continuing this conversation.
.... Are you planning a meeting at TPAC for this?

(Discussion of meeting possibilities at TPAC: break out session, meeting with TTWG)

# TPAC agenda

ChrisN: I am currently working with the IG co-chairs on the agenda for the F2F meeting at TPAC.
.... The timing of TPAC isn't ideal, due to overlap with IBC, so not everyone can come.

<cpn> https://www.w3.org/2011/webtv/wiki/Face_to_face_meeting_during_TPAC_2019

ChrisN: We want to use some time in the afternoon for open discussion on future directions for media on the web.
.... Look ahead, think about use cases and requirements for future capabilities.
.... If you are coming to TPAC, please take a look at the agenda. This is our meeting, so please let me know if you have suggestions.

# Next call

Kaz: September 3?

ChrisN: Yes, again please let us know about topics.
.... Chris, Becca, and Song, thank you for your presentations!

[adjourned]
Summary of Action Items
Summary of Resolutions

[End of minutes]
Minutes formatted by David Booth's scribe.perl version 1.152 (CVS log)
$Date: 2019/08/12 07:37:50 $
