- From: Chris Needham <chris.needham@bbc.co.uk>
- Date: Fri, 8 Feb 2019 17:17:44 +0000
- To: "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>, "public-apa@w3.org" <public-apa@w3.org>, "public-immersive-web@w3.org" <public-immersive-web@w3.org>, "public-tt@w3.org" <public-tt@w3.org>
Dear all,

The minutes from the last Media & Entertainment Interest Group call on Tuesday 5th February are available [1], and copied below.

Thank you to Peter for bringing this topic to the Interest Group, and to everyone who joined. Given the level of interest, I look forward to continuing the conversation. Peter's slides are at [2].

There were a few action items from the meeting:

- M&E IG Chairs to contact the TTWG Chairs to ask about timelines and when input is needed
- Andreas to organise a call with the people working in this area to determine next steps, and to discuss which topics are appropriate for which group (Media & Entertainment, APA, Timed Text, Immersive Web)
- Ada to talk to the Immersive Web CG co-chairs and help to schedule this topic in one of their calls

Kind regards,

Chris (Co-chair, W3C Media & Entertainment Interest Group)

[1] https://www.w3.org/2019/02/05-me-minutes.html
[2] https://www.w3.org/2011/webtv/wiki/images/e/e5/20190205_W3C_ME_thoPesch_subtitles_in_360.pdf

--

W3C - DRAFT - Media & Entertainment IG - 05 Feb 2019

Agenda: https://lists.w3.org/Archives/Public/public-web-and-tv/2019Feb/0000.html

Attendees

Present: Kaz_Ashimura, Andreas_Tai, Ada_Rose_Cannon, Ann_Marie_Short, Barbara_Hochgesang, Chris_Needham, Charles_LaPierre, Chris_O'Brien, Dee_Dyer, Eric_Siow, Charles_McCathieNevile, Francesc_Mas_Peinado, Francois_Daoust, Glenn_Goldstein, iedn, Jason_White, John_Luther, Karen_Singer, Kris_Anne_Kinney, Léonie_Watson, Mark_Hakkinen, Nigel_Megitt, Pierre_Lemieux, Steve_Morris, Tzviya_Siegman, Vladimir_Levantovsky, Will_Law, Michael_Cooper, Louay_Bassbouss, Peter_tho_Pesch, George_Sarosi, Mark_Vickers, Alexis_Tourapis, Atsushi_Shimono, Becky_Gibson, Irfan_Ali, Kazuhiro_Hoya, Samira, Yoshiharu_Dewa

Regrets

Chair: Chris, Mark

Scribe: cpn, tidoust, kaz

Contents

Topics
  Intro and Welcome
  Use cases for subtitles for 360 video
  Discussion
  Next steps
  Perspective on M&E for the Web
  Media Timed Events TF
  IG rechartering
Summary of Action Items
Summary of Resolutions

<cpn> scribenick: cpn

# Intro and Welcome

Chris: The purpose of this call is to bring together the Media and Entertainment, Immersive Web, and Accessibility communities at W3C to talk about captioning and accessibility in 360-degree video and VR experiences.

# Use cases for subtitles for 360 video

https://www.w3.org/2011/webtv/wiki/images/e/e5/20190205_W3C_ME_thoPesch_subtitles_in_360.pdf Peter's slides

Peter: (Slide 1) I would like to give a short introduction to the use cases for subtitles for 360 video.
.... The concept is also valid for VR and XR environments.
.... (Slide 2) This work is from a project called Immersive Accessibility.
.... Broadcasters are involved: CCMA from Barcelona, RBB from Germany.
.... I'll show the use cases and implementations, give an overview of the standards we used, and describe what we found was possible for our implementation.
.... (Slide 3) Looking at 360 video, you can imagine it as a world map, on a globe.
.... For 360 video you are sitting inside the globe, looking out, seeing the texture of the globe.
.... (Slide 4) An image from a 360 camera is distorted; it's wrapped onto a sphere as a texture (a minimal rendering sketch follows below).
.... You can look around that sphere.
.... You can watch on a tablet, moving the tablet around to see different areas of the picture.
.... You can also use head mounted displays or glasses, which is a better way to present this.
.... (Slide 6) What's the best way to present subtitles in that environment?
.... We faced a few issues.
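[Note: the following is a minimal sketch of the projection Peter describes, assuming a three.js setup (three.js is the library the project's player uses, per Slide 22 below); the names and values are illustrative, not the project's actual code.]

```typescript
// View an equirectangular 360 video from inside a textured sphere.
import * as THREE from 'three';

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(75, innerWidth / innerHeight, 0.1, 1100);

// Assumes a playing <video> element with an equirectangular H.264 source.
const video = document.querySelector('video') as HTMLVideoElement;
const texture = new THREE.VideoTexture(video);

// The negative x-scale turns the sphere inside out, so the video texture
// faces the camera sitting at the sphere's centre.
const geometry = new THREE.SphereGeometry(500, 60, 40);
geometry.scale(-1, 1, 1);
scene.add(new THREE.Mesh(geometry, new THREE.MeshBasicMaterial({ map: texture })));

const renderer = new THREE.WebGLRenderer();
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);
renderer.setAnimationLoop(() => renderer.render(scene, camera));
```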
.... (Slide 7, 8) A basic approach, also used by YouTube and other platforms: the subtitle is placed in the field of view, as it would be on TV,
.... one or two lines of text at the bottom.
.... (Slide 9) [Demo video]
.... The subtitle always stays in the field of view, no matter where I look in the video.
.... This is a simple implementation; it's done in a lot of players.
.... When you're wearing a head-mounted display, there are a lot of things to consider around style and positioning,
.... to make the subtitles more comfortable to look at and read.
.... I won't go into that; it's more about styling and best practices than standards gaps.
.... (Slide 10) Next I'll introduce the issue of speaker identification.
.... (Slide 11) In traditional TV, it's usually done using colours, or sometimes by putting the name of the speaker in brackets.
.... These concepts work fairly well for television, but we found that it's more difficult to identify the speaker in a 360 environment, because there are so many places where the speaker could be.
.... Also, if there are several speakers, it's harder to relate the subtitle to the speaker.
.... We made three implementations, of which two were tested more intensively.
.... (Slide 12) We tried two different modes to indicate to the viewer where the person speaking is located in the scene (the arrow mode is sketched below).
.... In the first, there's a small radar system visible on the screen.
.... The subtitle always appears in the centre of the display.
.... The radar-like system indicates the current viewing angle, with a dot showing the speaker location.
.... You can have multiple dots, and the dots have the same colour as the subtitle.
.... You can see some people in the picture; none of them is the speaker, as the speaker is behind us.
.... (Slide 13) In the second example, there's an arrow next to the subtitle that indicates that the speaker is to the left.
.... (Slide 14) [Example video recorded using a head-mounted display]
.... As long as the speaker is within the field of view, the arrow is removed.
.... The arrow is shown when the speaker is not in the field of view.
.... Those were the first things we tried and tested.
.... In our first test, people mostly preferred the arrow system; some liked the radar system.
.... Adding either of these two indicators was found to be helpful.
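[Note: a minimal sketch of the arrow-mode logic just described, under assumed conventions (angles in degrees, longitude 0 at the initial view direction, positive to the right); the function and parameter names are illustrative, not the project's code.]

```typescript
type ArrowHint = 'none' | 'left' | 'right';

// Decide whether to show the arrow, and in which direction, given the
// speaker's longitude from the subtitle data and the viewer's current yaw.
function arrowFor(speakerLongitude: number, viewerYaw: number,
                  fieldOfView = 90): ArrowHint {
  // Signed shortest angular difference, normalised into (-180, 180].
  let delta = (speakerLongitude - viewerYaw) % 360;
  if (delta > 180) delta -= 360;
  if (delta <= -180) delta += 360;

  // Speaker within the field of view: remove the arrow.
  if (Math.abs(delta) <= fieldOfView / 2) return 'none';

  // Otherwise point the shorter way round towards the speaker.
  return delta < 0 ? 'left' : 'right';
}
```

The radar mode could plot the same angular difference as a dot on a compass-style dial.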
.... (Slide 15/16) There are other approaches. Here's a link to some BBC research from 2017. They also tried four different ways of presenting subtitles in 360.

https://www.bbc.co.uk/rd/blog/2017-10-subtitles-360-video-virtual-reality-vr BBC research blog post

Peter: The one shown here is the second most used way of placing subtitles: you burn the subtitles as text into the video, so you don't need to deliver them separately.
.... (Slide 17) This is also done by the New York Times. Here's an example; you can see the text appears to be within the image. The text is burned into the video, so there's no way of activating or deactivating it.
.... (Slide 18) There's research from Ludwig Maximilian University in Munich. The approach is similar, in that the subtitles are burned into the video, but they are not at one specific position; they are placed next to the person speaking.

Ada: One potential issue with these approaches is that the text isn't aligned with the pixel grid in the headset, so you get a very messy effect, especially if it's coming from the video feed itself, but maybe you'll come to other approaches later.

Peter: You mean that with text as part of the video, because it shares the video resolution, there's no way to render it more nicely?

Ada: Yes. There's also an issue with placing the text in the VR scene itself. Because it's not totally aligned correctly, due to a lack of reprojection, it's going to be difficult to read. In the future I hope we can solve it using a quad layer, or something similar.

Peter: I agree. Can I come back to this later? There are some issues in that regard, also due to the fact that 360 video isn't really a VR space; it's a projection that's not really correct.

<Zakim> adarose, you wanted to comment about the quality of text in burned in video

Peter: (Slide 18) In their study, they compared a mode with the subtitles placed next to the person speaking, so more a part of the scene, against the subtitles-fixed-to-field-of-view approach.
.... They didn't find that either of these two options was preferred by users. There are advantages and disadvantages to both.
.... The main issue they found is that, when subtitles are placed next to the speaker, you can miss subtitles, and people didn't look around that much; but people found it more comfortable, and there was less motion sickness.
.... And there was better presence, a common measure of how immersive a scene is. There was also speaker identification, so you can see who is speaking.
.... The disadvantage is that you're forced to look at the speaker if you want to follow the conversation.
.... There are good reasons to follow up on approaches other than the one implemented in most players.
.... (Slide 19) [Demo video showing how it looks with the subtitle at a fixed position in the video]
.... This is from a different implementation of the same behaviour.
.... (Slide 20) Next, I'll give a short summary of our implementation.
.... (Slide 21) Most of the system is based on web technology: a 360 video in H.264, in equirectangular format. There are other ways of mapping video to VR space, but this is one of the more common ones.
.... We deliver the subtitles separately, as IMSC, with some extensions added for positional information.
.... Here's an example of an element with longitude and latitude angles that indicate the speaker position (a hypothetical reconstruction appears after this section).
.... Based on this information, we can add the radar system.
.... (Slide 22) The player implementation is web-based, using DASH.js and three.js.
.... We added a few personalisation options, like font size, background, top/bottom position, and margin size.
.... It depends on the device you're using. On some devices, the image quality falls off towards the edge of the glasses, so you want the subtitles nearer to the centre of the screen.
.... You can choose the presentation mode: the radar, the arrow, and some other experimental modes.
.... The player renders the subtitles based on the settings,
.... using the IMSC file (not all IMSC features are implemented), with our extensions.
.... (Slide 23) There's the MPEG OMAF standard (Omnidirectional Media Application Format); one version has been published, and version 2 is being drafted.
.... This allows subtitles to be embedded in a 3D scene. OMAF is about the streaming and packaging of 3D / 360 content.
.... You can add subtitles to the scene. They provide a plane for rendering the subtitles.
.... They support two subtitle formats, IMSC and WebVTT. There are two modes: always visible (the typical view, where the subtitles are always on the screen), and fixed position, where the subtitle is fixed to the video.

Chris: Thank you for your presentation.
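[Note: the slide's IMSC example is not reproduced in the minutes. The fragment below is a hypothetical reconstruction of what such an extension could look like; the namespace and attribute names are made up, and the ImAc project's actual vocabulary may differ.]

```xml
<!-- Hypothetical IMSC/TTML fragment: a subtitle carrying the speaker's
     position as longitude/latitude angles via extension attributes. -->
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:imac="urn:example:imac:extension">
  <body>
    <div>
      <p begin="00:00:05.000" end="00:00:08.000"
         imac:longitude="-35" imac:latitude="10">
        Look over here!
      </p>
    </div>
  </body>
</tt>
```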
# Discussion

CharlesL: To clarify: with the fixed position, a person who needs this because they're deaf can't then look around the scene, as they won't see the subtitles. I would want to make sure that this is overridable, so the subtitles are always on screen.
.... Similarly, for a person who's visually impaired and gets the subtitles through a screen reader, having them accessible when the speaker is out of view will be very important; otherwise they won't know where to look.

Peter: I agree.

<Zakim> adarose, you wanted to ask about subtitles depth in 3d video

Ada: What depth are the subtitles rendered at in 360 video? If it's infinite, they will end up being a weird hole in the world. If they're rendered at a fixed distance, what happens if the 3D video comes closer to the user than the subtitles are?

Peter: We didn't have this problem, as we only did 360 video, not 3D video.
.... It's true: with 3D video, when the subtitles are behind an object but shown in front of it, it makes you sick; it doesn't work.
.... There must be a way to avoid putting subtitles in front of very close objects.
.... A problem we had in the first implementation: we put the subtitle plane in the middle, between the viewer camera and the video plane, and some people had problems viewing.
.... So we moved the subtitle plane as far back as possible, as close as possible to the video sphere (sketched after this discussion).
.... But this didn't work for everyone. Some people wearing glasses had problems. It's not that the image wasn't sharp, but they got headaches.

Ada: That is an issue; text in VR is generally considered a bad idea, as reading it is very difficult. There are some things that can make it easier.

Nigel: We have a requirements issue open in TTWG for this.

https://github.com/w3c/tt-reqs/issues/8 TTWG Issue

Nigel: At our meeting on Thursday, the group agreed to take this up for 2019. There's a lot of work still to be done on how best to express 3D coordinates in a way that works with the web, and to generate tests. We welcome input on that.
.... The group's high level view is that the primary data that needs to be transferred to the player is the 3D location.
.... Then, how the user interface and the options Peter showed are rendered in the player is more of a local setting, player configuration or user setting. We're focused on transferring as little data as possible to the player at this stage.

<Zakim> adarose, you wanted to ask about video and being handled by the UA?

Ada: Is this something you'd like to see the user agent itself handle? There are things the user agent could do that would be beneficial for users. For example, it could save user preferences, e.g. subtitle placement.
.... It could also have more access to the rendering pipeline, to ensure nice rendering and make the text as legible as possible.
.... Would a subtitle layer, or a video layer that supports subtitles, be something you see value in? Or would you always want to implement it yourself, to have control over it?

Peter: I see it as a mixture. There's more user control necessary than on a 2D screen, because the viewing environment differs so much.
.... As Nigel said, the core issue is providing subtitle information to the client, to allow rendering of the different variations according to user preferences.
.... There's also a component where the editor needs to be able to prepare subtitle presentations better,
.... especially when you look at the different modes; it depends on the content you have.
.... There were examples in the project where you have one main area of the video where the action takes place.
.... For such content, a lot of people in the project preferred the subtitles to be fixed in location, because it's easier to read and more comfortable to view, and you don't look around anyway.
.... In another example, an interview situation with two people, it wouldn't work, as you'd need to look from left to right, which nobody would do.
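[Note: a minimal sketch of the plane placement Peter described above, assuming the three.js scene from the earlier sketch, with the camera at the sphere's centre; the radius factor is illustrative, not the project's actual value.]

```typescript
import * as THREE from 'three';

const SPHERE_RADIUS = 500;                      // video sphere radius (earlier sketch)
const SUBTITLE_DISTANCE = SPHERE_RADIUS * 0.95; // "as far back as possible"

// Re-centre the subtitle plane in the current view direction each frame,
// just inside the video sphere, facing the viewer.
function positionSubtitlePlane(plane: THREE.Mesh, camera: THREE.Camera): void {
  const direction = new THREE.Vector3();
  camera.getWorldDirection(direction);          // unit vector along the view
  plane.position.copy(direction).multiplyScalar(SUBTITLE_DISTANCE);
  plane.lookAt(camera.position);
}
```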
<Chris_OBrien> Just want to note that speaker identification via colour alone should be avoided. This conflicts with WCAG guidance on not conveying information via colour alone.

Peter: So it depends on the content; an editor would need to provide info on how to render.

Vladimir: A couple of comments. I've been active in VRIF, which for the last year has dedicated a lot of its time to developing guidelines, where text in VR is a significant part of the update.
.... As mentioned, text in VR is generally a bad idea. I would agree with that, with a caveat: text in VR done badly is a bad idea, but when done right it's very beneficial,
.... as the text complements the content, and gives information that would not otherwise be accessible. Doing it right is a complex task.
.... Providing positional information for subtitles is an important component.
.... In many cases, subtitles or other textual elements are also part of the content, so the content author should be able to present them as they're supposed to be, with author control over fonts and positioning,
.... so there are no collisions between the video and the text.
.... Collisions are bad: if an object's position somehow collides, it interferes with the human perception of the scene being real, which leads to the headaches and motion sickness.
.... I think text in VR can be done. It requires certain decisions to be made by authors, and some by the player; they need to work together. Something to consider as part of the requirements.
.... I agree with Peter and Ada. Position information in the authored content will be a vital piece of data that players need, to ensure the subtitle position doesn't contradict what's in the video, in terms of layout and spatial perception.
.... When that's done right... Ada, you mentioned subtitles with perspective transforms applied may be less legible than when presented straight up in front of the viewer.

Ada: I was more saying that it can present issues if done incorrectly. For example, if the text is rendered at the beginning of a frame rather than at the end, you can rely on techniques like reprojection to make the text show more crisply, but it's tricky to pull off.

Vladimir: Arguments can be made both ways. Text presented in front of you would be more legible, but text with a perspective transform could be suitable for a particular scene. You need to make sure the VR perception is not broken, and have subtitles rendered as legibly as possible.

<Zakim> kaz, you wanted to check who the call-in users are (after the discussion)

# Next steps

Chris: Nigel mentioned that TTWG will work on this in 2019.
.... Is there specific work that needs to go into other groups, e.g. Immersive Web?
.... Is this purely an authoring issue, such that the rendering is handled by an application? Is user agent support needed, e.g. something more integrated into the rendering pipeline, as Ada mentioned?

Vladimir: Something should be done between the M&E, TTWG and Immersive Web groups.
.... For the success of VR in general, and accessibility in particular, I think the subtitles should be authored as part of the content, in harmony with it.
<chaals> [What Vlad said]

Vladimir: Also, authored in a way that users can control, so the groups need to work together on this.

Andreas: We presented some parts of this at TPAC, and wanted to bring the groups together to see what falls in scope for each.
.... There are four groups that need to be involved. M&E is perfect for getting the groups together to discuss scope and requirements, and see where this should go.
.... Some work has already started, as Nigel mentioned: TTWG for the subtitle formats, with position information in the IMSC format.
.... For the Immersive Web groups there's a WG and a CG, and also the accessibility groups, such as APA.

<Vlad> The draft VRIF Guidelines (in particular the Text in VR piece) could be a helpful resource to consider: https://www.vr-if.org/wp-content/uploads/vrif2018.110.04-Guidelines-2.0-for-Community-Review.pdf

Andreas: Some people couldn't make it today, so we should reserve some time in a future meeting to come up with a more concrete plan, to determine which groups would work on specific feature requirements.

Chris: You're right about the M&E IG being a good place to initiate this and do some outreach between different groups. We can help with use cases and requirements, not write specs or API definitions. That work would be welcome here, if you want to do more analysis and requirements gathering.
.... I have a question about how things work in Immersive Web: should we bring this to the CG?

Peter: This can't be covered by TTWG alone, which only does the format information; the scope here is larger.

Ada: It would be welcome in the CG; that's the best place to start this. There's a proposals repo, I'll share the link.
.... Issues can be discussed there, and when they get traction, they can be moved to their own repo.
.... The WG focuses on topics that are ready for people to implement, so this is more a topic for the CG at this point.

Chris: Have you already raised this as an issue in the Immersive Web CG?

Andreas: I discussed it with Chris Wilson at TPAC, who recommended adding an issue in the proposals repo, so it's already there.
.... It would be great for some of the VR experts there to take a look and discuss it.

Ada: I'll ask the chair of the Immersive Web CG to bring it up in an upcoming call.

Andreas: Thank you, we appreciate it.

Ada: There's a call every week, alternating between the CG and the WG.

<adarose> Issues in the IW proposals repo: https://github.com/immersive-web/proposals/issues/40, https://github.com/immersive-web/proposals/issues/39

Charles: I'm also with APA, and the personalisation TF under that.
.... We'd be excited to have a meeting or call with you all. We'll also reach out to the ARIA WG.

Andreas: The WAI initiative at W3C is interested in bringing access services to immersive web environments; subtitles are the tip of the iceberg.
.... I think more exchange is needed. I propose that some of the people already working on this (Peter, APA) figure out the overlaps and have a separate call, and then follow up in a future M&E IG call.

<mhakkinen> mark hakkinen (mhakkinen) has to leave... also a member of APA; our organization (ets.org) has a strong interest in the accessibility of AR/VR/immersive web in education.

Chris: This sounds like a good plan. The IG would want to help with coordination.
.... We can use some time on our monthly calls to report back.

<Peter__IRT_> Sorry, I can't follow any more. The audio keeps dropping off.

<tidoust> scribenick: tidoust

Chris: I agree with Andreas' idea to propose this as an item for a next Immersive Web CG call.
.... After that, we can schedule with the APA WG. IG members should be free to join any of these calls.
.... From an M&E IG perspective, we can use our monthly call to track progress.

<chaals> [chaals leaves]

<cpn> scribenick: cpn

Andreas: My proposal is to have a dedicated call for the specific people involved in the work.

Ada: My plan is to bring this up in the CG. We can follow up in the proposals GitHub issue.

Andreas: It would be good to have a link between the Immersive Accessibility project and APA. I have a task to coordinate this in TTWG, so I could try to reach out to schedule a meeting; I will take the lead on that.

Peter: It would be great if we can have another, smaller meeting; we can invite other partners in the project, e.g. the player implementers.

Pierre: I think it would be good if the TTWG can provide a timeline for its work, so other groups can sync and know when input is needed.
.... I suggest the IG chair sends an email to the TTWG chair to ask for this. What's the timeline? When do you expect requirements?

Chris: I'm more than happy to do that.

Pierre: Good to have it in writing, for those not on the call.

Chris: Thank you, Peter, for presenting today. It's clearly a topic of great interest, so this is very valuable. I look forward to continuing the collaboration.

<kaz> scribenick: kaz

# Perspective on M&E for the Web

Chris: At the previous call, Francois presented the Perspective on M&E for the Web document. It's intended for the IG to reflect on industry direction and requirements for new web technologies.
.... Do you have any news?

Francois: I don't have much news. I have transferred the repo to the W3C GitHub following the CfC.
.... I updated the GH issue accordingly.

<tidoust> https://github.com/w3c/media-and-entertainment/issues/12#issuecomment-460691081 Update on perspective document

Francois: That's all I have for now. The document hasn't changed, other than it now being an Editor's Draft in the IG.
.... My goal is to organize discussion to publish a FPWD.
.... I'm going to create some issues and gather feedback from the IG,
.... and update the document so it reflects your vision of M&E.

Chris: We can come back to this on a future call, for the IG to take a look and give input.
.... Thank you, it's a good next step.
.... I would recommend IG members take a look, and see our future directions.
.... Your input is very welcome.

# Media Timed Events TF

Chris: This is making steady, but not fast, progress towards completing the use case and requirements doc.
.... There's some editorial work I need to do. We're looking at emsg support on the Web, but also some synchronized rendering and timing issues.
.... I hope to have the document updated based on the feedback from Mark within the next week or so.
.... Then we're ready to publish the document and open up the discussion with browser vendors.
.... The work then moves to spec development.

# IG rechartering

Chris: Our current charter period ends at the end of April.
.... There is a draft charter for discussion. We're using the IG GitHub to draft it.

<cpn> https://github.com/w3c/media-and-entertainment/blob/master/charters/charter-2019.html Draft Charter

Chris: IG participants can see what to do in the next charter period.
.... We'll discuss among the co-chairs,
.... but we also want to hear from IG members: are things going well or not?
.... Is there something you'd like to change?
.... You're welcome to review it and give comments.

Mark: There was discussion at TPAC about a possible Media WG.
.... This would probably be the main focus of what the MEIG is doing.
.... It hasn't been chartered yet. We'll want to orient the IG towards that group.
.... This would be the key group we interact with, and we won't be able to put it in the charter right now.
.... Ideally, the IG charter would follow that being worked out. Should we postpone?

Chris: The charter lists the groups we liaise with, so it's important to have it there. I think we should continue to have charter coverage while that's worked out.

Francois: I would proceed, as the Media WG is not established yet.
.... When plans become clearer, it's possible to extend the IG charter by a few months.
.... I would proceed in parallel;
.... it would be better to synchronize the two with each other.

Pierre: Thanks for bringing this up, Mark.
.... Is somebody objecting to creating a Media WG, or do we lack critical mass?

<cpn> scribenick: cpn

Francois: There is critical mass, so no problem there. We're having internal discussions on EME before going to the AC.
.... As soon as we have agreement on the direction, we'll send an advance notice to the AC.

<kaz> scribenick: kaz

[adjourned]

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.152 (CVS log) $Date: 2019/02/07 15:36:16 $