[me] minutes - 6 March 2018

available at:
  https://www.w3.org/2018/03/06-me-minutes.html

also as text below.

Thanks a lot for taking these minutes, Chris!

Kazuyuki

---

   [1]W3C

      [1] http://www.w3.org/

                               - DRAFT -

                       Media and Entertainment IG
                              06 Mar 2018

   [2]Agenda

      [2] https://lists.w3.org/Archives/Member/member-web-and-tv/2018Feb/0004.html

Attendees

   Present
          Kaz_Ashimura, Bob_Lund, Chris_Needham, Cyril_Concolato,
          Giri_Mandayam, Francois_Daoust, Geun-Hyung_Kim,
          Eric_Carlson, Tatsuya_Igarashi, Nigel_Megitt,
          Peter_tho_Pesch, Steve_Morris, Marisa_DeMeglio,
          John_Luther, Kazuhiro_Hoya

   Regrets

   Chair
          Chris, Igarashi

   Scribe
          Chris, Kaz

Contents

     * [3]Topics
         1. [4]Introduction
         2. [5]Carriage of Web Resources in ISO-BMFF
         3. [6]E-Publishing on the Web
         4. [7]Support for caption formats other than WebVTT
         5. [8]Next steps
         6. [9]Conclusion
         7. [10]Next IG meeting
     * [11]Summary of Action Items
     * [12]Summary of Resolutions
     __________________________________________________________

Introduction

   <kaz> scribenick: kaz

   Chris: During the previous call, Giri gave a presentation on
   media timed events
   ... ATSC work, DASH events, emsg in ISO BMFF containers, ...
   ... which identified potential gaps in the web platform
   ... That call was well attended, the topic seemed of interest
   to many IG members
   ... so I thought that it was something that the IG should
   follow up on
   ... As part of that, I produced an initial document to
   summarize what we discussed
   ... pointing to existing work, and previous discussions

   <tidoust> [13]Use cases and gap analysis: Media timed events
   and synchronisation in HTML5

     [13] https://github.com/w3c/media-and-entertainment/blob/master/media-timed-events/use-cases-and-gap-analysis.md

   Chris: I would like to figure out what the IG should usefully
   do
   ... so today I'm hoping for an open discussion amongst us all,
   ... to think about our next steps to progress on this topic
   ... The document described three use cases:
   ... synchronised event triggering, support for subtitle and
   caption formats other than WebVTT, and synchronised rendering
   of web resources
   ... I would like to invite Cyril to tell us about synchronised
   rendering of web resources
   ... I have invited Marisa to join us, as chair of the
   Synchronised Multimedia for Publications CG
   ... [14]https://www.w3.org/community/sync-media-pub/
   ... Maybe you could tell us what some of your goals are, and
   the current status?
   ... On the timed text side, it's great to have members of TTWG
   with us today, thank you
   ... I've spoken with Andreas about the generic TextTrackCue
   proposal, he can't be here today so I'll talk about that later
   ... I also want to ask Giri to talk about our next steps
   ... AOB?

     [14] https://www.w3.org/community/sync-media-pub/

   Nigel: I sent a message to the IG recently about audio
   description
   ... client-side implementation, requirements for capture

   Chris: Yes, let's cover that as well, thank you.

Carriage of Web Resources in ISO-BMFF

   <scribe> scribenick: cpn

   Cyril: Here's a document I'm editing at MPEG: Carriage of Web
   resources in ISO BMFF
   ... [shares his screen]
   ... It started as an activity in MPEG a while ago, exploring
   what was needed in the MPEG space,
   ... to facilitate delivery of web resources: HTML, JavaScript,
   CSS, etc
   ... We weren't sure at the beginning what the output would be
   in terms of standards
   ... We've produced a committee draft, not uploaded yet, I will
   do that in a few days
   ... It's quite a light document, it doesn't define a new
   toolbox
   ... It's similar to CMAF in that sense, it describes how you
   use existing tools from ISO BMFF
   ... The two aspects we're dealing with are: carriage of timed
   web resources, and carriage of non-timed resources
   ... The difference is more in how the timing information is
   delivered,
   ... eg a resource where the timing is defined in an XML
   document
   ... What is a timed web resource? They're stored in tracks: one
   track type carries HTML content, another carries JavaScript,
   another carries WebVTT metadata events
   ... In the HTML track, the idea is not to define a mechanism or
   complex processing for HTML data. The document is simply loaded
   at the given time by the processor
   ... It's as if the browser navigates from one document to
   another at the given time
   ... For JavaScript code, this could have no HTML at all, if the
   entire timed application is in JavaScript
   ... A note about emsg boxes: It's important to understand the
   difference between this and the draft doc I'm presenting here
   ... The tracks here are first class tracks in MP4, meant to be
   processed in a timely manner.
   ... With emsg boxes, they're more targeted to the application,
   not meant to be replayed
   ... The content of the timed track in this case would be
   replayed
   ... We need to be precise about what entity in the consumer is
   intended to handle these events,
   ... is it something deep in the media player, or something in
   the application layer?
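
   [Note: for readers unfamiliar with the emsg structure under
   discussion, here is a minimal TypeScript sketch of reading a
   version-0 emsg box body, following the field layout in ISO/IEC
   23009-1. It assumes the caller has already located the box and
   stripped the 8-byte box header; error handling is omitted and
   the function names are illustrative.]

     // Fields of a version-0 DASH emsg box, per ISO/IEC 23009-1.
     interface EmsgEvent {
       schemeIdUri: string;
       value: string;
       timescale: number;
       presentationTimeDelta: number;
       eventDuration: number;
       id: number;
       messageData: Uint8Array;
     }

     // Read a NUL-terminated UTF-8 string; return it and the
     // offset just past the terminator.
     function readNullTerminated(
       bytes: Uint8Array, offset: number): [string, number] {
       let end = offset;
       while (end < bytes.length && bytes[end] !== 0) end++;
       const text = new TextDecoder().decode(bytes.subarray(offset, end));
       return [text, end + 1];
     }

     function parseEmsgV0(payload: Uint8Array): EmsgEvent {
       const view = new DataView(
         payload.buffer, payload.byteOffset, payload.byteLength);
       if (view.getUint8(0) !== 0) {
         throw new Error('only version 0 handled in this sketch');
       }
       // Skip the 4-byte version/flags of the full box header.
       const [schemeIdUri, afterScheme] = readNullTerminated(payload, 4);
       const [value, offset] = readNullTerminated(payload, afterScheme);
       return {
         schemeIdUri,
         value,
         timescale: view.getUint32(offset), // all fields big-endian
         presentationTimeDelta: view.getUint32(offset + 4),
         eventDuration: view.getUint32(offset + 8),
         id: view.getUint32(offset + 12),
         messageData: payload.subarray(offset + 16),
       };
     }

   [The scheme_id_uri/value pair is what lets a player route the
   message_data to the right consumer, which is exactly the
   routing question Cyril raises above.]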

   Igarashi: I see the difference between the timed media track
   and emsg boxes, but I don't see the use cases for timed web
   resources

   Cyril: I agree, in most cases you won't have continuous HTML
   changes
   ... The track mechanism can handle sparse events
   ... The question is which entity will consume the events, and
   what's the processing
   ... One thing that's not clear to me with emsg is what happens
   when you defragment the file
   ... The emsg box in my view is something that you consume while
   streaming, but has no meaning outside this
   ... With timed tracks, content is expected to be useful
   separately

   Bob: This distinction, is this something that should be fixed
   in the emsg spec?
   ... I can see applications where you want to replay emsg events

   Cyril: Maybe it is possible to design such a player

   Bob: We extended the DASH player to handle emsg events and DASH
   events

   Cyril: In section 5.4, on the use of URLs to web resources, the
   idea is to clarify how to link to such resources
   ... The meta box contains data that should be seen by the
   browser as a local cache
   ... If the browser loads the content, and needs some CSS, it
   can find it in the cache, otherwise it goes to the network
   ... This isn't a new idea, just highlighted in this document
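
   [Note: a hedged sketch of the "local cache" behaviour Cyril
   describes, using the Cache API plus a service worker, which he
   mentions later as one possible consumption route. The
   extractMetaItems helper is hypothetical, standing in for a
   parser of the meta box items; it is not a real library call.]

     // Hypothetical helper: parse the MP4 meta box and return the
     // carried items (URL, MIME type, bytes). Not a real API.
     declare function extractMetaItems(
       mp4: ArrayBuffer): { url: string; mimeType: string; body: Uint8Array }[];

     // Seed a Cache with the resources carried in the file, so
     // lookups behave like a browser cache populated from the MP4.
     async function seedCacheFromMp4(mp4: ArrayBuffer): Promise<void> {
       const cache = await caches.open('mp4-carried-resources');
       for (const item of extractMetaItems(mp4)) {
         await cache.put(item.url, new Response(item.body, {
           headers: { 'Content-Type': item.mimeType },
         }));
       }
     }

     // Assume a service worker global scope for typing purposes.
     declare const self: ServiceWorkerGlobalScope;

     // Serve from the MP4-derived cache first, else the network.
     self.addEventListener('fetch', (event: FetchEvent) => {
       event.respondWith(
         caches.match(event.request)
           .then((hit) => hit ?? fetch(event.request)));
     });

   [If the browser loads the carried HTML and requests its CSS,
   the fetch handler answers from the cache; anything not carried
   in the file falls through to the network, matching the
   behaviour Cyril outlines.]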

   <Zakim> nigel, you wanted to ask how WebVTT metadata can be
   made available to JavaScript code in the absence of DataCue
   implementations

   Nigel: There's a suggestion that the data gets turned into
   something consumable from JS
   ... This implies DataCue, or is there another way to do it?

   Cyril: This doc only covers storage, not how it's exposed,
   DataCue is one way to go

   Nigel: Other mechanisms? Is it important to MPEG how
   implementable this is (more a process question)?

   Cyril: MPEG started this as there was evidence that with this,
   you could do something in the browser,
   ... eg, a service worker consuming an MP4 file is another way

   <kaz> Chris: Thanks Cyril for presenting this information, this
   is really valuable input.

   Igarashi: Regarding web resources, via emsg or tracks, who
   consumes the resources is independent of the delivery

   Igarashi: Also, emsg could be used for replay as well as web
   resource tracks, and not just in the streaming case

   Cyril: I'd like to clarify the terms we're using. We should be
   clear what is an event and what is a resource
   ... For me, an event is something that causes a trigger; it
   shouldn't necessarily carry the resource

   Igarashi: emsg could be arbitrary binary messages

E-Publishing on the Web

   Marisa: I work for the DAISY Consortium,
   ... on talking books for the blind and visually impaired
   ... We work with EPUB, audio clips synchronised with fragments
   in an HTML5 document
   ... We want this in the next iteration of EPUB on the web, we
   spun out a CG from the Publishing WG
   ... The task for our CG is to look at existing technology, and
   ideally not reinvent anything
   ... What we need is the ability to synchronise audio fragments
   with HTML fragments
   ... For example, the page of a book is open, the user presses
   Play, and depending on implementation / user preference
   ... there's a highlight that follows the phrases
   ... I heard that DataCue could be useful for us, and I want to
   learn about this group, and TTML

   <Zakim> nigel, you wanted to ask if the audio is pre-recorded
   or synthesised

   Nigel: Is the audio pre-recorded, or is it synthesised based on
   text?

   Marisa: It's pre-recorded

   Nigel: So there's no need for a screen reader
   ... TTML and WebVTT are predicated on playing back timed media,
   but in your case it seems the events are user-driven
   ... Seems there isn't a good fit with TTML / WebVTT, a better
   fit could be SMIL

   Marisa: SMIL is a good fit, but nobody enjoys writing it, or
   reading it
   ... We're looking to move to something simpler to ingest, and
   also for people to comprehend
   ... The SMIL files that our producers make are driven by time
   codes, and the user can start playback and interrupt it,
   ... but once playback starts, it plays from top to bottom

   Nigel: TTML2 has hooks in it for playing audio files at
   specific times
   ... My understanding is that you'd need custom data in a WebVTT
   payload to achieve the same thing

   Marisa: I've been looking for examples, but found nothing
   similar. In my case, the TTML wouldn't have text, only audio

   Nigel: That's possible with TTML, either embedded fragments or
   references to external resources

   Marisa: Is there a specific profile?

   Nigel: I've invited people to participate, maybe as a W3C CG,
   to create a TTML profile for audio requirements

   Marisa: What's the state of browser support for TTML2? Browsers
   are our primary user agent base

   Nigel: Browsers don't generally support it natively; in the
   main, it can be done in JavaScript

   Chris: Anything else to mention on the possible CG, Nigel?

   Nigel: Only that synchronised playback will have requirements
   for playback of media timed events
   ... In terms of solutions, we might want to look at what Web
   Audio does
   ... It gives advance instructions to the processor about what
   needs to happen and when
   ... It's a different model to TextTrackCue; it's instructive to
   see that it exists. Is it useful to extend that model into
   other domains?
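
   [Note: a minimal sketch of the Web Audio scheduling model Nigel
   refers to: clips are decoded up front, then playback is
   scheduled ahead of time against the AudioContext clock rather
   than dispatched as DOM events. The URLs and offsets are
   illustrative only.]

     const ctx = new AudioContext();

     // Fetch and decode one audio clip into an AudioBuffer.
     async function loadClip(url: string): Promise<AudioBuffer> {
       const data = await (await fetch(url)).arrayBuffer();
       return ctx.decodeAudioData(data);
     }

     async function playSequence(): Promise<void> {
       const [first, second] = await Promise.all([
         loadClip('phrase-1.mp3'),
         loadClip('phrase-2.mp3'),
       ]);
       // Schedule both clips against a common base time; start()
       // takes an absolute time on the AudioContext clock.
       const base = ctx.currentTime + 0.1; // small safety margin
       const clips: [AudioBuffer, number][] = [[first, 0], [second, 2.5]];
       for (const [buffer, offset] of clips) {
         const source = ctx.createBufferSource();
         source.buffer = buffer;
         source.connect(ctx.destination);
         source.start(base + offset); // sample-accurate start
       }
     }

     playSequence();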

   <Zakim> ericc, you wanted to suggest that a simple "data cue"
   may be exactly what is needed

   Eric: I'd like to suggest that DAISY's needs could be met by a
   simple DataCue,
   ... a timed event emitted based on current time of the media
   file (the spoken audio in this case).
   ... it contains a blob of data to be interpreted by script
   rather than the UA.
   ... When a section of the audio is played by the UA, it also
   emits the DataCue.
   ... On user interaction with the page, the script would get
   information from the markup about the time corresponding to
   that phrase
   ... The script wouldn't have to be terribly sophisticated, and
   should work for what you're trying to do
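
   [Note: a sketch of the approach Eric describes. DataCue is not
   widely implemented, so this uses VTTCue carrying a JSON payload
   on a hidden metadata track as a stand-in; the element IDs and
   timings are illustrative.]

     const audio = document.querySelector('audio')!;
     // A hidden metadata track: cues fire events, render nothing.
     const track = audio.addTextTrack('metadata', 'highlights');
     track.mode = 'hidden';

     // Illustrative phrase timings mapped to HTML fragment IDs.
     const phrases = [
       { start: 0.0, end: 2.4, fragmentId: 'phrase-1' },
       { start: 2.4, end: 5.1, fragmentId: 'phrase-2' },
     ];

     for (const p of phrases) {
       const cue = new VTTCue(
         p.start, p.end, JSON.stringify({ fragmentId: p.fragmentId }));
       // Highlight the fragment while its cue is active.
       cue.onenter = () => document
         .getElementById(p.fragmentId)?.classList.add('highlight');
       cue.onexit = () => document
         .getElementById(p.fragmentId)?.classList.remove('highlight');
       track.addCue(cue);
     }

   [Seeking, pausing, and user-initiated playback all come for
   free from the media element, which matches the user-driven
   interaction Marisa describes.]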

   Marisa: That's how it works now, though we want to give it a
   refresh, move away from SMIL, maybe to something that could be
   implemented natively by browsers
   ... Is what you described possible today?

   Eric: It is possible in Safari, which has an implementation of
   DataCue; it was in the spec several years ago
   ... It's been removed from the spec, but people are talking
   about reviving it
   ... It could be implemented in Safari right now

   <Zakim> kaz, you wanted to ask about the usage of SSML

   Kaz: SSML and the Web Speech API may be of interest too
   ... You mentioned using pre-recorded audio, if we use speech
   synthesis we could generate the audio based on SSML
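
   [Note: a minimal sketch of the synthesis route Kaz mentions,
   using the Web Speech API. SSML input is not reliably supported
   across engines, so plain text is used here; the phrase is
   illustrative.]

     // Synthesise a phrase instead of shipping pre-recorded audio.
     const utterance = new SpeechSynthesisUtterance(
       'Chapter one. It was a dark and stormy night.');
     utterance.lang = 'en-GB';
     utterance.onend = () => console.log('phrase finished');
     speechSynthesis.speak(utterance);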

   Marisa: What we see with content without pre-recorded audio is
   that people prefer to use screen readers
   ... We still need pre-recorded audio for professional
   productions, and systems without text-to-speech

   <Zakim> nigel, you wanted to note that web speech api's output
   is not available to Web Audio, which is a technical limitation
   for implementers

   Nigel: The Web Speech API makes the operating system generate
   the speech output, but this audio isn't available to Web Audio
   API
   ... This is a gap that we found
   ... Also, regarding screen readers, what's the size of the
   community of people who want synthesized speech, but don't have
   screen readers?

   Marisa: That's a good question, let me find out about that

Support for caption formats other than WebVTT

   Chris: I spoke to Andreas offline. He has hosted discussions at
   TPACs previously on the need for a generic TextTrackCue API
   ... I have invited him to give us an update on this when he's
   ready

Next steps

   <kaz> scribenick: kaz

   Chris: After the last call, we thought about what to do as next
   steps within this IG

   <cpn> scribenick: cpn

   Giri: We talked about making a Task Force, to gather use cases
   and requirements
   ... This sounds useful, given the discussion we've had today
   ... My proposal is to turn this into solid proposals for web
   standardisation
   ... This could be bringing new requirements to an existing
   spec, eg ISO BMFF container handling
   ... A Task Force with a limited lifespan, to conclude at TPAC
   this year
   ... We can have monthly calls, and can work on it on GitHub or
   a wiki; GitHub seems more collaborative
   ... We want to consider not just the streaming media use cases,
   but also the EPUB use cases,
   ... and other areas where timed metadata is useful, to cover
   all our interests
   ... Will talk with W3C staff about setting up a GitHub repo

   <kaz> scribenick: kaz

   Chris: I agree about GitHub; possibly the output could be a W3C
   IG Note, we'll see

   Giri: Would like to do that after the GitHub repo is set up

   Chris: We should talk about some of the details offline, for
   example,
   ... should we have separate calls for the TF?
   ... There are other topics that the IG could discuss, so maybe
   having separate calls for the TF could be a way to go
   ... We'll discuss and announce something to the IG

Conclusion

   Chris: This is a really interesting area, thank you all for your
   contributions
   ... We've heard different views around a common area of
   interest
   ... The details of the TF are to be announced

   Kaz: Should we record the decision to create the TF as a
   RESOLUTION?

   Chris: Yes

   RESOLUTION: We'll create a dedicated TF for the Media-Timed
   Events topic (detail to be announced)

Next IG meeting

   -> [15]W3C Comm Team's message on Daylight Savings
   (member-only)

     [15] https://lists.w3.org/Archives/Member/chairs/2018JanMar/0087.html

   Chris: April 3
   ... but please note there is a daylight saving switch-over
   ... thank you for joining, everybody
   ... speak to you in one month!

   [adjourned]

Summary of Action Items

Summary of Resolutions

    1. [16]We'll create a dedicated TF for the Media-Timed Events
       topic (detail to be announced)

   [End of minutes]
     __________________________________________________________


    Minutes formatted by David Booth's [17]scribe.perl version
    1.147 ([18]CVS log)
    $Date: 2018/03/15 19:04:58 $

     [17] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
     [18] http://dev.w3.org/cvsweb/2002/scribe/
