- From: Kazuyuki Ashimura <ashimura@w3.org>
- Date: Fri, 16 Mar 2018 04:08:42 +0900
- To: public-web-and-tv@w3.org
available at:
https://www.w3.org/2018/03/06-me-minutes.html
also as text below.
Thanks a lot for taking these minutes, Chris!
Kazuyuki
---
[1]W3C
[1] http://www.w3.org/
- DRAFT -
Media and Entertainment IG
06 Mar 2018
[2]Agenda
[2] https://lists.w3.org/Archives/Member/member-web-and-tv/2018Feb/0004.html
Attendees
Present
Kaz_Ashimura, Bob_Lund, Chris_Needham, Cyril_Concolato,
Giri_Mandayam, Francois_Daoust, Geun-Hyung_Kim,
Eric_Carlson, Tatsuya_Igarashi, Nigel_Megitt,
Peter_tho_Pesch, Steve_Morris, Marisa_DeMeglio,
John_Luther, Kazuhiro_Hoya
Regrets
Chair
Chris, Igarashi
Scribe
Chris, Kaz
Contents
* [3]Topics
1. [4]Introduction
2. [5]Carriage of Web Resources in ISO-BMFF
3. [6]E-Publishing on the Web
4. [7]Support for caption formats other than WebVTT
5. [8]Next steps
6. [9]Conclusion
7. [10]Next IG meeting
* [11]Summary of Action Items
* [12]Summary of Resolutions
__________________________________________________________
Introduction
<kaz> scribenick: kaz
Chris: During the previous call, Giri gave a presentation on
media timed events
... ATSC work, DASH events, emsg in ISO BMFF containers, ...
... which identified potential gaps in the web platform
... That call was well attended, the topic seemed of interest
to many IG members
... so I thought that it was something that the IG should
follow up on
... As part of that, I produced an initial document to
summarize what we discussed
... pointing to existing work, and previous discussions
<tidoust> [13]Use cases and gap analysis: Media timed events
and synchronisation in HTML5
[13] https://github.com/w3c/media-and-entertainment/blob/master/media-timed-events/use-cases-and-gap-analysis.md
Chris: I would like to figure out what the IG should usefully
do
... so today I'm hoping for an open discussion amongst us all,
... to think about our next steps to progress on this topic
... The document described three use cases:
... synchronised event triggering, support for subtitle and
caption formats other than WebVTT, and synchronised rendering
of web resources
... I would like to invite Cyril to tell us about synchronised
rendering of web resources
... I have invited Marisa to join us, as chair of the
Synchronised Multimedia for Publications CG
... [14]https://www.w3.org/community/sync-media-pub/
... Maybe you could tell us what some of your goals are, and
the current status?
... On the timed text side, it's great to have members of TTWG
with us today, thank you
... I've spoken with Andreas about the generic TextTrackCue
proposal; he can't be here today, so I'll talk about that later
... I also want to ask Giri to talk about our next steps
... AOB?
[14] https://www.w3.org/community/sync-media-pub/
Nigel: I sent a message to the IG recently about audio
description
... client-side implementation, and requirements for capture
Chris: Yes, let's cover that as well, thank you.
Carriage of Web Resources in ISO-BMFF
<scribe> scribenick: cpn
Cyril: Here's a document I'm editing at MPEG: Carriage of Web
resources in ISO BMFF
... [shares his screen]
... It started as an activity in MPEG a while ago, exploring
what was needed in the MPEG space,
... to facilitate delivery of web resources: HTML, JavaScript,
CSS, etc
... We weren't sure at the beginning what the output would be
in terms of standards
... We've produced a committee draft, not uploaded yet, I will
do that in a few days
... It's quite a light document, it doesn't define a new
toolbox
... It's similar to CMAF in that sense, it describes how you
use existing tools from ISO BMFF
... The two aspects we're dealing with are: carriage of timed
web resources, and carriage of non-timed resources
... The difference is more in how the timing information is
delivered,
... eg a resource where the timing is defined in an XML
document
... What is a timed web resource? They're stored in tracks: one
track type carries HTML content, another carries JavaScript,
another carries WebVTT metadata events
... In the HTML track, the idea is not to define a mechanism or
complex processing for HTML data. The document is loaded at the
given time by the processor
... It's as if the browser navigates from one document to
another at the given time
... For JavaScript code, this could have no HTML at all, if the
entire timed application is in JavaScript
... A note about emsg boxes: it's important to understand the
difference between emsg and the draft doc I'm presenting here
... The tracks here are first class tracks in MP4, meant to be
processed in a timely manner.
... With emsg boxes, they're more targeted to the application,
not meant to be replayed
... The content of the timed track in this case would be
replayed
... We need to be precise about what entity in the consumer is
intended to handle these events,
... is it something deep in the media player, or something in
the application layer?
Igarashi: I see the difference between the timed media track
and emsg boxes, but I don't see the use cases for timed web
resources
Cyril: I agree, in most cases you won't have continuous HTML
changes
... The track mechanism can handle sparse events
... The question is which entity will consume the events, and
what's the processing
... One thing not clear to me with emsg is what happens when
you defragment the file?
... The emsg box in my view is something that you consume while
streaming, but has no meaning outside this
... With timed tracks, content is expected to be useful
separately
Bob: This distinction, is this something that should be fixed
in the emsg spec?
... I can see applications where you want to replay emsg events
Cyril: Maybe it is possible to design such a player
Bob: We extended the DASH player to handle emsg events and
DASH events
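As an illustration of this kind of extension, the dash.js open
source player lets an application subscribe to DASH events
(including in-band emsg) by their schemeIdUri. A minimal
sketch; the scheme URI and stream URL below are placeholders:

  // Minimal sketch using dash.js; the scheme URI and MPD URL
  // are placeholders, not from an actual deployment.
  var video = document.querySelector('video');
  var player = dashjs.MediaPlayer().create();
  player.initialize(video, 'https://example.com/stream.mpd', true);
  // dash.js dispatches DASH events to listeners registered
  // under the event's schemeIdUri.
  player.on('urn:example:event:2018', function (e) {
    console.log('DASH event', e); // id, timing, message data
  });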
Cyril: Section 5.4 covers the use of URLs to web resources;
the idea is to clarify how to link to such resources
... The meta box contains data that should be seen by the
browser as a local cache
... If the browser loads the content, and needs some CSS, it
can find it in the cache, otherwise it goes to the network
... This isn't a new idea, just highlighted in this document
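A minimal sketch of that "local cache" behaviour using a
service worker; extractMetaBoxResources() is a hypothetical
helper that parses the embedded resources out of the MP4 meta
box, and the file name is a placeholder:

  // sw.js -- on install, unpack the MP4 meta box into the Cache API.
  self.addEventListener('install', function (event) {
    event.waitUntil(
      fetch('/media/presentation.mp4') // placeholder file
        .then(function (res) { return res.arrayBuffer(); })
        .then(function (buf) {
          return caches.open('mp4-meta').then(function (cache) {
            // extractMetaBoxResources() is hypothetical: it returns
            // { url, body, headers } entries from the meta box.
            return Promise.all(extractMetaBoxResources(buf).map(function (r) {
              return cache.put(r.url, new Response(r.body, { headers: r.headers }));
            }));
          });
        })
    );
  });
  // Serve embedded resources from the cache first, network otherwise.
  self.addEventListener('fetch', function (event) {
    event.respondWith(
      caches.match(event.request).then(function (hit) {
        return hit || fetch(event.request);
      })
    );
  });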
<Zakim> nigel, you wanted to ask how WebVTT metadata can be
made available to JavaScript code in the absence of DataCue
implementations
Nigel: There's a suggestion that the data gets turned into
something consumable from JS
... This implies DataCue, or is there another way to do it?
Cyril: This doc only covers storage, not how it's exposed,
DataCue is one way to go
Nigel: Other mechanisms? Is it important to MPEG how
implementable this is (more a process question)?
Cyril: MPEG started this as there was evidence that with this,
you could do something in the browser,
... eg, a service worker consuming an MP4 file is another way
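One mechanism that does work in browsers today is a text track
of kind "metadata", whose cues are visible only to script. A
minimal sketch, with an illustrative JSON payload:

  var video = document.querySelector('video');
  // A metadata track is never rendered by the UA; 'hidden' mode
  // still fires cue events, approximating what DataCue offers.
  var track = video.addTextTrack('metadata', 'events');
  track.mode = 'hidden';
  // Illustrative cue, active from t=10s to t=11s.
  track.addCue(new VTTCue(10, 11, JSON.stringify({ type: 'chapter', id: 3 })));
  track.addEventListener('cuechange', function () {
    for (var i = 0; i < track.activeCues.length; i++) {
      var data = JSON.parse(track.activeCues[i].text);
      console.log('timed metadata', data); // hand off to the app
    }
  });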
<kaz> Chris: Thanks Cyril for presenting this information, this
is really valuable input.
Igarashi: Regarding web resources, whether via emsg or tracks,
who consumes the resources is independent of the delivery
Igarashi: Also, emsg could be used for replay as well as web
resource tracks, and not just in the streaming case
Cyril: I'd like to clarify the terms we're using. We should be
clear what is an event and what is a resource
... For me, an event is something that causes a trigger,
shouldn't necessarily carry the resource
Igarashi: emsg could be arbitrary binary messages
E-Publishing on the Web
Marisa: I work for the DAISY consortium,
... on talking books for the blind and visually impaired
... We work with EPUB: audio clips synchronised with fragments
in an HTML5 document
... We want this in the next iteration of EPUB on the web, we
spun out a CG from the Publishing WG
... The task for our CG is to look at existing technology and
ideally not reinvent anything
... What we need is the ability to synchronise audio fragments
with HTML fragments
... For example, the page of a book is open, the user presses
Play, and depending on implementation / user preference
... there's a highlight that follows the phrases
... I heard that DataCue could be useful for us, and I want to
learn about this group, and TTML
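A minimal sketch of the data involved, assuming a hypothetical
JSON manifest in place of a SMIL media overlay: each entry ties
a time range in the narration audio to an HTML fragment id, and
script moves the highlight as the audio plays:

  // Hypothetical sync manifest; times and ids are illustrative.
  var sync = [
    { start: 0.0, end: 2.4, id: 'phrase1' },
    { start: 2.4, end: 5.1, id: 'phrase2' }
  ];
  var audio = document.querySelector('audio');
  audio.addEventListener('timeupdate', function () {
    var t = audio.currentTime;
    sync.forEach(function (entry) {
      var el = document.getElementById(entry.id);
      el.classList.toggle('highlight', t >= entry.start && t < entry.end);
    });
  });
  // User-driven start: clicking a phrase seeks the narration to it.
  sync.forEach(function (entry) {
    document.getElementById(entry.id).addEventListener('click', function () {
      audio.currentTime = entry.start;
      audio.play();
    });
  });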
<Zakim> nigel, you wanted to ask if the audio is pre-recorded
or synthesised
Nigel: Is the audio pre-recorded, or is it synthesised based on
text?
Marisa: It's pre-recorded
Nigel: So there's no need for a screen reader
... TTML and WebVTT are predicated on playing back timed media,
but in your case it seems the events are user-driven
... It seems there isn't a good fit with TTML / WebVTT; a
better fit could be SMIL
Marisa: SMIL is a good fit, but nobody enjoys writing it, or
reading it
... We're looking to move to something simpler to ingest, and
also for people to comprehend
... The SMIL files that our producers make are driven by time
codes; the user can start playback and interrupt it,
... but once playback starts, it plays from top to bottom
Nigel: TTML2 has hooks in it for playing audio files at
specific times
... My understanding is that you'd need custom data in a WebVTT
payload to achieve the same thing
Marisa: I've been looking for examples, but found nothing
similar. In my case, the TTML wouldn't have text, only audio
Nigel: That's possible with TTML, either embedded fragments or
references to external resources
Marisa: Is there a specific profile?
Nigel: I've invited people to participate, maybe as a W3C CG,
to create a TTML profile for audio requirements
Marisa: How is browser support for TTML2? Browsers are our
primary user agent base
Nigel: Browsers don't generally support it natively; in the
main, it can be done in JavaScript
Chris: Anything else to mention on the possible CG, Nigel?
Nigel: Only that synchronised playback will have requirements
for playback of media timed events
... In terms of solutions, we might want to look at what Web
Audio does
... This has advanced instructions to the processor of what
needs to happen and when
... It's a different model to TextTrackCue; instructive to see
that it exists. Is it useful to extend that model into other
domains?
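For reference, the Web Audio scheduling model mentioned here
looks roughly like this: the script issues instructions ahead
of time against the AudioContext clock (the clip URL is a
placeholder):

  var ctx = new AudioContext();
  fetch('clip.mp3') // placeholder audio file
    .then(function (res) { return res.arrayBuffer(); })
    .then(function (buf) { return ctx.decodeAudioData(buf); })
    .then(function (audioBuffer) {
      var source = ctx.createBufferSource();
      source.buffer = audioBuffer;
      source.connect(ctx.destination);
      // Scheduled in advance: play two seconds from now.
      source.start(ctx.currentTime + 2.0);
    });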
<Zakim> ericc, you wanted to suggest that a simple "data cue"
may be exactly what is needed
Eric: I'd like to suggest that DAISY's needs could be met by a
simple DataCue,
... a timed event emitted based on current time of the media
file (the spoken audio in this case).
... it contains a blob of data to be interpreted by script
rather than the UA.
... When a section of the audio is emitted by the UA, it also
emits the DataCue.
... On user interaction with the page, the script would get
information from the markup about the time corresponding to
that phrase
... The script wouldn't have to be terribly sophisticated, and
should work for what you're trying to do
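A sketch of what Eric describes, using script-created cues on a
hidden metadata track as a stand-in for DataCue (whose exact
shape is still under discussion); the times and element ids are
illustrative:

  var audio = document.querySelector('audio');
  var track = audio.addTextTrack('metadata', 'phrases');
  track.mode = 'hidden'; // cue events fire, nothing is rendered
  // One cue per phrase; the payload names the element to highlight.
  [{ start: 0.0, end: 2.4, id: 'phrase1' },
   { start: 2.4, end: 5.1, id: 'phrase2' }].forEach(function (p) {
    var cue = new VTTCue(p.start, p.end, p.id);
    cue.onenter = function () {
      document.getElementById(p.id).classList.add('highlight');
    };
    cue.onexit = function () {
      document.getElementById(p.id).classList.remove('highlight');
    };
    track.addCue(cue);
  });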
Marisa: That's how it works now, though we want to give it a
refresh, move away from SMIL, maybe to something that could be
implemented natively by browsers
... Is what you described possible today?
Eric: It is possible in Safari, which has an implementation of
DataCue; it was in the spec several years ago
... It's been removed from the spec, but people are talking
about reviving it
... It could be implemented in Safari right now
<Zakim> kaz, you wanted to ask about the usage of SSML
Kaz: SSML and the speech API may be of interest too
... You mentioned using pre-recorded audio, if we use speech
synthesis we could generate the audio based on SSML
Marisa: What we see with content without pre-recorded audio is
that people prefer to use screen readers
... We still need pre-recorded audio for professional
productions, and systems without text-to-speech
<Zakim> nigel, you wanted to note that web speech api's output
is not available to Web Audio, which is a technical limitation
for implementers
Nigel: The Web Speech API makes the operating system generate
the speech output, but this audio isn't available to Web Audio
API
... This is a gap that we found
... Also, regarding screen readers, what's the size of the
community of people who want synthesized speech, but don't have
screen readers?
Marisa: That's a good question, let me find out about that
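For context, the gap Nigel describes: standard Web Speech API
usage produces audio, but exposes no node that could be routed
into a Web Audio graph. A minimal sketch:

  var utterance = new SpeechSynthesisUtterance('Chapter one.');
  utterance.onboundary = function (e) {
    // Word/sentence boundary callbacks exist ...
    console.log('boundary at character', e.charIndex);
  };
  speechSynthesis.speak(utterance); // the UA/OS renders the audio
  // ... but there is no API to connect this output to an
  // AudioContext (no speech equivalent of
  // createMediaElementSource()), which is the gap noted above.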
Support for caption formats other than WebVTT
Chris: I spoke to Andreas offline. He has hosted discussions at
TPACs previously on the need for a generic TextTrackCue API
... I have invited him to give us an update on this when he's
ready
Next steps
<kaz> scribenick: kaz
Chris: After the last call, we thought about what to do as next
steps within this IG
<cpn> scribenick: cpn
Giri: We talked about making a Task Force, to gather use cases
and requirements
... This sounds useful, given the discussion we've had today
... My proposal is to turn this into solid proposals for web
standardisation
... This could be bringing new requirements to an existing
spec, eg ISO BMFF container handling
... A Task Force with limited life span, to conclude at TPAC
this year
... We can have monthly calls; we can do it on GitHub or a
wiki, and GitHub seems more collaborative
... We want to consider not just the streaming media use cases,
but also the EPUB use cases,
... and other areas where timed metadata is useful, to cover
all our interests
... Will talk with W3C staff about setting up a GitHub repo
<kaz> scribenick: kaz
Chris: I agree about GitHub; possibly the output could be a
W3C IG Note, we'll see
Giri: Would like to do that after the GitHub repo is set up
Chris: We should talk about some of the details offline, for
example,
... should we have separate calls for the TF?
... There are other topics that the IG could discuss, so maybe
having separate calls for the TF could be a way to go
... We'll discuss and announce something to the IG
Conclusion
Chris: This is a really interesting area, thank you all for
your contributions
... We've heard different views around a common area of
interest
... The detail of the TF is to be announced
Kaz: Should we record the decision to create the TF as
RESOLUTION?
Chris: Yes
RESOLUTION: We'll create a dedicated TF for the Media-Timed
Events topic (detail to be announced)
Next IG meeting
-> [15]W3C Comm Team's message on Daylight Savings
(member-only)
[15] https://lists.w3.org/Archives/Member/chairs/2018JanMar/0087.html
Chris: April 3
... but please note there is a daylight saving switch-over
... thank you for joining, everybody
... speak to you in one month!
[adjourned]
Summary of Action Items
Summary of Resolutions
1. [16]We'll create a dedicated TF for the Media-Timed Events
topic (detail to be announced)
[End of minutes]
__________________________________________________________
Minutes formatted by David Booth's [17]scribe.perl version
1.147 ([18]CVS log)
$Date: 2018/03/15 19:04:58 $
[17] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
[18] http://dev.w3.org/cvsweb/2002/scribe/
Received on Thursday, 15 March 2018 19:09:55 UTC