Re: Synchronized narration work from the Sync Media Pub CG from Nigel Megitt on 2019-06-12 (public-audio-description@w3.org from June 2019)

From: Nigel Megitt <nigel.megitt@bbc.co.uk>
Date: Wed, 12 Jun 2019 13:48:48 +0000
To: Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com>
CC: "Charles 'chaals' (McCathie) Nevile" <chaals@yandex.ru>, "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>, "public-audio-description@w3.org" <public-audio-description@w3.org>, "Marisa DeMeglio" <marisa.demeglio@gmail.com>, Daniel Weck <daniel.weck@gmail.com>
Message-ID: <D926BFC0.46107%nigel.megitt@bbc.co.uk>

Hi Ingar,

Thanks for the suggestion. Right now, it doesn’t fit my world view: sequencers and timing objects seem well suited to a linear progression, whereas the content-based publishing domain does not progress linearly.

My example use case here is the one about paged presentation. At authoring time you don’t necessarily know where the page boundaries will fall. But you might want to trigger an event (the same one every time) based on traversal of a page boundary. That doesn’t look like returning to the same point on the sequence/time line each time; it looks like inserting events on the sequence based on the content and the rendering system, and potentially changing those every time the layout changes.
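
To make that concrete, here is a minimal sketch in Python. It assumes a hypothetical layout model in which every element has a known height and a page is just a fixed height budget; the function name and the numbers are made up for illustration and do not come from any of the drafts. The point it shows is only that the page-turn positions are an output of the layout, not something fixed at authoring time.

```python
from typing import List


def page_turn_positions(element_heights: List[float], page_height: float) -> List[int]:
    """Return the indices of elements that start a new page in this layout.

    Hypothetical model: elements are stacked top to bottom and an element
    that does not fit in the space left on the page moves to the next page.
    """
    turns = []
    used = 0.0
    for index, height in enumerate(element_heights):
        # Start a new page only if something is already on the current one.
        if used > 0 and used + height > page_height:
            turns.append(index)
            used = 0.0
        used += height
    return turns


# The same content laid out on two different page sizes (assumed values).
heights = [300.0, 300.0, 300.0, 300.0]
tall_page_turns = page_turn_positions(heights, page_height=700.0)
short_page_turns = page_turn_positions(heights, page_height=500.0)
```

With these assumed numbers the tall page has a single page turn, before the third element, while the short page has a turn before every element after the first, so any "page turn" event would have to be re-derived each time the layout changes.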

Also, the progression through the media isn’t at a constant rate.

To my mind (which always likes to think it is open to changing!) the mismatch between the timing/sequence model and the content model is just too big to justify coercing them to be the same. It could be that there’s a useful abstraction that can be specialised to either one case or the other. I just haven’t seen it yet.

Nigel

From: Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com>
Date: Wednesday, 12 June 2019 at 14:06
To: Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: "Charles 'chaals' (McCathie) Nevile" <chaals@yandex.ru>, "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>, "public-audio-description@w3.org" <public-audio-description@w3.org>, Marisa DeMeglio <marisa.demeglio@gmail.com>, Daniel Weck <daniel.weck@gmail.com>
Subject: Re: Synchronized narration work from the Sync Media Pub CG

Hi Nigel.

If I understand you correctly, the two modes you are referring to are playback of items in the time domain and playback in some other domain, e.g. a content-ordering domain.

Using timing objects and sequencers it is pretty straightforward to support both modes (as well as shifting dynamically between them). For instance, we use independent timing objects for the different domains where we need playback, and then we register the data with sequencers driven by the different timing objects. The UI can then dispatch interaction events to the correct playback controls (i.e. timing object(s)).

I don't think there are any particular requirements on the data representation to make this work. If you can deduce how a particular data item relates to a given axis (time, ordering or whatever), then you register sequencer cues for the item, and have enter/exit events delivered at the correct time.
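
As a rough sketch of that idea in Python: the Cue and Sequencer classes below are hypothetical and are not the API of any existing timing library, and a plain number passed to update() stands in for the position of a timing object. Each cue covers a half-open interval on some axis, and the sequencer reports enter/exit events whenever the position moves into or out of a cue.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Cue:
    key: str     # identifies the data item, e.g. an element id
    low: float   # start of the interval on the axis (inclusive)
    high: float  # end of the interval on the axis (exclusive)


@dataclass
class Sequencer:
    cues: List[Cue] = field(default_factory=list)
    active: Set[str] = field(default_factory=set)

    def add_cue(self, key: str, low: float, high: float) -> None:
        self.cues.append(Cue(key, low, high))

    def update(self, position: float) -> List[Tuple[str, str]]:
        """Move to a new position and return the exit/enter events it causes."""
        now_active = {c.key for c in self.cues if c.low <= position < c.high}
        events = [("exit", k) for k in sorted(self.active - now_active)]
        events += [("enter", k) for k in sorted(now_active - self.active)]
        self.active = now_active
        return events


# One sequencer per axis: the same items registered against time (seconds)
# and against reading order (item index). All values are made up.
time_seq = Sequencer()
time_seq.add_cue("para-1", 0.0, 4.5)
time_seq.add_cue("para-2", 4.5, 9.0)

order_seq = Sequencer()
order_seq.add_cue("para-1", 0, 1)
order_seq.add_cue("para-2", 1, 2)
```

Moving the time sequencer from 1.0 to 5.0 would report an exit for para-1 and an enter for para-2, and the ordering sequencer behaves the same way on its own axis, so the UI only has to decide which position it is driving.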

In many applications it would then be a matter of opinion which counts as "primary" content. For instance, with a slide show presentation synced to a video (of the slide-show presenter) you could navigate the session by video controls whenever that makes sense, or slide-show controls in the interactive slide viewer (on a different device). One would also want to support dynamic switching as you leave a "primary-mode-reading" and enter "primary-mode-listen", perhaps because your circumstances change mid-session.

I guess a key source for this flexibility in the timing object model is the insight that no media should be "primary" in a technical sense, implying that media control must be separate from media.

Ingar Arntzen
Multi-device Timing Community Group

On Wed, 12 Jun 2019 at 13:59, Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:
Hi,

This is interesting work, and I’m particularly interested in the context of work we’re doing in the Audio Description Community Group, which is related but different.

Perhaps in answer to Chaals’s question, there’s a technical difference here that feels like one we ought to be able to work around, but it is actually quite fundamental: it is how WebVTT and TTML work, for example, but not how this use case works.

Whereas timed text formats are predicated on media times, the requirement for this application, if I’ve understood it correctly, is to link the audio rendition to element content. This is why it works that you can click on some text and hear the audio for that directly. The text isn’t acting as a play-head-position link, i.e. “on click, move play head to 13.2s”, but there is a mode of operation where you hear the consecutive elements’ linked audio being played consecutively, as if it is continuous media, with a link back to the highlighting of each separate snippet of text.
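
A small sketch of that distinction in Python may help. The data below is invented for illustration and is not the format from the drafts: the Clip type, the element ids and the file names are all assumptions. What it shows is that the lookup key is the element, not a position on a single media timeline, and that continuous playback is just a walk through the elements in reading order.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Clip:
    src: str      # audio resource (hypothetical file name)
    begin: float  # clip start within that resource, in seconds
    end: float    # clip end within that resource, in seconds


# Audio is linked to element content. The clips need not be contiguous,
# or even come from the same file.
narration: Dict[str, Clip] = {
    "p1": Clip("chapter1-a.mp3", 0.0, 3.2),
    "p2": Clip("chapter1-b.mp3", 0.0, 2.5),
    "p3": Clip("chapter1-a.mp3", 3.2, 7.9),
}
reading_order: List[str] = ["p1", "p2", "p3"]


def clip_for_click(element_id: str) -> Clip:
    """Clicking an element plays that element's own audio."""
    return narration[element_id]


def continuous_playback(start_id: str) -> List[Tuple[str, Clip]]:
    """Play linked clips consecutively, in reading order, from start_id on.

    Each entry keeps the element id so the text can be highlighted
    while its clip plays.
    """
    start = reading_order.index(start_id)
    return [(eid, narration[eid]) for eid in reading_order[start:]]
```

Here there is no single play head to move: clicking "p2" resolves to its own clip, and continuous playback from "p2" yields the clips for "p2" and then "p3", each paired with the element to highlight.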

I’ve seen plenty of examples of timed transcripts of videos where the text at the play head time is highlighted and the user can click on any random place in the transcript to make the play head jump there, but I think there’s a semantic mismatch between that experience and this one – here the text content is the primary thing, not the video/audio.

I recall a use case being discussed previously as well where, on a paged view of the text, a sound effect can be played each time there’s a page turn. This approach is amenable to that, assuming that the same fragment id can be reused on different elements in the source document.

I’d love to see a single approach that could make both use cases work, but I’m not sure what it would look like. SMIL probably got somewhere very close, using the event based model, where events could be generated by time on a media playback timeline or via some other API that fires them, but I sense that the willingness to implement all of the complexity involved in SMIL may be dwindling, and in any case, it doesn’t by itself resolve the problem of how to express timing and element content event triggering in a timed text document format.

Kind regards,

Nigel


From: "Charles 'chaals' (McCathie) Nevile" <chaals@yandex.ru>
Date: Wednesday, 12 June 2019 at 09:58
To: "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>, "public-audio-description@w3.org" <public-audio-description@w3.org>, Marisa DeMeglio <marisa.demeglio@gmail.com>
Cc: Daniel Weck <daniel.weck@gmail.com>
Subject: Re: Synchronized narration work from the Sync Media Pub CG
Resent-From: <public-audio-description@w3.org>
Resent-Date: Wednesday, 12 June 2019 at 09:59

Hi,

Thanks for the pointer.

I'm curious why you wouldn't use e.g. WebVTT or another existing markup that carries the associations between text and audio renderings.

cheers

Chaals

On Wed, 12 Jun 2019 02:12:26 +0200, Marisa DeMeglio <marisa.demeglio@gmail.com> wrote:

Hi all,

Chris Needham suggested that I share with you what we’ve been working on in the Synchronized Media for Publications CG - it’s a lightweight JSON format for representing pre-recorded narration synchronized with HTML, to provide an accessible reading experience. The primary use case is in web publications (we are involved in the Publishing WG), but it has been designed to live as a standalone “overlay” for HTML documents. Below are some links to the latest drafts.

And, just to give you an idea of the basic user experience, here are two slightly different proof of concept demos for the sample in our repository:

- https://raw.githack.com/w3c/sync-media-pub/master/samples/single-document/index.html
- https://raw.githack.com/w3c/sync-media-pub/feature/custom-read-aloud-player/samples/single-document/index.html

Interested in hearing your thoughts!

Marisa DeMeglio
DAISY Consortium


Begin forwarded message:

From: Marisa DeMeglio <marisa.demeglio@gmail.com>
Subject: Drafts and a sample
Date: June 8, 2019 at 6:58:13 PM PDT
To: W3C Synchronized Multimedia for Publications CG <public-sync-media-pub@w3.org>

Hi all,

As we discussed at the web publications F2F last month, we have some drafts up for review:
https://w3c.github.io/sync-media-pub/

Have a look specifically at the proposed synchronization format:
https://w3c.github.io/sync-media-pub/narration.html

And how to include it with web publications:
https://w3c.github.io/sync-media-pub/packaging.html

I’ve extracted the issues from our previous drafts and discussions, and put them in the tracker:
https://github.com/w3c/sync-media-pub/issues

I also started putting together a sample and playing around with some ideas for a simple proof-of-concept for playback:
https://github.com/w3c/sync-media-pub/tree/master/samples/single-document

(For anyone really interested and wanting to dig in: it needs to be more clever about how it uses the audio API, as the granularity of timeupdate in the browser isn’t very good.)

Please feel free to comment, propose solutions, and otherwise share your thoughts.

Thanks
Marisa

Received on Wednesday, 12 June 2019 13:49:16 UTC