Re: Synchronized narration work from the Sync Media Pub CG from Ingar Mæhlum Arntzen on 2019-06-12 (public-web-and-tv@w3.org from June 2019)

From: Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com>
Date: Wed, 12 Jun 2019 15:06:25 +0200
To: Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: "Charles 'chaals' (McCathie) Nevile" <chaals@yandex.ru>, "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>, "public-audio-description@w3.org" <public-audio-description@w3.org>, Marisa DeMeglio <marisa.demeglio@gmail.com>, Daniel Weck <daniel.weck@gmail.com>
Message-ID: <CAOFBLLoYo2TjYyRsuYxJqEWYwxtfK6HH-0J+SgDJx7BF3F=JrQ@mail.gmail.com>

Hi Nigel.

If I understand you correctly, the two modes you are referring to are
playback of items in time-domain and playback in some other domain, e.g.
content-ordering domain.

Using timing objects and sequencers it is pretty straightforward to support
both modes (as well as shifting dynamically between them). For instance, we
use independent timing objects for the different domains where we need
playback, and then we register the data with sequencers driven by the
different timing objects. The UI can then dispatch interaction events to the
correct playback controls (i.e. the relevant timing object(s)).
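
To make the idea concrete, here is a minimal sketch of two independent
playback controls, one per domain, with UI events routed to the right one.
Everything in it is an assumption for illustration: TimingObject,
ManualClock and dispatch are hypothetical names, not the API of any
existing timing object library, and a manually advanced clock stands in
for the system clock.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class ManualClock:
    """Hypothetical clock that only moves when told to (keeps the demo deterministic)."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class TimingObject:
    """Hypothetical stand-in for a timing object: a position on some axis
    that moves at `velocity` units per second of clock time."""

    clock: Callable[[], float]
    position: float = 0.0
    velocity: float = 0.0
    timestamp: float = 0.0

    def query(self) -> float:
        # Extrapolate the current position from the last update.
        return self.position + self.velocity * (self.clock() - self.timestamp)

    def update(self, position: Optional[float] = None,
               velocity: Optional[float] = None) -> None:
        # Snapshot the current position first, so that a velocity-only
        # update does not make the position jump.
        current = self.query()
        self.timestamp = self.clock()
        self.position = current if position is None else position
        if velocity is not None:
            self.velocity = velocity


def dispatch(controls: Dict[str, TimingObject], domain: str, **change: float) -> None:
    """Route a UI interaction to the playback control of the given domain."""
    controls[domain].update(**change)


clock = ManualClock()
# One independent control per domain: media time and content ordering.
controls = {"time": TimingObject(clock), "order": TimingObject(clock)}

dispatch(controls, "time", velocity=1.0)    # "play" pressed on the media controls
clock.advance(2.5)                          # 2.5 seconds pass
dispatch(controls, "order", position=3.0)   # user clicks the item at index 3

time_position = controls["time"].query()
order_position = controls["order"].query()
```

Neither control knows about the other, which is what would allow an
application to switch between the two modes, or run both, at any point.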

I don't think there are any particular requirements on the data
representation to make this work. If you can deduce how a particular data
item relates to a given axis (time, ordering or whatever), then you can
register sequencer cues for the item, and have enter/exit events delivered
at the correct time.
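
As a rough illustration of registering the same items as cues on two
different axes, here is a simplified sketch. The Sequencer class, its
add_cue and move_to methods, and the sample items are all hypothetical and
made up for this example. In particular, this toy version is told about
each new position explicitly, rather than being driven by a timing object
and scheduling the events itself.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]  # (kind, cue key), where kind is "enter" or "exit"


class Sequencer:
    """Hypothetical, simplified sequencer for a single axis.

    A cue is a key plus a half-open interval [low, high) on the axis.
    """

    def __init__(self) -> None:
        self.cues: Dict[str, Tuple[float, float]] = {}
        self.active: Set[str] = set()

    def add_cue(self, key: str, low: float, high: float) -> None:
        self.cues[key] = (low, high)

    def move_to(self, position: float) -> List[Event]:
        """Report which cues were exited and entered by moving to `position`."""
        now_active = {
            key for key, (low, high) in self.cues.items() if low <= position < high
        }
        events: List[Event] = [("exit", key) for key in sorted(self.active - now_active)]
        events += [("enter", key) for key in sorted(now_active - self.active)]
        self.active = now_active
        return events


# Hypothetical data: (element id, audio start, audio end) in reading order.
items = [("p1", 0.0, 4.0), ("p2", 4.0, 9.5), ("p3", 9.5, 13.2)]

time_sequencer = Sequencer()    # axis: media time in seconds
order_sequencer = Sequencer()   # axis: position in reading order

for index, (key, start, end) in enumerate(items):
    time_sequencer.add_cue(key, start, end)           # deduced from the audio clip
    order_sequencer.add_cue(key, index, index + 1)    # deduced from document order

log: List[Event] = []
log += time_sequencer.move_to(1.0)    # playback reaches 1.0s: enter p1
log += time_sequencer.move_to(5.0)    # playback reaches 5.0s: exit p1, enter p2
log += order_sequencer.move_to(2)     # user jumps to the third item: enter p3
```

The data itself only needs to carry enough information to derive an
interval per axis; the enter/exit handlers (highlighting, audio playback)
can be the same regardless of which axis produced the event.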

In many applications it would then be a matter of opinion which counts as
"primary" content. For instance, with a slide show presentation synced to a
video (of the slide-show presenter) you could navigate the session by video
controls whenever that makes sense, or slide-show controls in the
interactive slide viewer (on a different device). One would also want to
support dynamic switching as you leave "primary-mode-reading" and enter
"primary-mode-listening", perhaps because your circumstances change
mid-session.

I guess a key source of this flexibility in the timing object model is the
insight that no media should be "primary" in a technical sense, implying
that media control must be separate from media.

Ingar Arntzen
Multi-device Timing Community Group

On Wed, 12 Jun 2019 at 13:59, Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:

> Hi,
>
> This is interesting work, and I’m particularly interested in the context
> of work we’re doing in the Audio Description Community Group, which is
> related but different.
>
> Perhaps in answer to Chaals’s question, there’s a technical difference
> here that feels like we ought to be able to work around, but is actually
> quite fundamental to the way that WebVTT and TTML work, for example, but
> not this use case.
>
> Whereas timed text formats are predicated on media *times*, the
> requirement for this application, if I’ve understood it correctly, is to
> link the audio rendition to element content. This is why it works that you
> can click on some text and hear the audio for that directly. The text isn’t
> acting as a play-head-position link, i.e. “on click, move play head to
> 13.2s”, but there is a mode of operation where you hear the consecutive
> elements’ linked audio being played consecutively, as if it is continuous
> media, with a link back to the highlighting of each separate snippet of
> text.
>
> I’ve seen plenty of examples of timed transcripts of videos where the text
> at the play head time is highlighted and the user can click on any random
> place in the transcript to make the play head jump there, but I think
> there’s a semantic mismatch between that experience and this one – here the
> text content is the primary thing, not the video/audio.
>
> I recall a use case being discussed previously as well where, on a paged
> view of the text, a sound effect can be played each time there’s a page
> turn. This approach is amenable to that, assuming that the same fragment id
> can be reused on different elements in the source document.
>
> I’d love to see a single approach that could make both use cases work, but
> I’m not sure what it would look like. SMIL probably got somewhere very
> close, using the event based model, where events could be generated by time
> on a media playback timeline or via some other API that fires them, but I
> sense that the willingness to implement all of the complexity involved in
> SMIL may be dwindling, and in any case, it doesn’t by itself resolve the
> problem of how to express timing *and* element content event triggering
> in a timed text document format.
>
> Kind regards,
>
> Nigel
>
>
> From: "Charles 'chaals' (McCathie) Nevile" <chaals@yandex.ru>
> Date: Wednesday, 12 June 2019 at 09:58
> To: "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>,
> "public-audio-description@w3.org" <public-audio-description@w3.org>,
> Marisa DeMeglio <marisa.demeglio@gmail.com>
> Cc: Daniel Weck <daniel.weck@gmail.com>
> Subject: Re: Synchronized narration work from the Sync Media Pub CG
> Resent-From: <public-audio-description@w3.org>
> Resent-Date: Wednesday, 12 June 2019 at 09:59
>
> Hi,
>
> Thanks for the pointer.
>
> I'm curious why you wouldn't use e.g. WebVTT or another existing markup
> that carries the associations between text and audio renderings.
>
> cheers
>
> Chaals
>
> On Wed, 12 Jun 2019 02:12:26 +0200, Marisa DeMeglio
> <marisa.demeglio@gmail.com> wrote:
>
> Hi all,
>
> Chris Needham suggested that I share with you what we’ve been working on
> in the Synchronized Media for Publications CG - it’s a lightweight JSON
> format for representing pre-recorded narration synchronized with HTML, to
> provide an accessible reading experience. The primary use case is in web
> publications (we are involved in the Publishing WG), but it has been
> designed to live as a standalone “overlay” for HTML documents. Below are
> some links to the latest drafts.
>
> And, just to give you an idea of the basic user experience, here are two
> slightly different proof of concept demos for the sample in our repository:
>
> - https://raw.githack.com/w3c/sync-media-pub/master/samples/single-document/index.html
> - https://raw.githack.com/w3c/sync-media-pub/feature/custom-read-aloud-player/samples/single-document/index.html
>
>
> Interested in hearing your thoughts!
>
> Marisa DeMeglio
> DAISY Consortium
>
>
> Begin forwarded message:
>
> From: Marisa DeMeglio <marisa.demeglio@gmail.com>
> Subject: Drafts and a sample
> Date: June 8, 2019 at 6:58:13 PM PDT
> To: W3C Synchronized Multimedia for Publications CG
> <public-sync-media-pub@w3.org>
>
> Hi all,
>
> As we discussed at the web publications F2F last month, we have some
> drafts up for review:
> https://w3c.github.io/sync-media-pub/
>
> Have a look specifically at the proposed synchronization format:
> https://w3c.github.io/sync-media-pub/narration.html
>
> And how to include it with web publications:
> https://w3c.github.io/sync-media-pub/packaging.html
>
> I’ve extracted the issues from our previous drafts and discussions, and
> put them in the tracker:
> https://github.com/w3c/sync-media-pub/issues
>
> I also started putting together a sample and playing around with some
> ideas for a simple proof-of-concept for playback:
> https://github.com/w3c/sync-media-pub/tree/master/samples/single-document
>
> (For anyone really interested and wanting to dig in: it needs to be more
> clever about how it uses the audio API - the granularity of timeupdate in
> the browser isn’t very good.)
>
> Please feel free to comment, propose solutions, and otherwise share your
> thoughts.
>
> Thanks
> Marisa
>
>
>
>
>
> --
> Using Opera's mail client: http://www.opera.com/mail/
>
>
Received on Wednesday, 12 June 2019 13:07:03 UTC