- From: Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com>
- Date: Thu, 18 Oct 2018 21:39:17 +0200
- To: rdeltour@gmail.com
- Cc: Marisa DeMeglio <marisa.demeglio@gmail.com>, public-sync-media-pub@w3.org
- Message-ID: <CAOFBLLpqNY0+_t=baANWrMO-7a+EOh9cf4oWbGcFTqrU+Ho_hA@mail.gmail.com>
Hi Marisa and all.

I looked through the requirements again, and I still maintain that the timingsrc [1] lib is exactly what you guys need as an engine for playback and sync of both audio/video and text progression/navigation. True, it does not provide any declarative support, but that's where you come in... Timingsrc makes it really easy to define custom data formats and then build custom viewers/players with custom navigation primitives etc., and it does all the heavy lifting with the timing stuff.

Though primitive in appearance, this demo page [2] for the sequencer already solves a core part of your challenge: ensuring that the right DOM element is activated at the right time, relative to playback through the text.

If you were to send me an audio file and a timed transcript to go with it (e.g. JSON with start and end timestamps for each word), then putting up a rudimentary demo would likely be really quick.

Best,
Ingar Arntzen

[1] https://webtiming.github.io/timingsrc/
[2] https://webtiming.github.io/timingsrc/doc/online_sequencer.html

On Thu, 18 Oct 2018 at 21:01, Romain <rdeltour@gmail.com> wrote:

> > On 18 Oct 2018, at 19:37, Marisa DeMeglio <marisa.demeglio@gmail.com> wrote:
> >
> >> 1. The use cases document says:
> >>> Text chunks are highlighted in sync with the audio playback, at the authored granularity
> >>
> >> This implies that the granularity _is_ authored. Sometimes, the sync could be generated on the fly, with sentence and/or word detection. Do we want to cover this use case too?
> >
> > So in this use case, a reading system gets a publication with some coarse level of synchronization (e.g. paragraph), and it provides, on the fly, finer granularities (word or sentence)?
>
> Yes, some kind of hybrid approach like that.
>
> > Are there tools that do this now? Not necessarily with audio ebooks but with any similar-enough types of content?
>
> Sentence/word detection applied to textual content is fairly common with TTS narration, but I don't know of any tool that does this with narrated (or pre-recorded) audio, no.
> But I could see that being useful, if a reading system with enough processing power implemented it :-)
>
> >> How would you define/describe testability in our context?
>
> I don't know… I think the details depend on the actual technical solution. Ideally a) tests should be runnable in an automated manner, b) results should be comparable to reference results in an automated manner.
>
> > To me, validation is a separate concern — whatever format we produce to represent sync media should be validate-able. Not saying what the validation result should be used for, just that it should be possible to validate.
>
> OK!
>
> > To put in context, I’ve gotten several suggestions over the year(s) of “why don’t you just use javascript” to create sync media books, and the answer always is that we want a declarative syntax, one of the reasons why being that it can be validated and migrated forward.
>
> Right, I understand.
> My take is that even a javascript-based approach would need some kind of declaration of a list or structure of audio pointers anyways, so if we standardize that beast with a simple-enough js-friendly format, we can make both worlds happy :-)
>
> Romain.
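[Editor's note: as a concrete illustration of the "timed transcript" Ingar asks for, a minimal sketch is shown below. The JSON field names and the `activeCue` helper are hypothetical, for illustration only; they are not part of the timingsrc API or any draft spec.]

```javascript
// Hypothetical timed-transcript shape: one cue per word, with
// start/end offsets in seconds (the field names are an assumption).
const transcript = [
  { text: "Text",        start: 0.0, end: 0.4 },
  { text: "chunks",      start: 0.4, end: 0.9 },
  { text: "highlighted", start: 1.1, end: 1.8 },
];

// Return the index of the cue active at playback time t, or -1
// when t falls in a gap between cues (e.g. a pause in narration).
function activeCue(cues, t) {
  for (let i = 0; i < cues.length; i++) {
    if (t >= cues[i].start && t < cues[i].end) return i;
  }
  return -1;
}
```

In a player, a lookup like this would typically run on the audio element's `timeupdate` event (or be replaced entirely by the sequencer's cue activation events when using timingsrc), toggling a highlight class on the DOM element that corresponds to the returned cue.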
Received on Thursday, 18 October 2018 19:39:52 UTC