Re: Moving forward with sync media work

Hi Ingar,

It sounds like your work could be very useful to implementers. What we are discussing here is not the “how” of processing/playback, but rather the “what”: the actual declarative format. That’s why I keep coming back to the declarative aspect. Once we have some draft syntax and examples, I would be curious to see how they could map to your playback engine. Looking forward to some experimenting!

Thanks
Marisa

> On Oct 18, 2018, at 12:39 PM, Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com> wrote:
> 
> Hi Marisa and all.
> 
> I looked through the requirements again, and I still maintain that the timingsrc [1] lib is exactly what you guys need as an engine for playback and sync of both audio/video and text progression/navigation. True, it does not provide any declarative support, but that’s where you come in... Timingsrc makes it really easy to define custom data formats and then build custom viewers/players with custom navigation primitives etc., and it does all the heavy lifting with the timing stuff. Though primitive in appearance, this demo page [2] for the sequencer already solves a core part of your challenge: ensuring that the right DOM element is activated at the right time, relative to playback through the text.
> 
> If you were to send me an audio file and a timed transcript to go with it (e.g. JSON with start and end timestamps for each word), then putting up a rudimentary demo would likely be really quick.
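> 
> For illustration only (the field names and the "active" class below are made up, not a proposed format), such a timed transcript and a first-pass viewer could be as simple as the following sketch; a real engine like the timingsrc Sequencer would schedule activation events properly instead of polling on "timeupdate":
> 
>   // transcript.json (hypothetical shape, one cue per word):
>   // [ { "id": "w1", "start": 0.00, "end": 0.42, "text": "Hello" },
>   //   { "id": "w2", "start": 0.42, "end": 0.85, "text": "world" } ]
> 
>   fetch("transcript.json")
>     .then((res) => res.json())
>     .then((cues) => {
>       // Assumes a single <audio> element on the page and one DOM
>       // element per word, with ids matching the transcript cues.
>       const audio = document.querySelector("audio");
>       audio.addEventListener("timeupdate", () => {
>         const t = audio.currentTime;
>         for (const cue of cues) {
>           // Highlight the word whose interval contains the playhead.
>           const el = document.getElementById(cue.id);
>           if (el) el.classList.toggle("active", t >= cue.start && t < cue.end);
>         }
>       });
>     });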
> 
> Best, Ingar Arntzen
> 
> [1] https://webtiming.github.io/timingsrc/
> [2] https://webtiming.github.io/timingsrc/doc/online_sequencer.html
> 
> On Thu, 18 Oct 2018 at 21:01, Romain <rdeltour@gmail.com> wrote:
> 
> 
> > On 18 Oct 2018, at 19:37, Marisa DeMeglio <marisa.demeglio@gmail.com> wrote:
> > 
> >> 
> >> 1. The use cases document says:
> >>> Text chunks are highlighted in sync with the audio playback, at the authored granularity
> >> 
> >> This implies that the granularity _is_ authored. Sometimes, the sync could be generated on the fly, with sentence and/or word detection. Do we want to cover this use case too?
> > 
> > So in this use case, a reading system gets a publication with some coarse level of synchronization (e.g. paragraph), and it provides, on the fly, finer granularities (word or sentence)?
> 
> Yes, some kind of hybrid approach like that.
> 
> > Are there tools that do this now? Not necessarily with audio ebooks but with any similar-enough types of content?
> 
> Sentence/word detection applied to textual content is fairly common with TTS narration, but I don't know of any tool that does this with narrated (or pre-recorded) audio, no.
> But I could see that being useful, if a reading system with enough processing power implemented it :-)
> 
> > 
> >>  How would you define/describe testability in our context? 
> 
> I don't know… I think the details depend on the actual technical solution. Ideally, (a) tests should be runnable in an automated manner, and (b) results should be comparable to reference results in an automated manner.
> 
> > 
> > To me, validation is a separate concern — whatever format we produce to represent sync media should be validate-able. Not saying what the validation result should be used for, just that it should be possible to validate.
> 
> OK!
> 
> > 
> > To put this in context, I’ve gotten several suggestions over the year(s) of “why don’t you just use JavaScript” to create sync media books, and the answer is always that we want a declarative syntax, one reason being that it can be validated and migrated forward.
> 
> Right, I understand. 
> My take is that even a JavaScript-based approach would need some kind of declaration of a list or structure of audio pointers anyway, so if we standardize that beast with a simple-enough, JS-friendly format, we can make both worlds happy :-)
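> 
> To sketch what I mean (property names invented on the spot, not a proposal), even something this small could be consumed directly by a script and still be validated as a standalone document:
> 
>   {
>     "audio": "chapter1.mp3",
>     "items": [
>       { "text": "#par1", "clipBegin": 0.0,  "clipEnd": 12.3 },
>       { "text": "#par2", "clipBegin": 12.3, "clipEnd": 25.8 }
>     ]
>   }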
> 
> Romain.

Received on Thursday, 18 October 2018 19:56:43 UTC