- From: Ivan Herman <ivan@w3.org>
- Date: Fri, 19 Oct 2018 07:21:33 +0200
- To: Marisa DeMeglio <marisa.demeglio@gmail.com>
- Cc: Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com>, Romain Deltour <rdeltour@gmail.com>, W3C Synchronized Multimedia for Publications CG <public-sync-media-pub@w3.org>
- Message-Id: <CAE6B72C-D499-4605-BA9E-3E97A671C243@w3.org>
My apologies for commenting on something that I have very little knowledge of… I wonder whether the distance between what Ingar refers to and what we are looking at is that large.

If my very superficial reading is correct, the "Timing Object" [1] is at the core of what timingsrc implements. The way I read it, the timing object is based on some sort of vocabulary that includes terms like velocity, acceleration, etc. On the other hand, the publishing community is looking for a (mostly) declarative approach; it would be difficult for that community to move toward a purely programmatic approach. A non-declarative approach may also raise accessibility issues.

However... isn't it possible to "abstract out" that vocabulary in such a way that a publisher would set that data as part of, say, the Web Publication Manifest, and an implementation such as timingsrc would then make use of those terms?

I may be pretty much off topic here, though.

Ivan

[1] http://webtiming.github.io/timingobject/

> On 18 Oct 2018, at 21:55, Marisa DeMeglio <marisa.demeglio@gmail.com> wrote:
> 
> Hi Ingar,
> 
> It sounds like your work could be very useful to implementors. What we are discussing here is not the “how” of processing/playback, but rather the “what” - the actual declarative format. So that’s why I keep going on about being declarative. I would be curious, when we have some draft syntax and examples, how it could map to your playback engine. Looking forward to some experimenting!
> 
> Thanks
> Marisa
> 
>> On Oct 18, 2018, at 12:39 PM, Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com> wrote:
>> 
>> Hi Marisa and all.
>> 
>> I looked through the requirements again, and I still maintain that the timingsrc [1] lib is exactly what you guys need as an engine for playback and sync of both audio/video and text progression/navigation. True, it does not provide any declarative support, but that's where you come in... Timingsrc makes it real easy to define custom data formats and then build custom viewers/players with custom navigation primitives etc., and it does all the heavy lifting with the timing stuff. Though primitive in appearance, this demo page [2] for the sequencer already solves a core part of your challenge: ensuring that the right DOM element is activated at the right time, relative to playback through the text.
>> 
>> If you were to send me an audio file and a timed transcript to go with it (e.g. JSON with start and end timestamps for each word), then putting up a rudimentary demo would likely be real quick.
>> 
>> Best, Ingar Arntzen
>> 
>> [1] https://webtiming.github.io/timingsrc/
>> [2] https://webtiming.github.io/timingsrc/doc/online_sequencer.html
>> 
>> On Thu, 18 Oct 2018 at 21:01, Romain <rdeltour@gmail.com> wrote:
>> 
>> > On 18 Oct 2018, at 19:37, Marisa DeMeglio <marisa.demeglio@gmail.com> wrote:
>> >
>> >> 1. The use cases document says:
>> >>> Text chunks are highlighted in sync with the audio playback, at the authored granularity
>> >>
>> >> This implies that the granularity _is_ authored. Sometimes, the sync could be generated on the fly, with sentence and/or word detection. Do we want to cover this use case too?
>> >
>> > So in this use case, a reading system gets a publication with some coarse level of synchronization (e.g. paragraph), and it provides, on the fly, finer granularities (word or sentence)?
>> 
>> Yes, some kind of hybrid approach like that.
>> 
>> > Are there tools that do this now? Not necessarily with audio ebooks but with any similar-enough types of content?
>> 
>> Sentence/word detection applied to textual content is fairly common with TTS narration, but I don't know of any tool that does this with narrated (or pre-recorded) audio, no.
>> But I could see that being useful, if a reading system with enough processing power implemented it :-)
>> 
>> >> How would you define/describe testability in our context?
>> 
>> I don't know… I think the details depend on the actual technical solution. Ideally, (a) tests should be runnable in an automated manner, and (b) results should be comparable to reference results in an automated manner.
>> 
>> > To me, validation is a separate concern — whatever format we produce to represent sync media should be validate-able. Not saying what the validation result should be used for, just that it should be possible to validate.
>> 
>> OK!
>> 
>> > To put in context, I’ve gotten several suggestions over the year(s) of “why don’t you just use javascript” to create sync media books, and the answer always is that we want a declarative syntax, one of the reasons why being that it can be validated and migrated forward.
>> 
>> Right, I understand.
>> My take is that even a javascript-based approach would need some kind of declaration of a list or structure of audio pointers anyway, so if we standardize that beast with a simple-enough js-friendly format, we can make both worlds happy :-)
>> 
>> Romain.

----
Ivan Herman, W3C
Publishing@W3C Technical Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: https://orcid.org/0000-0003-0782-2704
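To make the "simple-enough js-friendly format" Romain mentions and the timed transcript Ingar asks for a bit more concrete, here is a minimal sketch of what such a declarative structure of audio pointers might look like, together with a trivial consumer. All property names, the file name, and the `highlight` class are purely illustrative, not a proposed standard, and the consumer uses plain HTML5 audio events rather than timingsrc itself:

```js
// Hypothetical declarative sync-media data: each entry points a text
// fragment (by CSS selector) at a clip of the audio file. Names are
// illustrative only.
const syncMedia = {
  audio: "chapter1.mp3",
  body: [
    { text: "#p1", start: 0.0, end: 4.2 },
    { text: "#p2", start: 4.2, end: 9.8 },
    { text: "#p3", start: 9.8, end: 15.0 }
  ]
};

// Minimal consumer: highlight the fragment whose interval contains the
// current playback position. A real engine (for example one built on the
// timingsrc Sequencer) would be more precise and handle seeking, skipping, etc.
const audio = new Audio(syncMedia.audio);
let active = null;

audio.addEventListener("timeupdate", () => {
  const t = audio.currentTime;
  const cue = syncMedia.body.find(c => t >= c.start && t < c.end);
  const el = cue ? document.querySelector(cue.text) : null;
  if (el !== active) {
    if (active) active.classList.remove("highlight");
    if (el) el.classList.add("highlight");
    active = el;
  }
});
```

A structure like this stays declarative (so it can be validated and migrated forward, per Marisa's point) while remaining trivial for a script-based engine, whether hand-rolled as above or built on timingsrc, to consume.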
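On the vocabulary Ivan refers to: in the Timing Object model [1], playback state is captured as a vector (position, velocity, acceleration) taken at a timestamp, and the position at any later moment follows deterministically from it. A minimal sketch of that idea, offered purely as an illustration of the model and not of timingsrc's actual API:

```js
// Timing-object style state vector: motion is fully described by
// (position, velocity, acceleration) captured at a given timestamp.
// Field names follow the Timing Object draft; the helper is illustrative.
function positionAt(vector, t) {
  const d = t - vector.timestamp;
  return vector.position + vector.velocity * d + 0.5 * vector.acceleration * d * d;
}

// Example: normal playback (velocity 1) that was at position 12 s at time 100 s.
const v = { position: 12.0, velocity: 1.0, acceleration: 0.0, timestamp: 100.0 };
console.log(positionAt(v, 103.0)); // 15 (three seconds of playback later)
```

A publisher would presumably never author such vectors directly; as Ivan suggests, the declarative data could stay at the level of the structure sketched earlier, with an implementation built on something like timingsrc mapping it onto a timing object internally.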
Received on Friday, 19 October 2018 05:21:47 UTC