Re: Moving forward with sync media work

Hi Herman and all.

My apologies for continuing on a tangent here.

I also don't see the distance as very large. More like a perfect match :)

Here's the thing: in order to sync stuff you always need two things: 1) data
with associated timing info, and 2) logic for making stuff happen at the
right time (playback, navigation, etc.).
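
For instance, the first ingredient could be as simple as a list of cues tying
each fragment of text to an interval on the audio timeline. Purely as an
illustration (a made-up shape, not a proposed format):

    const cues = [
      { id: "w1", start: 0.00, end: 0.42, target: "#p1 .w1" },
      { id: "w2", start: 0.42, end: 0.93, target: "#p1 .w2" },
      { id: "w3", start: 0.93, end: 1.51, target: "#p1 .w3" }
    ];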

Traditionally, these two things have often been bundled together, and the
timing logic is typically hidden under the hood. SMIL would be an example.
This is a bit unfortunate, because it means that if the specific data format
of the framework doesn't exactly match your requirements, you've got to start
from scratch. This is where you are, if I read the documents
correctly.

The key idea of the timingsrc model is to make the timing logic available
*without* ties to a specific data model.

Importantly, this does not mean that the approach is anti-declarative in
any way. Rather the opposite. By leveraging the availability of an API for
generic timing logic, your implementors don't have to reinvent all of that
(which isn't trivial by the way), and you can get a head start on doing
what you care about the most: defining a declarative format perfectly
matched to your specifications.
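
To make the separation concrete, here is a deliberately naive sketch in plain
JavaScript. It is not the timingsrc API (the library provides a TimingObject
and a Sequencer that handle this properly, including skipping, pausing and
variable velocity); it only illustrates the division of labour: the generic
logic knows nothing about books, words or highlighting, while all of that
meaning lives in the declarative cue data above.

    // Generic timing logic: fire enter/exit callbacks as a clock moves
    // across cue intervals. "clock" is anything exposing currentTime in
    // seconds, e.g. an HTMLAudioElement.
    function sequence(cues, clock, onEnter, onExit) {
      const active = new Set();
      (function tick() {
        const pos = clock.currentTime;
        for (const cue of cues) {
          const inside = pos >= cue.start && pos < cue.end;
          if (inside && !active.has(cue.id)) { active.add(cue.id); onEnter(cue); }
          else if (!inside && active.has(cue.id)) { active.delete(cue.id); onExit(cue); }
        }
        requestAnimationFrame(tick);
      })();
    }

    // Example use: highlight the right DOM element while the audio plays
    // (the selectors and the "highlight" class are made up for this sketch).
    const audio = document.querySelector("audio");
    sequence(cues, audio,
      (cue) => document.querySelector(cue.target).classList.add("highlight"),
      (cue) => document.querySelector(cue.target).classList.remove("highlight"));

The point being: whatever declarative format you end up specifying only needs
to describe the cues; the timing machinery stays generic and reusable.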

Best regards,

Ingar

On Fri, 19 Oct 2018 at 07:21, Ivan Herman <ivan@w3.org> wrote:

> My apologies for commenting on something that I have very little knowledge of…
>
> I wonder whether the distance between what Ingar refers to and what we are
> looking at is that large. If my very superficial reading is correct,
> "Timing Object" is at the core of what timingsrc is implementing. The way I
> read is that the timing object is based on some sort of a vocabulary
> including terms like velocity, acceleration, etc.
>
> On the other hand, the publishing community is looking for a (mostly)
> declarative approach; it would be difficult for that community to move
> toward a purely programming approach. A non-declarative approach may also
> raise accessibility issues.
>
> However... isn't it possible to "abstract out" that vocabulary in such a
> way that a publisher would set those data as part of, say, the Web
> Publication Manifest, and have those terms used by an implementation just
> like timingsrc?
>
> I may be pretty much off topic here, though.
>
> Ivan
>
> [1] http://webtiming.github.io/timingobject/
>
> On 18 Oct 2018, at 21:55, Marisa DeMeglio <marisa.demeglio@gmail.com>
> wrote:
>
> Hi Ingar,
>
> It sounds like your work could be very useful to implementors. What we are
> discussing here is not the “how” of processing/playback, but rather the
> “what” - the actual declarative format. So that’s why I keep going on about
> being declarative. I would be curious, when we have some draft syntax and
> examples, how it could map to your playback engine. Looking forward to some
> experimenting!
>
> Thanks
> Marisa
>
> On Oct 18, 2018, at 12:39 PM, Ingar Mæhlum Arntzen <
> ingar.arntzen@gmail.com> wrote:
>
> Hi Marisa and all.
>
> I looked through the requirements again, and I still maintain that the
> timingsrc[1] lib is exactly what you guys need as an engine for playback
> and sync of both audio/video and text progression/navigation. True, it does
> not provide any declarative support, but that's where you come in...
> Timingsrc makes it really easy to define custom data formats and then build
> custom viewers/players with custom navigation primitives, etc., and it does
> all the heavy lifting with the timing stuff. Though primitive in
> appearance, this demo page [2] for the sequencer already solves a core part
> of your challenge, ensuring that the right DOM element is activated at the
> right time - relative to playback through text.
>
> If you were to send me an audio file and a timed transcript to go with it
> (e.g. JSON with start and end timestamps for each word), then putting up a
> rudimentary demo would likely be really quick.
>
> Best, Ingar Arntzen
>
> [1] https://webtiming.github.io/timingsrc/
> [2] https://webtiming.github.io/timingsrc/doc/online_sequencer.html
>
> On Thu, 18 Oct 2018 at 21:01, Romain <rdeltour@gmail.com> wrote:
>
>>
>>
>> > On 18 Oct 2018, at 19:37, Marisa DeMeglio <marisa.demeglio@gmail.com>
>> > wrote:
>> >
>> >>
>> >> 1. The use cases document says:
>> >>> Text chunks are highlighted in sync with the audio playback, at the
>> >>> authored granularity
>> >>
>> >> This implies that the granularity _is_ authored. Sometimes, the sync
>> >> could be generated on the fly, with sentence and/or word detection. Do we
>> >> want to cover this use case too?
>> >
>> > So in this use case, a reading system gets a publication with some
>> > coarse level of synchronization (e.g. paragraph), and it provides, on the
>> > fly, finer granularities (word or sentence)?
>>
>> Yes, some kind of hybrid approach like that.
>>
>> > Are there tools that do this now? Not necessarily with audio ebooks but
>> > with any similar-enough types of content?
>>
>> Sentence/word detection applied to textual content is fairly common with
>> TTS narration, but I don't know of any tool that does this with narrated
>> (or pre-recorded) audio, no.
>> But I could see that being useful, if a reading system with enough
>> processing power implemented it :-)
>>
>> >
>> >>  How would you define/describe testability in our context?
>>
>> I don't know… I think the details depend on the actual technical
>> solution. Ideally, a) tests should be runnable in an automated manner, and
>> b) results should be comparable to reference results in an automated manner.
>>
>> >
>> > To me, validation is a separate concern: whatever format we produce
>> > to represent sync media should be validate-able. Not saying what the
>> > validation result should be used for, just that it should be possible to
>> > validate.
>>
>> OK!
>>
>> >
>> > To put in context, I’ve gotten several suggestions over the year(s) of
>> > “why don’t you just use javascript” to create sync media books, and the
>> > answer is always that we want a declarative syntax, one of the reasons
>> > being that it can be validated and migrated forward.
>>
>> Right, I understand.
>> My take is that even a javascript-based approach would need some kind of
>> declaration of a list or structure of audio pointers anyway, so if we
>> standardize that beast with a simple-enough js-friendly format, we can make
>> both worlds happy :-)
>>
>> Romain.
>>
>
>
>
> ----
> Ivan Herman, W3C
> Publishing@W3C Technical Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: https://orcid.org/0000-0003-0782-2704
>
>

Received on Friday, 19 October 2018 07:31:59 UTC