- From: Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com>
- Date: Tue, 13 Mar 2018 01:15:36 +0100
- To: Daniel Weck <daniel.weck@gmail.com>
- Cc: public-sync-media-pub@w3.org, public-webtiming@w3.org
- Message-ID: <CAOFBLLrLJiZZor-Omk9phHXr_8iu8J06S9Q5A5E3L6AWcXf2Gg@mail.gmail.com>
Hi Daniel,

Please find some comments inline.

2018-03-12 19:03 GMT+01:00 Daniel Weck <daniel.weck@gmail.com>:

> Thank you for your input Ingar (I assume this is your firstname?)

Yes :)

> The "timing object" certainly looks like a useful and powerful API.
> If I am not mistaken this proposal focuses mainly on programmatic usage?
> (Javascript)

Thanks. Yes, mostly Javascript for now, though we have rough implementations in other languages too.

> If so, do you envision some kind of declarative syntax that would allow
> content creators (web and digital publishing) to encode a "static" /
> persistent representation of synchronized multi-media streams?
> For example, EPUB3 "read aloud" / "talking books" are currently authored
> using the Media Overlays flavour of SMIL (XML), and long-form synchronized
> text+audio content is typically generated via some kind of semi-automated
> production process.

It would be possible to implement support for a declarative syntax on top of the model we're proposing. Alternatively, you could integrate an existing framework with the timing object, thereby allowing that framework to be synchronized with anything else that uses timing objects. SMIL, for instance, has an internal media clock somewhat similar to the timing object, so I suspect integration wouldn't be too hard.

However, to be honest, I don't really see the declarative approach as a pathway to simplicity and bliss for content creators. I think we can do much better than that. Content creators should not be programming, nor should they be editing declarative XML documents.

Persistent representation 1:

We typically use plain JSON for persistent representations of timed data, both static and dynamic. These are typically very simple structures. For instance, a book could be represented as a long list of paragraphs, where each paragraph is associated with values for start and stop.
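To make that concrete, here is a minimal, hypothetical sketch of such a JSON structure, plus a lookup against it. The property names ("start", "stop", "text") and the sample text are just illustrative, not a fixed schema:

```javascript
// A book represented as a plain list of timed paragraphs (times in seconds).
const book = [
  { start: 0.0,  stop: 4.2,  text: "Call me Ishmael." },
  { start: 4.2,  stop: 9.8,  text: "Some years ago, never mind how long precisely..." },
  { start: 9.8,  stop: 15.1, text: "It is a way I have of driving off the spleen." }
];

// Return the paragraph active at a given position on the timeline, or null.
function activeParagraph(data, position) {
  return data.find(p => position >= p.start && position < p.stop) || null;
}
```

In practice a sequencer does this lookup (and the scheduling of enter/exit events) for you, but the underlying data really is this simple.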
This is enough for playback of text in a Web page. There's a demo of the basic idea on the timingsrc homepage [1], demonstrating how this is done with a timing object and a sequencer. Scroll down or view the page source to take a look at the code involved (it's not a lot). You can also open the page in multiple devices/browser windows to verify that it is still synchronized :)

Content authoring:

We could also use Web pages to create such JSON files. For instance, say you have the list of book paragraphs (without timing info), and you have the corresponding read-aloud audio track. What we could do is synchronize the audio with the timing object. Then, during playback, the content creator simply clicks a button whenever (he hears that) a new paragraph starts, causing a timestamp to be sampled from the timing object and put into the JSON file. Of course, this is only a very primitive authoring tool, and this particular task should likely be solved by AI rather than manual labour. Anyway, the point remains that Web pages can be excellent authoring tools for multi-media streams, shielding content creators from both programming and XML editing. Furthermore, we're doing just fine without a declarative media format.

Persistent representation 2:

When you mention "persistent representation of synchronized multi-media streams" I'm also guessing that you mean something more than just a single text track. In SMIL, for instance, there is a focus on declaring all the objects that go into a product, as well as their temporal relationships. We don't do it that way. We simply represent things as separate resources of timed data, and then synchronize them independently, within the same Web page, or in different Web pages on different devices. In my experience, this is a much more flexible and Webby approach.
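The button-clicking authoring idea above can be sketched in a few lines. This is purely illustrative: "queryPosition" stands in for something like timingObject.query().position in the timingsrc library (an assumption here, not verified code), and the cue shape is the same hypothetical {start, stop, text} structure as before:

```javascript
// Hypothetical authoring helper: each time the content creator clicks,
// sample the current timeline position, close the previous paragraph,
// and open the next one.
function makeRecorder(queryPosition) {
  const cues = [];
  return {
    mark(text) {
      const pos = queryPosition();
      if (cues.length > 0) cues[cues.length - 1].stop = pos; // close previous
      cues.push({ start: pos, stop: null, text });           // open next
      return cues;
    },
    toJSON() { return JSON.stringify(cues, null, 2); }       // persist as JSON
  };
}
```

In a real page, mark() would be wired to a button's click handler while the audio plays against the timing object.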
> I am thinking specifically about: (1) an HTML document, (2) a separate
> audio file representing the pre-recorded human narration of the HTML
> document, and (3) some kind of meta-structure / declarative syntax that
> would define the synchronization "points" between HTML elements and audio
> time ranges.
> Note that most existing "talking book" implementations render such
> combined text/audio streams by "highlighting" / emphasizing individual HTML
> fragments as they are being narrated (using CSS styles), but the same
> declarative expression could be rendered with a karaoke-like layout, etc.

I think my previous response already covered much of what you are describing here. The demo [1] also uses CSS styles to bold-face the active text :)

> Of course, there are also other important use-cases such as video+text,
> video+audio, etc., but I just wanted to pick your brain about a concrete
> use-case in digital publishing / EPUB3 e-books :)

You're welcome, though I don't have particular experience with digital publishing :)

> Cheers, and thanks!
> Daniel

Thank you for great questions!

Ingar

[1] https://webtiming.github.io/timingsrc/doc/online_sequencer.html

> On 11 March 2018 at 21:34, Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com> wrote:
>
>> Hi Marisa
>>
>> Chris Needham of the Media & Entertainment IG made me aware of the CG
>> you're setting up.
>>
>> This is a welcome initiative, and it is great to see more people
>> expressing the need for better sync support on the Web!
>>
>> I'm the chair of the Multi-device Timing CG [2], so I thought I'd say a few
>> words about it, as it seems we have similar objectives. Basically, the
>> scope of the Multi-device Timing CG is a broad one: synchronization of
>> anything with anything on the Web, whether it is text synced with A/V
>> within a single document, or across multiple devices.
>> We have also proposed
>> a full solution to this problem for standardization, with the timing object
>> [3] being the central concept. I did have a look at the requirements
>> document [4] you linked to, and it seems to me the timing object (and the
>> other tools we have made available [5]) should be a good basis for
>> addressing your challenges. For instance, a karaoke-style text presentation
>> synchronized with audio should be quite easy to put together using these
>> tools.
>>
>> If you have some questions about the model we are proposing, and how it
>> may apply to your use cases, please send them our way :)
>>
>> Best regards,
>>
>> Ingar Arntzen
>>
>> [1] https://lists.w3.org/Archives/Public/public-sync-media-pub/2018Feb/0000.html
>> [2] https://www.w3.org/community/webtiming/
>> [3] http://webtiming.github.io/timingobject/
>> [4] https://github.com/w3c/publ-wg/wiki/Requirements-and-design-options-for-synchronized-multimedia
>> [5] https://webtiming.github.io/timingsrc/
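P.S. Regarding the karaoke-style / CSS-highlighting presentation mentioned above, the wiring is roughly this: a sequencer-like object emits events as cues become active and inactive, and a handler toggles a CSS class on the corresponding element. This is only a sketch; the event names "change" and "remove" follow the timingsrc sequencer's documented events, but treat the exact API as an assumption here:

```javascript
// Hypothetical karaoke wiring: highlight the element for each cue
// while that cue is active on the timeline.
function wireHighlighting(sequencer, getElementForCue) {
  sequencer.on("change", e => {
    getElementForCue(e.key).classList.add("active");    // cue became active
  });
  sequencer.on("remove", e => {
    getElementForCue(e.key).classList.remove("active"); // cue no longer active
  });
}
```

With a CSS rule like `.active { font-weight: bold; }`, this gives the bold-facing behaviour seen in the demo [1].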
Received on Tuesday, 13 March 2018 00:17:00 UTC