Re: Intro - Multi-device Timing CG

Hi Marisa

2018-03-12 19:53 GMT+01:00 Marisa DeMeglio <marisa.demeglio@gmail.com>:

> Thanks for reaching out and for the links to your work!
>
> In addition to these excellent questions from Daniel, I am wondering about
> web browser support (or anticipated browser support) — what do you expect?
>


Short answer: this already works. The timingsrc programming model [1]
already provides the most vital tools, and using it does not depend on
standardization.

Longer answer: there are still some issues when the precision
requirements are very strict, say echoless audio playback from a group
of smartphones. Humans are sensitive to sync errors down to about 6-7
milliseconds, and the synchronization precision you currently get when
synchronizing HTML5 audio/video is also about 6-7 milliseconds. This
means that we typically get echoless playback on good devices, but a
slight echo on cheaper/older ones. Standardization would mean that
browser vendors could at least fix some of the obvious weaknesses of
media elements with regard to synchronization. So, universal support
for echoless playback does depend on standardization.
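
To give a feel for the programming model, here is a minimal sketch of
slaving an HTML5 media element to a timing object. The names
(TimingObject, query, update) follow the timingsrc documentation [1],
but treat the exact signatures as assumptions on my part, and the
manual sync loop is only a crude stand-in for what the MediaSync tool
in timingsrc does internally:

    // Minimal sketch: slave an HTML5 <video> to a shared timing object.
    // Assumes timingsrc [1] is loaded as the global TIMINGSRC; exact
    // signatures may differ between library versions.
    const to = new TIMINGSRC.TimingObject();
    const video = document.querySelector("video");

    // Compare the media element to the shared timeline; correct small
    // skews via playbackRate, large ones by seeking.
    setInterval(() => {
      const vector = to.query();                        // {position, velocity, ...}
      const skew = vector.position - video.currentTime; // seconds off the timeline
      if (Math.abs(skew) > 1.0) {
        video.currentTime = vector.position;            // big error: hard seek
      } else {
        video.playbackRate = vector.velocity + skew;    // small error: adjust rate
      }
    }, 500);

    // Any participating device may control the shared timeline:
    to.update({position: 0.0, velocity: 1.0});          // start playback everywhere

The point is that all control goes through the timing object, so the
same code works whether the timeline is local or provided by an online
timing provider shared across devices.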



> What is the relationship, if any, between Multi-device timing and TTML?
> Are the APIs overlapping or complementary (or “it’s complicated”)?
>

If you mean TTML as a data format, there is no overlap. The solutions
we are advocating in the Multi-device Timing CG are concerned with
mechanisms for timing, synchronization and media control/playback.
Timing concepts have traditionally been mixed into data formats and
delivery methods (e.g. SMIL). In contrast, one of our principal design
goals has been to maintain a clear separation between timing and data
(media content). The benefit is that timing solutions can be used
across very different data formats, delivery methods and application
domains. This flexibility may also reduce our dependence on
standardized formats such as TTML.

If you mean the TTML API [2], there is considerable overlap. The TTML
API appears to be an integration between the TTML data format and the
TextTrack mechanism of HTML5 media elements. The timing object model
supports the same capability through the Sequencer [3], which is
analogous to the TextTrack mechanism of HTML5 media elements.

There are some important differences:
- the sequencer improves on a number of weaknesses of the TextTrack
mechanism
- the sequencer may be used with any data format for timed cues, not
only TTML
- the sequencer may be used without requiring a media element
- the sequencer does not do any rendering of the cues; this is entirely
up to the application
- the sequencer is open for multi-device synchronization (via the
timing object)
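
To make this concrete, here is a rough sketch of caption-like cue
sequencing. Again, the names follow the timingsrc documentation [3],
but the event names and constructor signatures are assumptions on my
part and may differ between library versions:

    // Sequence timed text cues against a timing object.
    const to = new TIMINGSRC.TimingObject();
    const seq = new TIMINGSRC.Sequencer(to);

    // Cues may carry any data; the sequencer never renders anything itself.
    seq.addCue("cue1", new TIMINGSRC.Interval(4.0, 6.5), {text: "First line"});
    seq.addCue("cue2", new TIMINGSRC.Interval(6.5, 9.0), {text: "Second line"});

    // The application decides how to render cues as they become (in)active.
    const out = document.getElementById("captions");
    seq.on("change", (e) => { out.textContent = e.data.text; }); // cue activated
    seq.on("remove", () => { out.textContent = ""; });           // cue deactivated

Since the sequencer is driven by the timing object rather than a media
element, exactly the same cues can be rendered stand-alone, against
local media, or in sync across devices.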



>  In EPUB3, we use SMIL to represent media synchronization, which gives us
> a declarative syntax, but no API. Ideally for web publications, we’d have
> both.
>
>
As the timing object model does not dictate any changes to data
formats or delivery mechanisms, it is typically easy to integrate with
other frameworks. As I mentioned in my response to Daniel, integrating
SMIL with the timing object model should not be difficult.
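
For instance, each SMIL <par> that pairs a text fragment with an audio
clip could be loaded into the sequencer as a cue. A hypothetical sketch
(it assumes clipBegin/clipEnd hold plain seconds; real SMIL clock
values like "0:00:04.000" would need proper parsing):

    // Hypothetical: map EPUB3 Media Overlay <par> elements to sequencer
    // cues, keyed by the text fragment they reference.
    function loadOverlay(smilDoc, seq) {
      for (const par of smilDoc.querySelectorAll("par")) {
        const textRef = par.querySelector("text").getAttribute("src");
        const audio = par.querySelector("audio");
        const begin = parseFloat(audio.getAttribute("clipBegin")); // e.g. "4.0"
        const end = parseFloat(audio.getAttribute("clipEnd"));     // e.g. "6.5"
        seq.addCue(textRef, new TIMINGSRC.Interval(begin, end), {textRef});
      }
    }

On "change"/"remove" the application would then toggle a CSS highlight
on the referenced HTML fragment - essentially the EPUB3 "read aloud"
rendering Daniel describes below.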


[1] https://webtiming.github.io/timingsrc/
[2] https://dvcs.w3.org/hg/ttml/raw-file/default/ttml2-api/Overview.html
[3] https://webtiming.github.io/timingsrc/doc/background_sequencer.html


Hope this was helpful :)

Best regards,

Ingar


> Marisa
>
>



> On Mar 12, 2018, at 11:03 AM, Daniel Weck <daniel.weck@gmail.com> wrote:
>
> Thank you for your input, Ingar (I assume this is your first name?)
>
> The "timing object" certainly looks like a useful and powerful API.
> If I am not mistaken, this proposal focuses mainly on programmatic usage
> (JavaScript)?
>
> If so, do you envision some kind of declarative syntax that would allow
> content creators (web and digital publishing) to encode a "static" /
> persistent representation of synchronized multi-media streams?
> For example EPUB3 "read aloud" / "talking books" are currently authored
> using the Media Overlays flavour of SMIL (XML), and long-form synchronized
> text+audio content is typically generated via some kind of semi-automated
> production process.
>
> I am thinking specifically about: (1) an HTML document, (2) a separate
> audio file representing the pre-recorded human narration of the HTML
> document, and (3) some kind of meta-structure / declarative syntax that
> would define the synchronization "points" between HTML elements and audio
> time ranges.
> Note that most existing "talking book" implementations render such
> combined text/audio streams by "highlighting" / emphasizing individual HTML
> fragments as they are being narrated (using CSS styles), but the same
> declarative expression could be rendered with a karaoke-like layout, etc.
> Of course, there are also other important use-cases such as video+text,
> video+audio, etc., but I just wanted to pick your brain about a concrete
> use-case in digital publishing / EPUB3 e-books :)
>
> Cheers, and thanks!
> Daniel
>
>
>
>
>
> On 11 March 2018 at 21:34, Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com>
> wrote:
>
>> Hi Marisa
>>
>> Chris Needham of the Media & Entertainment IG made me aware of the CG
>> you're setting up.
>>
>> This is a welcome initiative, and it is great to see more people
>> expressing the need for better sync support on the Web!
>>
>> I'm the chair of Multi-device Timing CG [2], so I thought I'd say a few
>> words about that as it seems we have similar objectives. Basically, the
>> scope of the Multi-device Timing CG is a broad one: synchronization of
>> anything with anything on the Web, whether it is text synced with A/V
>> within a single document, or across multiple devices. We have also proposed
>> a full solution to this problem for standardization, with the timing object
>> [3] being the central concept. I did have a look at the requirements
>> document [4] you linked to, and it seems to me the timing object (and the
>> other tools we have made available [5]) should be a good basis for
>> addressing your challenges. For instance, a karaoke-style text presentation
>> synchronized with audio should be quite easy to put together using these
>> tools.
>>
>> If you have some questions about the model we are proposing, and how it
>> may apply to your use cases, please send them our way :)
>>
>> Best regards,
>>
>> Ingar Arntzen
>>
>> [1] https://lists.w3.org/Archives/Public/public-sync-media-pub/2018Feb/0000.html
>> [2] https://www.w3.org/community/webtiming/
>> [3] http://webtiming.github.io/timingobject/
>> [4] https://github.com/w3c/publ-wg/wiki/Requirements-and-design-options-for-synchronized-multimedia
>> [5] https://webtiming.github.io/timingsrc/
>>
>>
>
>
