- From: Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com>
- Date: Tue, 13 Mar 2018 01:15:36 +0100
- To: Daniel Weck <daniel.weck@gmail.com>
- Cc: public-sync-media-pub@w3.org, public-webtiming@w3.org
- Message-ID: <CAOFBLLrLJiZZor-Omk9phHXr_8iu8J06S9Q5A5E3L6AWcXf2Gg@mail.gmail.com>
Hi Daniel

Please find some comments inline.

2018-03-12 19:03 GMT+01:00 Daniel Weck <daniel.weck@gmail.com>:

> Thank you for your input Ingar (I assume this is your firstname?)

Yes :)

> The "timing object" certainly looks like a useful and powerful API.
> If I am not mistaken this proposal focuses mainly on programmatic usage?
> (Javascript)

Thanks. Yes. Mostly Javascript for now, though we have rough
implementations in other languages too.

> If so, do you envision some kind of declarative syntax that would allow
> content creators (web and digital publishing) to encode a "static" /
> persistent representation of synchronized multi-media streams?
> For example EPUB3 "read aloud" / "talking books" are currently authored
> using the Media Overlays flavour of SMIL (XML), and long-form synchronized
> text+audio content is typically generated via some kind of semi-automated
> production process.

It would be possible to implement support for declarative syntax on top
of the model we're proposing. Alternatively, you could integrate an
existing framework with the timing object, and thereby allow that
framework to be synchronized with anything else that uses timing
objects. SMIL, for instance, has an internal media clock somewhat
similar to the timing object, so I suspect integration wouldn't be too
hard.

However, to be honest, I don't really see the declarative approach as a
pathway to simplicity and bliss for content creators. I think we can do
much better than that. Content creators should not be programming, nor
should they be editing declarative XML documents.

Persistent representation 1:

We typically use plain JSON for persistent representations of timed
data, both static and dynamic, and the structures are usually very
simple. For instance, a book could be represented as a long list of
paragraphs, where each paragraph is associated with values for start
and stop.

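As a rough sketch of what such a structure might look like: the field
names (id, start, stop, text), the use of seconds as the unit, and the
helper activeParagraph are all illustrative assumptions, not part of
any timingsrc API.

```javascript
// Hypothetical persistent representation: one entry per paragraph, with
// start/stop given in seconds on the narration timeline (assumed unit).
const book = [
  { id: "p1", start: 0.0, stop: 4.2, text: "First paragraph of the book." },
  { id: "p2", start: 4.2, stop: 9.8, text: "Second paragraph of the book." },
  { id: "p3", start: 9.8, stop: 15.0, text: "Third paragraph of the book." }
];

// Return the paragraph whose half-open interval [start, stop) covers
// `position`, or null if no paragraph is active at that position.
function activeParagraph(paragraphs, position) {
  const hit = paragraphs.find(
    (p) => p.start <= position && position < p.stop
  );
  return hit || null;
}

// The structure survives a round trip through plain JSON unchanged.
const restored = JSON.parse(JSON.stringify(book));

// Look up what should be shown 5 seconds into the narration.
console.log(activeParagraph(restored, 5).text);
```
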
This is enough for playback of text in a Web page. There's a demo of
the basic idea on the timingsrc homepage [1], demonstrating how this is
done with a timing object and a sequencer. Scroll down or view the page
source to take a look at the code involved (it's not a lot). You can
also open the page on multiple devices/browser windows to verify that
it is still synchronized :)

Content Authoring:

We could also use Web pages to create such JSON files. For instance,
say you have the list of book paragraphs (without timing info), and you
have the corresponding read-aloud audio track. What we could do is to
synchronize the audio with the timing object. Then, during playback,
the content creator simply clicks a button whenever they hear a new
paragraph start, causing a timestamp to be sampled from the timing
object and put into the JSON file. Of course, this is only a very
primitive authoring tool, and this particular task should likely be
solved by AI rather than manual labour. Anyway, the point remains that
web pages can be excellent authoring tools for multi-media streams,
shielding content creators from both programming and XML editing.
Furthermore, we're doing just fine without a declarative media format.

Persistent representation 2:

When you mention "persistent representation of synchronized multi-media
streams" I'm also guessing that you mean something more than just a
single text track. In SMIL, for instance, there is a focus on declaring
all objects that go into a product, as well as their temporal
relationships. We don't do it that way. We simply represent things as
separate resources of timed data, and then synchronize them
independently, within the same Web page, or in different Web pages on
different devices. In my experience, this is a much more flexible and
Webby approach.

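To make the click-to-timestamp authoring idea above a bit more
concrete, here is a minimal sketch. Everything in it is hypothetical:
makeFakeClock stands in for the timing object that the audio would be
synchronized with, its query() method returning a position is an
assumed shape for this example, and createMarker is an invented helper.

```javascript
// Stand-in clock (hypothetical): each query() reports the next playback
// position in seconds from a fixed list. In a real page this role would
// be played by the timing object driving the audio.
function makeFakeClock(positions) {
  let i = 0;
  return {
    query() {
      const position = positions[Math.min(i, positions.length - 1)];
      i += 1;
      return { position };
    }
  };
}

// Each mark() stamps the start of the next paragraph and closes the
// previous one; finish() closes the last paragraph and returns JSON.
function createMarker(texts, clock) {
  const cues = [];
  return {
    mark() {
      if (cues.length >= texts.length) return; // nothing left to mark
      const now = clock.query().position;
      if (cues.length > 0) cues[cues.length - 1].stop = now;
      cues.push({ text: texts[cues.length], start: now, stop: null });
    },
    finish() {
      if (cues.length > 0) {
        cues[cues.length - 1].stop = clock.query().position;
      }
      return JSON.stringify(cues, null, 2);
    }
  };
}

// Usage: in a page, mark() would be wired to a button's click handler.
const clock = makeFakeClock([0, 3.5, 9.1, 14.0]);
const marker = createMarker(
  ["First paragraph.", "Second paragraph.", "Third paragraph."],
  clock
);
marker.mark();
marker.mark();
marker.mark();
const authored = JSON.parse(marker.finish());
console.log(authored);
```
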
> I am thinking specifically about: (1) an HTML document, (2) a separate
> audio file representing the pre-recorded human narration of the HTML
> document, and (3) some kind of meta-structure / declarative syntax that
> would define the synchronization "points" between HTML elements and audio
> time ranges.
> Note that most existing "talking book" implementations render such
> combined text/audio streams by "highlighting" / emphasizing individual HTML
> fragments as they are being narrated (using CSS styles), but the same
> declarative expression could be rendered with a karaoke-like layout, etc.

I think my previous response already covered much of what you are
describing here. The demo [1] also uses CSS styles to bold-face the
active text :)

> Of course, there are also other important use-cases such as video+text,
> video+audio, etc., but I just wanted to pick your brain about a concrete
> use-case in digital publishing / EPUB3 e-books :)

You're welcome, though I don't have particular experience with digital
publishing :)

> Cheers, and thanks!
> Daniel

Thank you for great questions!

Ingar

[1] https://webtiming.github.io/timingsrc/doc/online_sequencer.html

> On 11 March 2018 at 21:34, Ingar Mæhlum Arntzen <ingar.arntzen@gmail.com>
> wrote:
>
>> Hi Marisa
>>
>> Chris Needham of the Media & Entertainment IG made me aware of the CG
>> you're setting up.
>>
>> This is a welcome initiative, and it is great to see more people
>> expressing the need for better sync support on the Web!
>>
>> I'm the chair of Multi-device Timing CG [2], so I thought I'd say a few
>> words about that as it seems we have similar objectives. Basically, the
>> scope of the Multi-device Timing CG is a broad one; synchronization of
>> anything with anything on the Web, whether it is text synced with A/V
>> within a single document, or across multiple devices.
>> We have also proposed
>> a full solution to this problem for standardization, with the timing object
>> [3] being the central concept. I did have a look at the requirements
>> document [4] you linked to, and it seems to me the timing object (and the
>> other tools we have made available [5]) should be a good basis for
>> addressing your challenges. For instance, a karaoke-style text presentation
>> synchronized with audio should be quite easy to put together using these
>> tools.
>>
>> If you have some questions about the model we are proposing, and how it
>> may apply to your use cases, please send them our way :)
>>
>> Best regards,
>>
>> Ingar Arntzen
>>
>> [1] https://lists.w3.org/Archives/Public/public-sync-media-pub/2018Feb/0000.html
>> [2] https://www.w3.org/community/webtiming/
>> [3] http://webtiming.github.io/timingobject/
>> [4] https://github.com/w3c/publ-wg/wiki/Requirements-and-design-options-for-synchronized-multimedia
>> [5] https://webtiming.github.io/timingsrc/
>>
Received on Tuesday, 13 March 2018 00:16:59 UTC