Re: Experiments with WebVTT from MURATA Makoto on 2021-11-11 (public-sync-media-pub@w3.org from November 2021)

From: MURATA Makoto <eb2m-mrt@asahi-net.or.jp>
Date: Thu, 11 Nov 2021 10:29:13 +0900
To: W3C Synchronized Multimedia for Publications CG <public-sync-media-pub@w3.org>
Message-ID: <CALvn5EBLXT3nXGKJ2ajsP80P3FaWY8qRU=ooyi3vR-=hefYTNQ@mail.gmail.com>

Marisa,

I tried out your document with interest. How are idx.json and data.json
used? Are they generated from the HTML source on the fly?
I also wonder whether your approach works for documents containing
ruby or bidirectional (BiDi) text.

Regards,
Makoto

On Thu, 11 Nov 2021 at 06:04, Marisa DeMeglio <marisa.demeglio@gmail.com> wrote:

> Hi all,
>
> I’ve been experimenting with WebVTT instead of SMIL as a synchronization
> format for a book with HTML text and audio narration.
>
> Here is a link to a recent prototype I made, showing a book that has been
> transformed from EPUB into plain HTML/CSS/JS by a custom conversion script:
>
> https://daisy.github.io/accessible-books-on-the-web/demos/moby-dick/chapter_001.html
>
> In it, there’s a WebVTT track attached to an audio element:
>
> <audio src="audio/chapter_001.mp3" controls="" id="abotw-audio">
>     <track default="" kind="metadata" src="vtt/chapter_001.vtt">
> </audio>
>
> Because this is a metadata track, the VTT file’s contents aren’t
> displayed as captions; they are just delivered as payloads to the cue
> event handlers. Here is one cue from the VTT file:
>
> 1
> 00:00:00.000 --> 00:00:04.833
> {
>   "action": {
>     "name": "addCssClass",
>     "data": "sync-highlight"
>   },
>   "selector": {
>     "type": "FragmentSelector",
>     "value": "c01h01"
>   }
> }
>
>
> Comparing this approach to what we’ve been considering already (which is
> to extend SMIL [1]), I notice the following:
>
> - Requirements on the audio files become stricter with WebVTT, because a
> track's cue times refer to the timeline of the single media element it is
> attached to. There’s no way to say (without a chunk of custom scripting)
> that you want to play 10s from audio-1.mp3, then 20s from audio-2.mp3, and
> then go back to audio-1.mp3. You just play a file from start to end (or
> from one media fragment offset to another).
>
> - There are no structuring options for WebVTT, so any structural
> navigation (e.g. “escapability”, which is exiting narration of complex
> structures and returning to the main content flow) becomes entirely
> DOM-based with no parallel conveniences in the audio narration layer. I
> don’t think this is necessarily a negative thing.
>
> - Implementing WebVTT-based highlighting with the TextTrack API is much
> easier than with SMIL, since the browser parses the VTT file and fires the
> cue events itself.
>
> - Unlike SyncMedia, WebVTT is not a drop-in replacement for Media
> Overlays, at least not without some packaging rules.
>
> Anyway, just wanted to share. Discussion welcome!
>
> Marisa
>
> [1] https://w3c.github.io/sync-media-pub/sync-media.html
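
The metadata-track highlighting described in the quoted message can be sketched in a few lines of TypeScript. This is a minimal sketch, not the demo's actual code: it assumes each cue's text is a JSON object shaped like the example cue (an `action` named `addCssClass` plus a `FragmentSelector` whose `value` is an element id), and the names `CuePayload`, `parsePayload`, `applyPayload`, and `attachSync` are hypothetical. Small structural interfaces stand in for the DOM's `TextTrack`, `VTTCue`, and `Element` so the logic can run outside a browser. On each `cuechange` event it compares the track's `activeCues` with the previously active cues, adding the class for cues that started and removing it for cues that ended.

```typescript
// Payload shape assumed from the example cue: an action plus a fragment selector.
interface CuePayload {
  action: { name: string; data: string };
  selector: { type: string; value: string };
}

// Structural stand-ins for the DOM objects this sketch touches (VTTCue, TextTrack, Element).
interface CueLike { text: string }
interface TrackLike {
  activeCues: ArrayLike<CueLike> | null;
  addEventListener(type: "cuechange", listener: () => void): void;
}
interface ElementLike { classList: { add(c: string): void; remove(c: string): void } }
type Lookup = (id: string) => ElementLike | null;

// Parse a cue's text as JSON; anything that is not a payload of the expected shape yields null.
function parsePayload(text: string): CuePayload | null {
  try {
    const p = JSON.parse(text);
    return p && p.action && p.selector ? (p as CuePayload) : null;
  } catch {
    return null;
  }
}

// Add (cue started) or remove (cue ended) the CSS class named by an "addCssClass" action.
function applyPayload(p: CuePayload, lookup: Lookup, active: boolean): void {
  if (p.action.name !== "addCssClass" || p.selector.type !== "FragmentSelector") return;
  const el = lookup(p.selector.value);
  if (!el) return;
  if (active) el.classList.add(p.action.data);
  else el.classList.remove(p.action.data);
}

// On every cuechange, undo cues that left the active set and apply cues that joined it.
function attachSync(track: TrackLike, lookup: Lookup): void {
  let previous: CueLike[] = [];
  track.addEventListener("cuechange", () => {
    const current: CueLike[] = [];
    const cues = track.activeCues;
    if (cues) for (let i = 0; i < cues.length; i++) current.push(cues[i]);
    for (const cue of previous) {
      if (current.indexOf(cue) < 0) {
        const p = parsePayload(cue.text);
        if (p) applyPayload(p, lookup, false);
      }
    }
    for (const cue of current) {
      if (previous.indexOf(cue) < 0) {
        const p = parsePayload(cue.text);
        if (p) applyPayload(p, lookup, true);
      }
    }
    previous = current;
  });
}
```

In a page like the demo, one would pass the audio element's first entry in `textTracks` as the track and `id => document.getElementById(id)` as the lookup, casting to the sketch's interfaces as needed. A metadata track with the `default` attribute is put in `hidden` mode, so `cuechange` fires without anything being rendered. Listening to `cuechange` on the track, rather than to each cue's `enter`/`exit` events, avoids having to wait for the cues to load before attaching handlers.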

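The "chunk of custom scripting" mentioned in the first point of the quoted message could look roughly like the following hypothetical TypeScript sketch; `Clip`, `PlayerLike`, `fragmentUri`, and `playSequence` are illustrative names, not part of any existing API. It plays a list of clips on one media element, checking on each `timeupdate` whether the current clip's end has been reached and then either seeking within the same file or switching `src`. Because `timeupdate` fires only at intervals of up to about 250 ms, clip boundaries are approximate, and the sketch relies on the browser honoring a `currentTime` seek made right after changing `src`; a more defensive version would seek on `loadedmetadata`. `fragmentUri` shows, for comparison, how a single clip maps onto a Media Fragments URI, which is as much as a plain `src` can express.

```typescript
// Hypothetical narration segment: play `src` from `start` to `end` seconds.
interface Clip { src: string; start: number; end: number }

// Minimal stand-in for the part of HTMLAudioElement that the sketch uses.
interface PlayerLike {
  src: string;
  currentTime: number;
  play(): unknown; // HTMLMediaElement.play() returns a Promise in browsers
  pause(): void;
  addEventListener(type: "timeupdate", listener: () => void): void;
}

// For comparison: one clip as a Media Fragments URI, e.g. "audio-1.mp3#t=10,15".
function fragmentUri(c: Clip): string {
  return `${c.src}#t=${c.start},${c.end}`;
}

// Play the clips in order on a single player, switching files or seeking at each boundary.
function playSequence(player: PlayerLike, clips: Clip[]): void {
  let index = 0;
  let currentSrc: string | null = null;

  const load = (c: Clip): void => {
    // Only reassign src when the file changes; within the same file a seek is enough.
    if (currentSrc !== c.src) {
      player.src = c.src;
      currentSrc = c.src;
    }
    player.currentTime = c.start;
    void player.play(); // a real page should handle a rejected play() promise
  };

  // Advance to the next clip once playback reaches the current clip's end.
  player.addEventListener("timeupdate", () => {
    const clip = clips[index];
    if (!clip || player.currentTime < clip.end) return;
    index += 1;
    if (index < clips.length) load(clips[index]);
    else player.pause();
  });

  if (clips.length > 0) load(clips[0]);
}
```

For the example in the message, the clip list would be audio-1.mp3 from 0 to 10 s, audio-2.mp3 from 0 to 20 s, and then audio-1.mp3 again from wherever narration should resume.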

Received on Thursday, 11 November 2021 01:30:05 UTC