Re: Experiments with WebVTT

Hi Makoto,

Thanks for having a look! 

So idx.json and data.json aren’t related to synchronized playback - they’re for full-book searching via fuse.js [1]. They could be generated on the fly, but that can be slow, so I generate them ahead of time. The search index is basically a list of text nodes’ content and their corresponding selectors; it doesn’t consider phrases that cross node boundaries. 

So while this would match either “Hello” or “how are you?”:

<span>Hello</span><span>how are you?</span>

It would not match “Hello how are you?”. 

However, this is just a consequence of the simple approach used in this prototype. I think one could create a more sophisticated index that takes this into account.
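
For illustration, here’s a rough sketch of the simple approach with Fuse.js (in-page and simplified; the real entries are precomputed into idx.json, and the entry shape here is hypothetical):

// Assumes Fuse.js is already loaded, e.g. from a <script> tag.
// One entry per element: its text content plus a selector locating it.
const entries = [];
document.querySelectorAll("span[id]").forEach((el) => {
  entries.push({ text: el.textContent, selector: `#${el.id}` });
});

const fuse = new Fuse(entries, { keys: ["text"] });

// Queries are matched against individual entries, so "Hello" and
// "how are you?" each hit their own span, but no single entry
// contains the combined phrase "Hello how are you?" across the
// node boundary.
const results = fuse.search("Hello how are you?");

A more sophisticated index could additionally store concatenated runs of adjacent text so that cross-boundary phrases become searchable.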

Marisa

1. https://fusejs.io/

> On Nov 10, 2021, at 17:29, MURATA Makoto <eb2m-mrt@asahi-net.or.jp> wrote:
> 
> Marisa,
> 
> I tried your document with interest.  How are idx.json and data.json 
> used?  Will they be created from the HTML source on the fly?  
> I am wondering if your approach works for documents containing 
> ruby or BIDI.
> 
> Regards,
> Makoto
> 
> On Thu, Nov 11, 2021 at 6:04, Marisa DeMeglio <marisa.demeglio@gmail.com> wrote:
> Hi all,
> 
> I’ve been experimenting with WebVTT instead of SMIL as a synchronization format for a book with HTML text and audio narration.
> 
> Here is a link to a recent prototype I made, showing a book that has been transformed via a custom conversion script from EPUB into plain HTML/CSS/JS:
> https://daisy.github.io/accessible-books-on-the-web/demos/moby-dick/chapter_001.html
> 
> In it, there’s a WebVTT track attached to an audio element:
> 
> <audio src="audio/chapter_001.mp3" controls="" id="abotw-audio">
>     <track default="" kind="metadata" src="vtt/chapter_001.vtt">
> </audio>
> 
> And because this is a metadata track, the VTT file’s contents aren’t displayed as captions, just delivered as payload to the cue event handlers. One example of a cue in the VTT file is:
> 
> 1
> 00:00:00.000 --> 00:00:04.833
> {
>   "action": {
>     "name": "addCssClass",
>     "data": "sync-highlight"
>   },
>   "selector": {
>     "type": "FragmentSelector",
>     "value": "c01h01"
>   }
> }
> 
> 
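> A minimal sketch of a cuechange handler for these cues (assuming the
> FragmentSelector values are element ids and a sync-highlight class is
> defined in the CSS):
> 
> const audio = document.getElementById("abotw-audio");
> const track = audio.textTracks[0];
> track.mode = "hidden"; // fire cue events without rendering captions
> 
> let highlighted = null;
> track.addEventListener("cuechange", () => {
>   const cue = track.activeCues[0];
>   if (!cue) return;
>   const payload = JSON.parse(cue.text);
>   if (payload.action.name === "addCssClass") {
>     if (highlighted) highlighted.classList.remove(payload.action.data);
>     highlighted = document.getElementById(payload.selector.value);
>     if (highlighted) highlighted.classList.add(payload.action.data);
>   }
> });
> 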
> Comparing this approach to what we’ve been considering already (which is to extend SMIL [1]), I notice the following:
> 
> - Requirements on the audio files become stricter with WebVTT. There’s no way to say (without a chunk of custom scripting; see the sketch after this list) that you want to play 10s from audio-1.mp3, then 20s from audio-2.mp3, and then go back to audio-1. You just play a file from start to end (or from one media fragment offset to another).
> 
> - There are no structuring options for WebVTT, so any structural navigation (e.g. “escapability”, which is exiting narration of complex structures and returning to the main content flow) becomes entirely DOM-based with no parallel conveniences in the audio narration layer. I don’t think this is necessarily a negative thing.
> 
> - Implementing WebVTT-based highlighting via the TextTrack API (as in the handler sketched above) is very easy compared to SMIL.
> 
> - Unlike SyncMedia, WebVTT is not a drop-in replacement for Media Overlays. At least not without some packaging rules.
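> 
> On the first point, a minimal sketch (clip list and timings are
> hypothetical) of the custom scripting needed to chain segments across
> audio files:
> 
> // Playlist: 10s of audio-1, then 20s of audio-2, then back to audio-1.
> const clips = [
>   { src: "audio-1.mp3", start: 0, end: 10 },
>   { src: "audio-2.mp3", start: 0, end: 20 },
>   { src: "audio-1.mp3", start: 10, end: null }, // null = play to the end
> ];
> let i = 0;
> const audio = new Audio();
> 
> function playClip() {
>   // A media fragment sets the start offset; the end point is policed
>   // by hand, since browser support for stopping at #t=start,end varies.
>   audio.src = `${clips[i].src}#t=${clips[i].start}`;
>   audio.play();
> }
> 
> audio.addEventListener("timeupdate", () => {
>   const clip = clips[i];
>   if (clip.end !== null && audio.currentTime >= clip.end) {
>     i += 1;
>     if (i < clips.length) playClip();
>     else audio.pause();
>   }
> });
> 
> playClip();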
> 
> Anyway, just wanted to share. Discussion welcome! 
> 
> Marisa
> 
> 1. https://w3c.github.io/sync-media-pub/sync-media.html
> 
> -- 
> Regards,
> Makoto
