On the topic of interactive video: interactive video is envisioned in both Web and EPUB (digital textbook) contexts. I would very much like for users to be able to play standards-based interactive videos in Web documents and digital textbooks.

On the topic of WebVTT+RDF: the motivating use case could be inside or outside of a Web context; there is also the Immersive Web endeavor: .

Brainstorming: generalizing from the motivating use case of extensible dynamic metadata, and while interactive video is topical, one can consider:

  1.  Utilizing colored animated silhouettes in secondary video tracks.
     *   In these secondary video tracks, the color black would be reserved for indicating the absence of a colored animated silhouette.
     *   Colored animated silhouettes could mirror the shapes and motions of visual things in primary video tracks, facilitating interactivity scenarios.
          i.  These silhouettes would enable arbitrarily-shaped interactive elements.
     *   Colored animated silhouettes could also be rectangular and mirror the motions of text or imagery.
          i.  This would facilitate traditional hyperlinks in videos.
     *   Colored animated silhouettes could surface through a JavaScript API and are envisioned as supporting various UI events.
          i.  Notably, colored animated silhouettes would appear in and disappear from collections as visual things appear in and disappear from primary video tracks.

  2.  Utilizing combinations of colored animated silhouettes in secondary video tracks and extensible dynamic metadata (e.g., WebVTT+RDF).
     *   One could attach graph-based metadata to arbitrarily-shaped silhouettes or visual regions which map to visual things in videos.
          i.  This would be useful for semantically describing objects in, and events occurring in, videos such that the descriptive data could be placed in the video containers.
         ii.  See also: .

  3.  Existing and emerging computer vision tools and pipelines could be utilized to produce these interactivity-related and semantics-related tracks in videos.
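To make the silhouette idea concrete, here is a minimal sketch of hit-testing against a colored-silhouette frame. It assumes (hypothetically) that the secondary track's current frame has been drawn to a canvas and read back as flat RGBA pixel data, e.g. via CanvasRenderingContext2D.getImageData(); the function names and the color-to-object lookup table are illustrative, not part of any standard.

```javascript
// Pack an RGB sample into a "#rrggbb" key (alpha ignored).
function colorKey(r, g, b) {
  const hex = (n) => n.toString(16).padStart(2, "0");
  return `#${hex(r)}${hex(g)}${hex(b)}`;
}

// Look up which visual thing (if any) lies under pixel (x, y).
// `pixels` is a flat RGBA byte array of length width * height * 4,
// as returned by getImageData().data in a browser.
// Black (0, 0, 0) is reserved to mean "no silhouette here".
function hitTest(pixels, width, x, y, colorToObjectId) {
  const i = (y * width + x) * 4;
  const r = pixels[i], g = pixels[i + 1], b = pixels[i + 2];
  if (r === 0 && g === 0 && b === 0) return null; // absence of silhouette
  return colorToObjectId.get(colorKey(r, g, b)) ?? null;
}
```

A pointer-event handler could call hitTest with the event's video-space coordinates and dispatch UI events to whichever object's silhouette was hit.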

I hope that standards-based interactive videos can be supported for the Web and digital textbooks. I think that there are more use cases for extensible dynamic metadata beyond the motivating use case, and many of these other use cases appear to be within a Web or digital textbook context.

Best regards,

From: Silvia Pfeiffer<>
Sent: Saturday, June 26, 2021 7:18 PM
To: Adam Sobieski<>
Subject: Re: WebVTT+JS and WebVTT+RDF

Hi Adam,

So are your use cases within a web context or outside of it?


On Sun, Jun 27, 2021, 5:19 AM Adam Sobieski <<>> wrote:

Envisioned use cases for WebVTT+JS include interactive video. I am interested in educational scenarios for interactive video.

With new open standards for interactive video, interactive videos would be readily authored, self-contained, portable, secure, accessible, interoperable, readily analyzed, and readily indexed and searched.

One solution for interactive video involves placing JavaScript scripts and/or WebAssembly modules in video containers. This line of thinking led to the idea of putting JavaScript lambda expressions in WebVTT tracks.

As for WebVTT+RDF, new open standards could be useful for providing extensible dynamic metadata for biomedical, scientific, and industrial sensors and devices (e.g., digital microscopes).

Expanding upon these WebVTT+RDF ideas, we might also consider “graph video” concepts, which would add keyframes (also known as “intra-frames”) to facilitate efficient seeking through videos. That is, instead of an entire RDF graph at the beginning of a video track followed by “RDF diffs” or “RDF deltas” throughout the remainder of the track, a video track could provide an entire RDF graph periodically, at keyframes, and provide storage-efficient “RDF diffs” or “RDF deltas” between these keyframes. One could then seek more efficiently through recorded videos while also having access to instantaneous metadata.
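The keyframe-plus-delta seeking described above can be sketched as follows. This is an illustrative model only: triples are plain strings, the cue shapes ({ time, kind, graph } and { time, kind, add, del }) are hypothetical, and no concrete RDF serialization is assumed.

```javascript
// Reconstruct the RDF graph at time t from a time-ordered track of cues:
//   { time, kind: "keyframe", graph: [triples...] }          -- full snapshot
//   { time, kind: "delta",    add: [...], del: [...] }       -- diff vs. prior state
// Seeking = restore the latest keyframe at or before t, then replay
// the deltas between that keyframe and t.
function graphAt(track, t) {
  let graph = new Set();
  let haveKeyframe = false;
  for (const cue of track) {
    if (cue.time > t) break;
    if (cue.kind === "keyframe") {
      graph = new Set(cue.graph); // snapshot: discard all prior state
      haveKeyframe = true;
    } else if (haveKeyframe) {
      cue.del.forEach((triple) => graph.delete(triple));
      cue.add.forEach((triple) => graph.add(triple));
    }
  }
  return graph;
}
```

In a real implementation one would binary-search for the nearest preceding keyframe rather than scan from the start, which is exactly the storage/seek-time trade-off that keyframe spacing controls.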

For context, these WebVTT+RDF ideas occurred while I am in the midst of proposing some standards work to MPAI to ensure that live streams and recordings from biomedical, scientific, and industrial sensors and devices can be utilized in mixed-reality collaborative spaces (such as applications built using Microsoft Mesh). Interoperability with machine learning and computer vision technologies is also being considered.

Best regards,

From: Silvia Pfeiffer<>
Sent: Friday, June 25, 2021 9:00 PM
To: Adam Sobieski<>
Subject: Re: WebVTT+JS and WebVTT+RDF

Hi Adam,

WebVTT has been built to be flexible for this kind of time aligned data, so you should be able to use it for that.

What are the use cases behind this? What is your motivation? Are you suggesting new standards be developed?

For example, the cue entry and cue exit JavaScript is already possible when on a web page, no new standards necessary.

Is the microscope use case big enough to create a standard for, or is it just for a research piece or a company's proprietary solution?


On Sat, Jun 26, 2021, 5:26 AM Adam Sobieski <<>> wrote:
Semantic Web Interest Group,
Web Media Text Tracks Community Group,

Hello. I would like to share some thoughts on WebVTT+JS and WebVTT+RDF.

Timed Lambda Expressions (WebVTT+JS)

The following syntax example shows a way of embedding JavaScript in WebVTT tracks. The example provides two lambda functions for a cue, one to be called when the cue is entered and the other to be called when the cue is exited.

05:10:00.000 --> 05:12:15.000
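Only the cue timing survives above; one hypothetical shape for such a cue's payload is sketched below. The JSON-with-lambda-strings syntax is purely illustrative and not part of any standardized WebVTT extension.

```
WEBVTT

05:10:00.000 --> 05:12:15.000
{
  "onenter": "(cue) => { /* called when playback enters the cue */ }",
  "onexit":  "(cue) => { /* called when playback exits the cue */ }"
}
```

Since WebVTT cue payloads are terminated by a blank line, a compact single-object JSON payload such as this fits the existing cue text model.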

Dynamic Graphs (WebVTT+RDF)

An example scenario for dynamic metadata is that of live streams and recordings from digital microscopes. In this scenario, dynamic metadata includes, but is not limited to, an instantaneous magnification scale and an instantaneous time scale. Such metadata about the live streams and recordings from digital microscopes would be desirable to have, including for machine learning and computer vision algorithms.

“RDF diffs” [1], or “RDF deltas” [1], could be utilized with WebVTT for representing timed changes to semantic graphs, and such approaches could be useful for representing extensible and dynamic metadata about live streams and recordings from biomedical, scientific, and industrial sensors and devices.
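As a sketch of how this might look in a track (the cue payload syntaxes below are assumptions: Turtle for the initial full graph, and an RDF-Patch-like A/D line syntax for the deltas; none of this is standardized for WebVTT):

```
WEBVTT

NOTE First cue: a full graph snapshot. Later cues: timed deltas.

00:00:00.000 --> 00:00:05.000
@prefix ex: <http://example.org/> .
ex:microscope ex:magnification "40x" ; ex:timeScale "1.0" .

00:00:05.000 --> 00:00:09.000
D ex:microscope ex:magnification "40x" .
A ex:microscope ex:magnification "100x" .
```

Consumers would apply each delta cue to the current graph at the cue's start time, yielding instantaneous metadata at any playback position.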

Best regards,
Adam Sobieski


See also

Received on Sunday, 27 June 2021 00:33:53 UTC