RE: Interactive Television

Scott Bradley Wilson,

Upon reading the WebVTT specification, it seems that even more HTML5 tags are possible in video transcripts, in particular hyperlinks and multimedia object references. The computer as teleprompter with hypertext idea, illustrated in point 3, indicates how hypertext might come to be in a video blog transcript. Other possible techniques include speech recognition and video post-production software.

In addition to text formats, XML and hypertext varieties of video tracks are of interest. Web development scenarios include combinations of video tracks with DHTML techniques. With something like <video ...><track kind="metadata" type="xml/temporal" onplayheadenter="callback1" onplayheadexit="callback2" src="file.xml"/></video>, multiple XML tracks can be synchronized to the playhead. The JavaScript event object can include XML fragments in the data structure.

Kind regards,

Adam Sobieski

Subject: Re: Interactive Television
Date: Mon, 8 Aug 2011 15:53:52 +0100

On 7 Aug 2011, at 21:39, Adam Sobieski wrote:

1. Hypertext Transcripts of Video

Previously, transcripts of videos have been textual. Under discussion herein are video production and video file format details such that an HTML5 hypertext transcript can be obtained from a video by a video processor, called herein a hypertext transcript processor. Such software can potentially make use of audio and natural language processing techniques observable in technologies such as MAVIS. Key topics include production, post-production and video file format details such that the resulting hypertext transcripts can include both hyperlinks to arbitrary web content and video sections for any referenced media fragments or audio or video clips. In short, these are topics in web video such that video transcripts make fuller use of hypertext.

In HTML5, <link rel="alternate" type="video" href="example.avi"/> could indicate that a file is a hypertext transcript of a video. The means by which a video can reference a hypertext version or its transcript(s) is topical. Of note is that, in the HTML5 video syntax, there exist track elements, which seem robust enough for many uses including transcripts, outlines, captions, DHTML scenarios and perhaps XML and hypertext transcripts.
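As a rough sketch of the DHTML scenario, the playhead synchronization could be modelled as below. The function name diffActiveCues and the cue objects with id, start and end fields (in seconds) are assumptions made for illustration and are not part of any specification. In a browser, something like this might be called from a timeupdate handler on the video element; browsers also expose enter and exit events on text track cues, which serve a similar purpose.

```javascript
// Hypothetical helper (not a browser API): given cues with start/end times in
// seconds and two successive playhead positions, report which cues the
// playhead has entered and which it has exited.
function diffActiveCues(cues, previousTime, currentTime) {
  // A cue is treated as active on the half-open interval [start, end).
  const isActive = (cue, t) => t >= cue.start && t < cue.end;
  const entered = [];
  const exited = [];
  for (const cue of cues) {
    const wasActive = isActive(cue, previousTime);
    const nowActive = isActive(cue, currentTime);
    if (!wasActive && nowActive) entered.push(cue.id);
    if (wasActive && !nowActive) exited.push(cue.id);
  }
  return { entered, exited };
}

// Illustrative cues, as might be read from a timed metadata track.
const exampleCues = [
  { id: "intro", start: 0, end: 5 },
  { id: "clip", start: 4, end: 12 },
];

// The playhead moves from 3s to 6s: "clip" is entered and "intro" is exited.
const cueChanges = diffActiveCues(exampleCues, 3, 6);
console.log(cueChanges);
```

Callbacks of the kind named in the track example could then be invoked once per entry in entered and exited.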

How might this relate to web-vtt? [1]

2. Selections of Multimedia and Context Menus

A new development exists with regard to audio and video players, including in web browsers, so that intervals of content can be selected. A selection can be one or more intervals of content. One possible means of providing that capability is to make use of the horizontal bar upon which the playhead moves as the audio or video plays. After users make selections, selected audio or video can have, just as text does, a context menu. That context menu can then be extensible with existing techniques for context menus on graphical objects in the desktop environment.
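A selection of one or more intervals might be represented as in the following minimal sketch. The function name normalizeSelection and the interval objects with start and end fields (in seconds) are assumptions for illustration; a real player would derive the intervals from the user's gestures on the playhead bar.

```javascript
// Hypothetical helper: normalize a multimedia selection, given as a list of
// { start, end } intervals in seconds, by dropping empty intervals, sorting
// by start time and merging intervals that overlap or touch.
function normalizeSelection(intervals) {
  const sorted = intervals
    .filter(({ start, end }) => end > start)
    .map(({ start, end }) => ({ start, end })) // copy so the input is not mutated
    .sort((a, b) => a.start - b.start);

  const merged = [];
  for (const interval of sorted) {
    const last = merged[merged.length - 1];
    if (last && interval.start <= last.end) {
      // Overlapping or adjacent: extend the previous interval.
      last.end = Math.max(last.end, interval.end);
    } else {
      merged.push(interval);
    }
  }
  return merged;
}

// Three dragged regions on the playhead bar, two of which overlap.
const exampleSelection = normalizeSelection([
  { start: 30, end: 45 },
  { start: 10, end: 20 },
  { start: 18, end: 25 },
]);
console.log(exampleSelection); // two intervals: 10-25 and 30-45
```

A context menu handler could then receive the normalized list as the object being commented upon or shared.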

Example menu options include commenting upon, criticizing, blogging about, posting to a social network website, and otherwise reacting to or expressing one's opinion about a portion of audio or video content. End users benefit in numerous ways when selections of audio and video have extensible context menus. Fair use describes what the end users are doing when they make use of such context menus. When end users use clips of audio and video multimedia objects to communicate with others, as per reactions, that is fair use. In my opinion, these ergonomics concepts continue forward with the post-broadcast themes of the web, from RFC 1 to web 2.0, and provide for better user experiences.

3. Computer as Teleprompter for Video Blogs, Hypertext-like Content and Post-Production

Additionally, video bloggers can make use of their computers as teleprompters for their content which can include hypertext such as <a/> and <video/>, scholarly and scientific references and so forth, where the hypertext content allows for automated post-production. With some of the new NUI sensors, software can utilize 3D video processing to rotate the video as per a virtual camera at the midpoint of the teleprompter computer screen. In the computer teleprompter scenario, video bloggers' written content can be uploaded alongside their video data which can simplify hypertext transcript processing.

For scenarios without an accompanying hypertext file, structured video content is interesting, as are video file format particulars such that a hypertext transcript processor could obtain, in addition to the section and paragraph structure obtainable from natural language processing, XML or hypertext structure including <a/> and <video/>. Such web-like functionality would also be in the indicated video content. The simultaneous authoring of hypertext and video is envisioned as highly convenient to web users, as is their ability to make selections of multimedia content and to have extensible context menus for those selections.
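To make the output of a hypertext transcript processor more concrete, the following sketch renders timed transcript segments into HTML paragraphs, with an optional hyperlink per segment. The segment shape (start, end, text, href), the data-start and data-end attributes and the function names are all assumptions for illustration; how a processor would actually recover such segments from a video is not addressed here.

```javascript
// Escape text for safe inclusion in HTML content and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical renderer: turn timed segments into HTML paragraphs. A segment
// with an href becomes a hyperlink; times are kept as data attributes so the
// transcript can be related back to the video.
function renderTranscript(segments) {
  return segments
    .map((segment) => {
      const text = escapeHtml(segment.text);
      const body = segment.href
        ? `<a href="${escapeHtml(segment.href)}">${text}</a>`
        : text;
      return `<p data-start="${segment.start}" data-end="${segment.end}">${body}</p>`;
    })
    .join("\n");
}

// Illustrative segments; the URL is a placeholder.
const transcriptHtml = renderTranscript([
  { start: 0, end: 4, text: "Welcome to the video blog." },
  { start: 4, end: 9, text: "See the WebVTT draft.", href: "https://example.org/webvtt" },
]);
console.log(transcriptHtml);
```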

4. Combination of Previous Points, Narrative of Video Blogging

An illustrative narrative is one where a video blogger wants to make a video blog that includes clips from other video blogs, websites such as, or any other multimedia content. The video blogger watches multimedia content in any media player that includes the aforementioned functionality of selection and context menu, for example in their web browser. The user makes a selection of a region of interest and then, in their context menu, there is an option to import that selected multimedia into a multi-track video production application as a video track object. The media fragment URI of that selection can accompany that data into the video production application as per the user's settings. Then, the user records their video blog response, before and after the clip. The user may be making use of their written hypertext content with their computer as a teleprompter, or other production and post-production techniques such that corresponding video transcripts would make fuller use of hypertext.
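The media fragment URI that accompanies a selection could be built as in this sketch, which uses the temporal dimension of Media Fragments URI (t=start,end, in seconds). The function name toMediaFragmentUri, the interval shape and the example URL are assumptions for illustration.

```javascript
// Hypothetical helper: build a temporal media fragment URI for one selected
// interval, given in seconds. Any fragment already present on the source URL
// is replaced.
function toMediaFragmentUri(sourceUrl, interval) {
  const url = new URL(sourceUrl);
  url.hash = `t=${interval.start},${interval.end}`;
  return url.toString();
}

// A clip from 10s to 25s of a placeholder video URL.
const clipUri = toMediaFragmentUri("https://example.org/video.webm", {
  start: 10,
  end: 25,
});
console.log(clipUri); // https://example.org/video.webm#t=10,25
```

A selection made of several intervals would yield one such URI per interval in this sketch.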

Continuing on with the video blogger narrative, if a server-side hypertext transcript processor is part of the scenario, then it can process their video and present them with a WYSIWYG text editor web application to refine, as needed, the text portions of the hypertext content obtained by audio and natural language processing, to adjust page layout details, and to make use of other advanced functionality. Regardless of whether they choose to write first and then make use of the computer as teleprompter option, which is perhaps advantageous for longer compositions (such as this one and other scholarly and scientific publications), the end result is a side-by-side hypertext and video pair of article objects interconnected to one another and to other web content and multimedia.

5. Conclusion

Web user authoring of video content that links to, references or includes clips from multimedia such as video blogs, journalism and punditry, arts and entertainment, or any other multimedia content whatsoever, is an exciting usage scenario for web technologies such as media fragment URIs. Robust and CSS3 themeable hypertext articles can be uploaded alongside videos or obtained by hypertext transcript processors, for the convenience of end users who desire side-by-side hypertext and video content for their blog articles.

The indicated new media playback ergonomics provide web users the means to make selections of content with extensible context menus to link to, comment on, or otherwise respond to content online. Other indicated techniques allow for a side-by-side hypertext and video blogosphere in ways convenient to web users when either authoring or experiencing web content.

Kind regards,

Adam Sobieski


Received on Tuesday, 9 August 2011 13:22:10 UTC