A Proposal for Hypervideo Management Language (HVML)

From: Ted Williams (tedwi@starband.net)
Date: Wed, Jun 06 2001

    From: "Ted Williams" <tedwi@starband.net>
    To: <www-tv@w3.org>
    Cc: <tedwi@starband.net>
    Date: Wed, 6 Jun 2001 09:26:59 -0700
    Message-ID: <HEECJFKLOIEFPAAIBDHECEIFCBAA.tedwi@starband.net>
    Subject: A Proposal for Hypervideo Management Language (HVML)
    
    
    Hello,
    
    Let me start by apologizing to the members of this list, particularly if
    this post is off-topic.  This is going to be quite a lengthy email.  Please
    indulge me while I give some background information and then discuss my
    proposal.
    
    Background
    
    My name is Ted Williams and I have been an entrepreneurial engineer in the
    television industry in one form or another since 1984.  I have developed
    software for everything from signal processing devices to digital video
    recorders to DVEs to machine control and automation systems.  Presently I am
    running a startup company I founded called Sultan Media Systems (SMS) to
    build very advanced media production and distribution technologies around
    the concepts presented here.
    
    In 1993 I attended a conference on multimedia and took a good look at what
    multimedia systems were doing, and from there extrapolated what was going on
    under the hood.  This also happened to be the time at which the entire
    television industry was abuzz about the “information superhighway” and the
    “500 channel” television networks that were thought to be just around the
    corner.  At the time everybody, including the likes of Bill Gates et al.,
    thought that the information infrastructures would be a natural outgrowth of
    cable television.  Of course the Internet took off and the rest is history
    you already know.
    
    What I extrapolated were two important insights: 1) Multimedia systems were
    simply duplicating what television production systems already did, albeit at
    view-time and with a consumer-centric user interface.  The act of
    compositing and presenting the different forms of media (video, music,
    graphics, text, etc.) could have used the same processes and plumbing as
    your average professional television production system, downscaled and at a
    somewhat lower quality level.  2) The digital video that these multimedia
    systems and the promised “information superhighway” utilized could be
    processed like any form of data, could be made intelligent, and could be
    interlinked with itself, other “titles” of its kind, or completely
    independent forms of digital information or software.
    
    Then, of course, the Internet usurped the cable-based information
    superhighway, and interest in fully exploiting digital video’s ability to be
    processed, extended, and super-distributed via digital communications
    infrastructures waned.  However, the time has come to revisit the whole
    thing.  This is due to, in a phrase, broadband Internet.
    
    The Overall Idea
    
    Imagine the television industry of the near future.  Nearly all television
    material is online and available on-demand soon after it is posted, and what
    is still transmitted via terrestrial broadcast from the major networks and
    affiliates has rich linkages to auxiliary information residing on the
    Internet, private video servers of broadband service providers, and to DVD
    or DVD-ROM based companion information specifically authored or developed to
    serve as an after-market “upgrade” to the on-air broadcast (like electronic
    hypermedia and software versions of the companion books always offered for
    sale on PBS).  The viewer interfaces are both aesthetically pleasing and
    functional, being designed by production artists and composed of
    “smart” special effects that have a view-time behavior associated with them.
    Imagine a web of temporal links, some intra-production such as links from a
    table of contents to individual stories in a magazine format show, and some
    inter-production linking to related, historical, sponsor supplied, or just
    about any other pertinent information you can think of.  Imagine all of this
    being arranged ad hoc in a growing, traversable, navigable, Web-like
    structure evolving in much the same way as the World Wide Web, but with the
    emphasis on full-motion video, images, surround sound, and temporal and
    contextual interactivity.  Of course, traditional Internet information can
    be made to fit right in, and the viewer’s system can actually composite that
    information into the on-line or broadcast production seamlessly, just as
    easily as a post-production system keying titles over video.  A television
    industry like this is not as far off as some of those reading this post may
    be thinking.
    
    All of the required technology to make this happen can be culled from the
    television and video game industries.  The temporal management, compositing,
    digital effects, and some of the graphics technology can be found inside all
    kinds of professional post-production gear, and the more advanced real-time
    graphics technology with the ability to treat full-motion video as a texture
    can be found in some higher-end 3D graphics accelerators now being sold into
    the PC gaming market as well as game consoles.  Broadband communications
    infrastructures are coming on-line at a rapid pace, and it appears (to me
    anyway) that something like Moore’s Law also applies to bandwidth.  What is
    needed is a “glue” to bind all of this together.  This, finally, is where
    Hypervideo Management Language (HVML) fits into the big picture.
    
    Hypervideo Management Language
    
    For the sake of brevity, I will only touch on some of the more important
    points of HVML here.  It is my hope that those interested will jump in and
    we can discuss it in greater depth.
    
    HVML is a stream-aware, time-driven procedural language intended to augment
    markup languages and HTML-derived streaming languages such as SMIL and
    ATVEF.  It is intended to be to those languages what Java and JavaScript are
    to HTML.  HVML may either control the clock of the production or be
    controlled by that clock, with programmatic execution and flow being driven
    by the containing television production.
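
    To make that clock relationship concrete, here is a rough sketch, in
    Python and with every name hypothetical, of how a view-time system might
    model a production clock that is either advanced by HVML itself or slaved
    to the containing stream.  This is only an illustration of the idea, not
    part of any specification.

        # Illustrative sketch: a production clock that HVML either advances
        # itself or follows, depending on who owns time.
        class ProductionClock:
            def __init__(self, frame_rate=30):
                self.frame_rate = frame_rate
                self.current_frame = 0
                self.slaved = False   # True when the containing production drives time

            def tick(self):
                """Advance one frame when HVML itself owns the clock."""
                if not self.slaved:
                    self.current_frame += 1

            def sync_to_stream(self, stream_frame):
                """Follow the clock recovered from the containing television stream."""
                self.slaved = True
                self.current_frame = stream_frame

            def now(self):
                """Current production time, in seconds."""
                return self.current_frame / self.frame_rate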
    
    HVML may be embedded within a digital television stream, and a subset of the
    language may be embedded into the VBI of a terrestrial broadcast NTSC
    signal.  When embedded in the VBI, a technique known as Cyclic Procedure
    Streaming is used so that “late tuners” are guaranteed to receive any
    programmatic procedures required by the body of the HVML production.
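
    The idea behind Cyclic Procedure Streaming is simply to repeat the
    procedure modules over and over within the available VBI data capacity,
    much like a data carousel, so that a receiver tuning in mid-program still
    collects every procedure before it is needed.  Purely as an illustration
    (in Python, with hypothetical names, and ignoring the actual VBI line
    encoding), the insertion side might look something like this:

        from itertools import cycle

        BYTES_PER_FIELD = 32   # hypothetical payload available per vertical interval

        def cyclic_procedure_stream(procedures):
            """Endlessly cycle over the named procedures, yielding one small
            chunk per field; a late tuner keeps collecting chunks until it has
            seen every (name, offset) pair at least once."""
            for name, body in cycle(procedures.items()):
                data = body.encode("utf-8")
                for offset in range(0, len(data), BYTES_PER_FIELD):
                    yield (name, offset, data[offset:offset + BYTES_PER_FIELD])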
    
    SMPTE/EBU timecode is an intrinsic data type, and may have all of the
    standard mathematical operators applied to it.  It also may be freely
    intermixed in mathematical expressions with other intrinsic data types of
    the language, such as integers and floating-point values.  For example,
    1:30:00 / 1.25 and 1.25 / 1:30:00 are both valid expressions.
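
    To illustrate the intent (and only that; the exact semantics, drop-frame
    handling, and what a scalar divided by a timecode should actually mean
    are exactly the sort of questions an open forum would settle), a timecode
    can be modeled as a frame count that mixes with scalars in ordinary
    arithmetic.  A Python sketch with hypothetical names:

        class Timecode:
            def __init__(self, hours=0, minutes=0, seconds=0, frames=0, fps=30):
                self.fps = fps
                self.total_frames = ((hours * 60 + minutes) * 60 + seconds) * fps + frames

            def __truediv__(self, other):
                if isinstance(other, Timecode):
                    return self.total_frames / other.total_frames        # ratio of durations
                tc = Timecode(fps=self.fps)
                tc.total_frames = int(round(self.total_frames / other))  # scaled duration
                return tc

            def __rtruediv__(self, other):
                # One possible reading: a scalar divided by the duration in seconds.
                return other / (self.total_frames / self.fps)

            def __repr__(self):
                f = self.total_frames
                h, f = divmod(f, 3600 * self.fps)
                m, f = divmod(f, 60 * self.fps)
                s, f = divmod(f, self.fps)
                return f"{h}:{m:02}:{s:02}:{f:02}"

        # Reading 1:30:00 as one minute, thirty seconds, zero frames:
        print(Timecode(minutes=1, seconds=30) / 1.25)   # -> 0:01:12:00
        print(1.25 / Timecode(minutes=1, seconds=30))   # -> ~0.0139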
    
    HVML supports the standard program control structures, and adds some more
    that are specifically intended as the means to add intelligence
    to hypervideo productions.  The AT structure controls when in time the
    contained block of code will execute.  The ANIMATE structure allows a block
    of code to execute iteratively during the vertical blanking interval,
    feeding back into the program the current time of the animation loop
    relative to the start of the animation, as determined by the production’s
    clock.
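
    As a purely illustrative model of how a view-time system might service AT
    and ANIMATE blocks (Python again, every name hypothetical):

        class Runtime:
            def __init__(self, fps=30):
                self.fps = fps
                self.frame = 0
                self.at_blocks = []     # (trigger_frame, block): run once at that time
                self.animations = []    # (start_frame, block): run every vertical interval

            def at(self, trigger_frame, block):
                self.at_blocks.append((trigger_frame, block))

            def animate(self, block):
                self.animations.append((self.frame, block))

            def vertical_interval(self):
                """Called by the display system once per vertical blanking interval."""
                pending = []
                for trigger, block in self.at_blocks:
                    if self.frame >= trigger:
                        block()                             # AT: execute once its time arrives
                    else:
                        pending.append((trigger, block))
                self.at_blocks = pending
                for start, block in self.animations:
                    block((self.frame - start) / self.fps)  # ANIMATE: feed back loop time
                self.frame += 1

        # e.g. an "AT 0:01:00" block at 30 fps would simply be at(1800, some_block).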
    
    The BRANCH, LOOP, and FORK structures embody the temporal web of the
    hypervideo production, causing a change in time, context, or the navigation
    to a new production.  A branch takes the viewer to a new place in time and
    context, a loop brings the viewer back in time and context, and a fork
    splits playback into multiple streams.  The multiple streams created by FORK
    may run in separate windows, or be re-composited by the view-time system
    back into a single image so seamlessly that the viewer has no idea it has
    happened.  In the case of branching to a new time or production with the
    BRANCH structure, a return disposition may be specified.  There may be no
    return; the clock may return to the next frame after the origination point
    on completion of the link opportunity or other factor; it may return
    time-adjusted, such that if playback branches from A at 5:00 to B and
    remains in B for 15:00, it returns to A at time 20:00; or it may return to
    an absolute time.  These control structures are augmented with mechanisms
    for specifying the persistence of programmatic or intrinsic browser states
    across branch boundaries.
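
    For illustration, the return dispositions could be modeled as follows
    (hypothetical names; times are in seconds of production A’s clock):

        NO_RETURN, NEXT_FRAME, TIME_ADJUSTED, ABSOLUTE = range(4)

        def return_time(disposition, branch_out_at, time_spent_away,
                        absolute_target=None, fps=30):
            """Where the clock lands back in the originating production; None = no return."""
            if disposition == NO_RETURN:
                return None
            if disposition == NEXT_FRAME:
                return branch_out_at + 1.0 / fps       # frame after the origination point
            if disposition == TIME_ADJUSTED:
                return branch_out_at + time_spent_away
            if disposition == ABSOLUTE:
                return absolute_target

        # The example from the text: branch out of A at 5:00, spend 15:00 in B.
        print(return_time(TIME_ADJUSTED, 5 * 60, 15 * 60))   # -> 1200.0 seconds, i.e. 20:00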
    
    An extensible set of basic viewer interface primitives provides for the
    construction of the viewer interface as well as for the attachment of HVML
    code to events happening on those controls.  These basic controls are
    required to have a generic look-and-feel implemented on the receiving end,
    but are most useful when they are used merely as intelligent containers for
    other elements which may give them a customized look.  Controls may be
    filled with text, bitmaps, moving video, or anything else which supplies an
    on-screen appearance.
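
    As a very rough sketch of the container idea (Python, hypothetical
    names), a basic control carries a generic appearance, can instead be
    filled with arbitrary visual content, and lets HVML code attach itself to
    the control’s events:

        class Control:
            def __init__(self, name):
                self.name = name
                self.content = None   # text, bitmap, moving video: anything drawable
                self.handlers = {}    # event name -> attached HVML behaviors

            def fill(self, content):
                """Give the control a customized look in place of the generic one."""
                self.content = content

            def on(self, event, handler):
                """Attach code to an event happening on this control."""
                self.handlers.setdefault(event, []).append(handler)

            def fire(self, event):
                for handler in self.handlers.get(event, []):
                    handler(self)

        # e.g. a "more info" button filled with a moving video thumbnail:
        button = Control("more_info")
        button.fill("video:related_story")
        button.on("select", lambda ctl: print("branching from", ctl.name))
        button.fire("select")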
    
    In addition to the basic set of primitives, “smart special effects” may be
    used.  These are somewhat standard-looking special effects, such as a title
    super, key, DVE move, etc., to which a view-time behavior, itself expressed
    in HVML, is attached.  This allows the television post-producer to create
    visually complex and dynamic viewer interfaces very easily, and places
    title “look and feel” issues into the hands of the production artist.
    
    An HVML production may also be self-editing in nature.  Among other more
    artistic uses, this allows a program to have variable ratings levels or
    dynamically variable detail level or “information bandwidth.”  A production
    may also edit itself based on factors such as programmatic states maintained
    by the browser or delivery device, or by past viewer actions and
    interactions.
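
    Purely as an illustration of the self-editing idea (the real mechanism
    would of course be expressed in HVML; every name below is made up), a
    production might assemble its own running order at view time:

        def build_playlist(segments, max_rating, detail_level, viewer_state):
            """segments: list of dicts with 'id', 'rating', 'detail' and 'topic' keys."""
            playlist = []
            for seg in segments:
                if seg["rating"] > max_rating:
                    continue      # ratings-level edit
                if seg["detail"] > detail_level:
                    continue      # "information bandwidth" edit
                if seg["topic"] in viewer_state.get("skipped_topics", set()):
                    continue      # edit based on past viewer interactions
                playlist.append(seg["id"])
            return playlist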
    
    The attached scenarios document discusses a small set of examples of what
    all of the above will enable a production artist to deliver to the consumer,
    from the consumer’s point of view.
    
    The Proposal
    
    At long last, I come to the point.  I propose that the HVML language be
    fully designed and specified in an open forum.  Further, I propose that the
    W3C oversee this forum and moderate development to maintain adherence to
    an agreed-upon standard.  It is my belief that the time to start is now, and
    that we can take full advantage of the lag time we are going to see in
    widespread broadband deployment, so that when broadband is extremely
    common, we will be ready with what I believe is the “killer app” for that
    bandwidth.
    
    Conclusion
    
    I invite anybody and everybody on this list to jump in and share their
    opinions and ideas on all of this.  It will be a complex task to create a
    system such as this, so the more people who beat on it the better.
    
    Thank you all for your time.  I look forward to hearing from you.
    
    Regards,
    Ted Williams
    Founder
    Sultan Media Systems, Inc.