HTML5 Video and Audio Element Name Collisions with SMIL and Z39.86

Assume that at a later time, we have begun thinking, as Silvia said, "beyond
simply making audio and video accessible" in HTML5 and imagine that the SMIL
recommendation is deemed by some future consensus to be an appropriate model
for incorporating richer (and accessible) synchronized multimedia content
into HTML.

However, hasn't the introduction of a specification and content model for
the HTML5 video and audio elements that conflicts with the existing W3C
recommendation for the SMIL audio and video elements significantly
complicated matters?

Re-inventing aspects of SMIL functionality to produce a simplified
audio/video model for HTML authors isn't a bad thing, but re-using existing
SMIL element names and then redefining their specification, use, and
behavior seems counter to an interoperable Web. The naming collision between
the SMIL and HTML5 audio and video elements can, at a minimum, result in
confusion for authors, educators and implementors.
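
To make the collision concrete, the same element names carry quite
different models in the two specifications. The fragments below are
simplified, illustrative sketches rather than verbatim spec examples,
and the file names are hypothetical:

  <!-- SMIL: audio and video are timed media objects placed inside a
       synchronization container and carry SMIL timing attributes -->
  <par>
    <video src="lecture.mpg"/>
    <audio src="narration.mp3" begin="2s" clipBegin="0s" clipEnd="30s"/>
  </par>

  <!-- HTML5: video and audio are standalone document elements with
       their own content model (source elements plus fallback content)
       and no SMIL timing semantics -->
  <video controls>
    <source src="lecture.ogv" type="video/ogg">
    Your browser does not support the video element.
  </video>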

This type of problem has come up previously: when MathML and SVG were
included in HTML5, some element naming collisions resulted. But in contrast
to the present situation, those were cases in which existing recommendations
were brought together and the same element names were already in use.
HTML5's audio and video, by contrast, are new elements being added to HTML5,
not existing legacy element names, and so the opportunity to avoid such
collisions exists when choosing their names.

Unfortunately, the genie is out of the bottle, insofar as the video and
audio elements appear in the current HTML5 draft and are already being
implemented in several UAs. But before the current approach goes further and
ends up in the HTML5 recommendation, this group needs to think seriously
about the future relationship between HTML5 media elements and SMIL, if in
fact there is serious interest in eventually bringing SMIL (or SMIL-like
capabilities) into use within HTML5.

Now, in terms of translating this into requirements:

SMIL is already widely used in the digital talking book (DTB) community
worldwide, as part of the ANSI/NISO Z39.86 specification. Tools, DTB UAs,
and content have been widely deployed, and both commercial and open source
solutions exist. People in developed and developing countries have been
trained in the intricacies of Z39.86, including many software developers and
content authors with disabilities. Efforts are underway to provide Web-based
delivery of DTBs, and interoperability of DTB content with standard Web UAs,
with minimal refactoring, would be highly desirable. Full audio and video
DTBs using Z39.86 are possible and have been prototyped, offering structured,
accessible navigation of movies.
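
For reference, a typical Z39.86 DTB uses SMIL along the following lines to
keep the text and the recorded narration in sync. This is a simplified
sketch (identifiers and file names are hypothetical, and details vary by
profile), not an excerpt from an actual DTB:

  <seq>
    <par id="par_0001">
      <text src="chapter01.xml#sent_0001"/>
      <audio src="chapter01.mp3" clipBegin="0:00:00.000"
             clipEnd="0:00:03.250"/>
    </par>
    <par id="par_0002">
      <text src="chapter01.xml#sent_0002"/>
      <audio src="chapter01.mp3" clipBegin="0:00:03.250"
             clipEnd="0:00:07.100"/>
    </par>
  </seq>

Interoperability here would mean that content of this kind could be played
in a standard Web UA with minimal refactoring.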

This bifurcation of the video and audio elements makes it much harder to
include existing accessible content within HTML5. The investment in tools
and training is lost, because the SMIL model is not compatible with the
similar but more limited capabilities being designed into HTML5.

The requirement:

Any addition of synchronized text and audio/video functionality to HTML5
must be compatible with, and must not conflict with, existing standards and
practice already in use today.


mark
---
Markku T. Häkkinen
Senior Researcher
Department of Mathematical Information Technology
University of Jyväskylä
http://users.jyu.fi/~mhakkine

Received on Thursday, 6 May 2010 21:52:39 UTC