- From: Michael A. Dolan <miked@tbt.com>
- Date: Fri, 09 Aug 2002 09:54:31 -0700
- To: www-tt-tf@w3.org
I am admittedly not a SMIL expert, so if I have made assumptions here about
SMIL that are not accurate, someone please correct them.
First, it is my understanding that SMIL is intended to be used to construct
a multimedia *presentation* and to control the timing of that presentation
within the user agent. This example provided by Mr. Ramirez is a fine
representation of what SMIL does well.
But what I thought TT was (and the problem I thought was described in the
requirements) is a language that allows the definition and authoring of a
text stream synchronized to some timebase (either internal or
external). TT is not a presentation system. It is an authoring
specification with which one can create text carrying enough
synchronization information that, at some later time and place, it can
be presented alone or combined with other related essence elements for
presentation. Thus, in my view, a TT file is an input to a SMIL
presentation system, not SMIL itself.
One cannot presume that the other related essence elements will be
distributed along with the TT file. It is likely in some cases that they
will not. If party A creates a video/audio presentation with the video and
audio authored separately, then party B can later create another audio track
and combine it with the original video stream, for example to retarget the
presentation to another language. The same is needed for TT. A
3rd party must be able to author text related to some timeline to be
combined with the related essence elements at some point in the future,
possibly even through a separate distribution channel, arriving at the user
agent separately. This scenario, common in television captioning for
example, requires that the timeline be embedded in the TT element.
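To make the scenario concrete, here is a purely hypothetical sketch of a TT file carrying its own timeline. All of the element and attribute names here are invented for illustration; no such vocabulary has been defined yet:

```xml
<!-- Hypothetical TT document: invented vocabulary, for illustration only.
     Each caption carries its own time coordinates against a declared
     timebase, so the file can be authored by a third party and shipped
     through a separate channel from the video and audio. -->
<tt xmlns="urn:example:tt" timebase="smpte" framerate="30">
  <text begin="00:00:05:00" end="00:00:08:15">First caption.</text>
  <text begin="00:00:09:00" end="00:00:12:10">Second caption.</text>
</tt>
```

A presentation system receiving such a file alongside the video stream could then align the two timebases implicitly, with no per-caption synchronization code in the presentation language.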
In the case where there is only one author for everything, and everything
is neatly bundled up into a single package for all time, SMIL could be
used for this purpose. But this is not the general case, and I can't see
how SMIL supports this looser composition of the elements. (Or if it can,
could someone elaborate?)
Further, SMIL seems to presume that the text essence of the composition is
inline. This is like requiring that all the image pixels be defined in the
SMIL syntax itself rather than allowing a reference to an external file.
For example, I would have expected to see SMIL syntax of the form:
<t:img.....
<t:audio.....
<t:text.....
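Filling out that sketch a little (the t: elements and the external-TT src attribute are my assumptions; SMIL as I understand it defines no such reference to an external timed-text document):

```xml
<!-- Assumed syntax, not real SMIL: each essence element, including the
     text, is an external file referenced by the composition. -->
<par>
  <t:img   src="slides.mng"    region="image_area"/>
  <t:audio src="narration.mp3"/>
  <t:text  src="captions.tt"   region="caption_area"/>
</par>
```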
Where the "text" element of the composition is, in fact, the TT language
syntax being contemplated by this group. Maybe one can construct a series
of static (HTML) text files using the above, but that is clumsy and
requires potentially hundreds of separate text files for a modest-length
presentation. The same problem applies to the images. SMIL is perhaps OK
for short presentations, but not for 2-hour ones with new text and a new
image every 4 seconds: a lengthy presentation (say 2-3 hours) would
require thousands of separate files and tens of thousands of lines of
SMIL code. The file-count problem could perhaps be fixed by compositing
the text into a single file and using MNG with fragment URI syntax or
something, I suppose, but it is a general problem and not specific to
text. And the architectural requirement that there be large amounts of
SMIL code just to perform synchronization seems problematic in itself.
Other systems solve this with implicit synchronization, using the
timelines carried in the elements themselves.
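For scale, the series-of-static-text-files approach seems to force something like the following (the seq and text elements are SMIL 2.0; the file names and the one-caption-per-file scheme are only illustrative):

```xml
<!-- One explicitly timed entry per caption: at a new caption every
     4 seconds, a 2-hour presentation needs about 7200 / 4 = 1800 of
     these elements, each pointing at its own little text file. -->
<seq>
  <text src="caption0001.html" dur="4s"/>
  <text src="caption0002.html" dur="4s"/>
  <!-- ...roughly 1800 entries in all... -->
  <text src="caption1800.html" dur="4s"/>
</seq>
```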
SMIL seems to discourage the use of timelines in the essence files,
preferring to set the timebase itself. SMIL 1, as I recall, could not
handle pushed video and audio streams for this reason (is this better in
SMIL 2?). At a minimum, it is still not obvious how to composite multiple
streams that each carry their own timelines. In contrast, this is common
practice in all existing video and audio authoring systems: given a video
stream with a timeline and an audio stream with a timeline, other systems
synchronize the two implicitly, as a matter of course, without explicit
controls for every frame. The same is needed for TT. It needs its own
timeline and the presentation system needs to be able to make sense of it
relative to the other components.
So in summary, there are several issues:
1. TT needs to be a peer authoring format to video/image/audio and not
embedded in the presentation language;
2. TT needs its own timeline to allow 3rd party authoring and simpler
compositing; and
3. A presentation system is presumed that can composite these separate
elements (which may or may not be SMIL).
Can some of the SMIL XML syntax be repurposed for defining the TT
language? It seems to me that it can. Are SMIL and its semantics the
answer to the TT problem? I sure don't see how. But perhaps SMIL 2 is
richer than I understand, and given the above discussion, perhaps someone
more knowledgeable can construct a SMIL example that meets the needs
described here and show how it scales to 3 hours?
Regards,
Mike
At 07:50 AM 8/9/2002 -0700, Jose Ramirez wrote:
>Hi All,
>
>It's a little too quiet here; this should change that :)
>
>A short piece, demonstrating how well timed text is handled in the
>HTML+SMIL profile; it preloads about 1 MB and runs 1:30 (IE 6 required).
>
>http://www.geocities.com/ramirez_j2001/test3/poem/html_smil_example.html
>
>Hopefully a simple Timed-text profile that could fit well with
>the SMIL 2 profile player could be created.
>
>Features that are quite useful:
>-begin and end attributes
>-fade transition (as the above example shows, fading the text allows it
> to blend with the presentation; otherwise the text would just jump
> onto the screen and be a distraction)
>-transparent background
>-some HTML elements: p, h1, h2, ..., br
>-text align: left, right, center (as in the above example, the text
> didn't need an exact x/y position, and align center provided an
> easy solution)
>-absolute x y positioning
>
>The most important aspect, I think, is to keep version 1 as basic as
>possible, so it can be implemented soon and there can finally be
>multimedia documents made with non-proprietary components.
>
>
>Jose Ramirez
>proprietary = temporary
-----------------------------------------------------
Michael A. Dolan TerraByte Technology (619)445-9070
PO Box 1673 Alpine, CA 91903 USA FAX: (208)545-6564
URL:http://www.tbt.com
Received on Friday, 9 August 2002 13:02:20 UTC