Re: HTML+SMIL handles text well from Michael A. Dolan on 2002-08-09 (www-tt-tf@w3.org from August 2002)

From: Michael A. Dolan <miked@tbt.com>
Date: Fri, 09 Aug 2002 09:54:31 -0700
To: www-tt-tf@w3.org
Message-Id: <5.1.0.14.2.20020809083546.042ecf18@cts.com>
I am admittedly not a SMIL expert, so if I have made assumptions here about 
SMIL that are not accurate, someone please correct them.

First, it is my understanding that SMIL is intended to be used to construct 
a multimedia *presentation* and to control the timing of that presentation 
within the user agent.  The example provided by Mr. Ramirez is a fine 
representation of what SMIL does well.

But what I thought TT was (and the problem I thought was described in the 
requirements) is a language that allows the definition and authoring of a 
text stream synchronized to some timebase (either internal or 
external).  TT is not a presentation system.  TT is an authoring 
specification with which one can create text carrying enough 
synchronization information that it can later, in another time and place, 
either be presented alone or be combined with other related essence 
elements for presentation.  Thus, it is my view that a TT file is an input 
to a SMIL presentation system, not SMIL itself.

One cannot presume that the other related essence elements will be 
distributed along with the TT file.  It is likely in some cases that they 
will not.  If party A creates a video/audio presentation with the video and 
audio authored separately, then party B can create another audio track and 
combine it later with the original video stream to retarget the 
presentation to another language, for example.  The same is needed for TT.  A 
3rd party must be able to author text related to some timeline to be 
combined with the related essence elements at some point in the future, 
possibly even through a separate distribution channel, arriving at the user 
agent separately. This scenario, common in television captioning for 
example, requires that the timeline be embedded in the TT element.
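
To make this scenario concrete, here is a minimal Python sketch. It assumes
a hypothetical caption track whose cues carry their own begin and end times
in seconds on a shared timebase. The names Cue, TRACK and active_cue are
illustrative only and are not taken from SMIL or from any TT draft. The
track is authored with no reference to the video, and is resolved against a
media clock only at presentation time.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    begin: float  # seconds on the shared timebase (assumed unit)
    end: float
    text: str


# Hypothetical third-party caption track: sorted by begin time, no overlaps.
TRACK = [
    Cue(0.0, 4.0, "Good evening."),
    Cue(4.0, 8.0, "Here is the news."),
    Cue(12.0, 16.0, "More after the break."),
]


def active_cue(track, media_time):
    """Return the cue covering media_time, or None if no cue is active."""
    begins = [cue.begin for cue in track]
    # Index of the last cue whose begin time is <= media_time.
    i = bisect_right(begins, media_time) - 1
    if i >= 0 and track[i].begin <= media_time < track[i].end:
        return track[i]
    return None
```

The user agent would call active_cue(TRACK, t) with whatever time the video
or audio stream reports, so the text file can arrive through a different
channel than the other essence elements.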

In the case where there is only one author for everything, and everything 
is neatly bundled up into a single package for all time, then SMIL could be 
used for this purpose.  But this is not the general case and I can't see 
how SMIL supports this looser composition of the elements. (Or if it can, 
could someone elaborate?)

Further, SMIL seems to presume that the text essence of the composition is 
inline.  This is like requiring all the image pixels be defined in the SMIL 
syntax rather than being able to refer to an external file.  For example, I 
would have expected to see SMIL syntax of the form:

         <t:img.....
         <t:audio.....
         <t:text.....

Where the "text" element of the composition is, in fact, the TT language 
syntax being contemplated by this group.  Maybe one can construct a series 
of static (HTML) text files using the above, but that is clumsy and 
requires potentially hundreds of separate text files for a modest length 
presentation.  The same problem is true of the images.  SMIL is OK for 
short presentations perhaps, but not for 2-hour ones with a new text and 
image presentation every 4 seconds.  A lengthy presentation (such as 2-3 
hours) using SMIL would require thousands of separate files and tens of 
thousands of lines of SMIL code.  The former could be fixed by compositing 
text in a single file and using MNG with fragment URI syntax or something I 
suppose, but it is a general problem and not specific to text.  And the 
architectural requirement that there be large amounts of SMIL code to 
perform only synchronization seems problematic.  Other systems solve this 
with implicit synchronization using the timelines in the elements themselves.
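
The scaling argument above can be checked with a little arithmetic. The
Python sketch below assumes one text file and one image file per cue, a new
cue every 4 seconds, and a purely illustrative figure of 6 lines of
synchronization markup per cue. That last number is an assumption, not a
measurement of any real SMIL document, and the function names are made up
for this example.

```python
def cue_count(duration_hours, seconds_per_cue=4):
    """Number of text/image changes in a presentation of the given length."""
    return int(duration_hours * 3600) // seconds_per_cue


def explicit_sync_estimate(duration_hours, seconds_per_cue=4, lines_per_cue=6):
    """Rough size of a presentation that times every cue explicitly.

    lines_per_cue is an assumed figure chosen only for illustration.
    """
    cues = cue_count(duration_hours, seconds_per_cue)
    return {
        "cues": cues,
        "files": 2 * cues,  # one text file plus one image file per cue
        "sync_lines": cues * lines_per_cue,
    }


if __name__ == "__main__":
    for hours in (2, 3):
        print(hours, explicit_sync_estimate(hours))
```

Under these assumptions a 2-hour presentation has 1800 cues, 3600 separate
files and 10800 lines of synchronization markup, and the totals grow
linearly with duration.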

SMIL seems to discourage the use of timelines in the essence files, 
preferring to set the timebase itself.  SMIL 1, as I recall, could not 
handle push video and audio streams for this reason (is this better in 
SMIL 2?).  At a minimum, it is still not obvious how to composite multiple 
streams, each with its own timeline.  In contrast, this is common 
practice in all existing video and audio authoring systems.  That is, given 
a video stream with a timeline and an audio stream with a timeline, other 
systems synchronize these implicitly as a matter of course, without explicit 
controls for every frame.  The same is needed for TT.  It needs its own 
timeline and the presentation system needs to be able to make sense of it 
relative to the other components.
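
As a rough illustration of implicit synchronization, the Python sketch
below merges several streams purely by the timestamps each one already
carries. The stream contents, the (timestamp, kind, payload) tuple layout
and the composite function are hypothetical and are not drawn from any
existing authoring system. The point is only that no per-item timing
controls are needed once every stream has its own timeline on a common
timebase.

```python
import heapq


def composite(*streams):
    """Merge time-sorted streams of (timestamp, kind, payload) tuples
    into a single sequence ordered by timestamp."""
    return list(heapq.merge(*streams, key=lambda item: item[0]))


# Hypothetical, independently authored streams on a shared timebase (seconds).
video = [(0.0, "video", "frame-0"), (4.0, "video", "frame-1"), (8.0, "video", "frame-2")]
audio = [(0.0, "audio", "chunk-0"), (6.0, "audio", "chunk-1")]
text = [(4.0, "text", "Here is the news.")]

if __name__ == "__main__":
    for timestamp, kind, payload in composite(video, audio, text):
        print(timestamp, kind, payload)
```

Adding a caption track in another language would mean passing one more
stream to composite, with no change to the video or audio.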

So in summary, there are several issues:

1. TT needs to be a peer authoring format to video/image/audio and not 
embedded in the presentation language;
2. TT needs its own timeline to allow 3rd party authoring and simpler 
compositing; and
3. A presentation system is presumed that can composite these separate 
elements (which may or may not be SMIL).

Can some of the SMIL XML syntax be re-purposed for defining the TT 
language?  Seems to me that it can.  Are SMIL and its semantics the answer 
to the TT problem?  I sure don't see how.  But perhaps SMIL 2 is richer than 
I understand, and given the above discussion, perhaps someone more 
knowledgeable can construct an example using SMIL that meets the needs 
described here and can show how it scales to 3 hours?

Regards,

         Mike

At 07:50 AM 8/9/2002 -0700, Jose Ramirez wrote:

>Hi All,
>
>It's a little too quiet here, this should change that :)
>
>A short piece demonstrating how well timed text is handled in the 
>HTML+SMIL profile; it preloads about 1MB and is 1:30.00 long (IE 6 required).
>
>http://www.geocities.com/ramirez_j2001/test3/poem/html_smil_example.html
>
>Hopefully a simple Timed-text profile that could fit well with
>the SMIL 2 profile player could be created.
>
>Features that are quite useful:
>-begin and end attribute
>-fade transition (as the above example shows, fading the text allows the
>  text to blend with a presentation, otherwise the text would just jump
>  onto the screen and be a distraction)
>-transparent background
>-some HTML elements, p, h1 h2..., br
>-text align, left, right, center (like in the above example, the text
>  didn't need to have an exact x y position and align center provided an
>  easy solution)
>-absolute x y positioning
>
>The most important aspect I think is to keep version 1 as basic as 
>possible, so it can be implemented soon and finally there can be
>multimedia documents made with non-proprietary components.
>
>
>Jose Ramirez
>proprietary = temporary

-----------------------------------------------------
Michael A. Dolan  TerraByte Technology    (619)445-9070
PO Box 1673 Alpine, CA 91903 USA  FAX: (208)545-6564
URL:http://www.tbt.com
Received on Friday, 9 August 2002 13:02:20 UTC