RE: Why use time as a unit of measurement? (was: Proposal 0.0) from Johnb@screen.subtitling.com on 2003-02-20 (public-tt@w3.org from February 2003)

From: <Johnb@screen.subtitling.com>
Date: Thu, 20 Feb 2003 10:58:54 -0000
To: jean-claude.dufourd@enst.fr, public-tt@w3.org
Cc: singer@apple.com
Message-ID: <11E58A66B922D511AFB600A0244A722E093EEB@NTMAIL>
Jean-Claude Dufourd wrote:

> Going back to a conceptual level, John Birch's requirements are:
> 1- a movie constituted of a video stream and an audio stream and a
> subtitles stream (actually, possibly many audio and subtitles), should
> be playable in sync, whatever part is played in whatever sequence
> 2- a movie should be playable according to an edit list

These two are actually restatements of the same root requirement, are they
not? That the audio, video and other components of a 'movie' (multimedia
presentation) each contain sufficient information to be capable of
synchronised playback.
In broadcast the 'syncMaster' is the timecode stream that exists in a 1 to 1
relationship with the video stream. Edits to the video will always affect
the timecode stream as well (they exist on the same media). All other
streams MUST follow that stream.
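
As a rough illustration of the point above, here is a minimal sketch in
which every video frame carries its own timecode and the subtitle cues are
keyed to that timecode. The frame rate, the `Cue` type and the function
names are assumptions made for this example only; they are not taken from
any TT or SMIL draft.

```python
from dataclasses import dataclass

FPS = 25  # assumed non-drop-frame rate, for illustration only


def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert 'HH:MM:SS:FF' to an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


@dataclass
class Cue:
    tc_in: int   # timecode (as a frame count) of the first frame shown
    tc_out: int  # timecode of the first frame no longer shown
    text: str


def active_cue(cues, frame_tc):
    """Return the cue whose timecode range contains this frame, if any."""
    for cue in cues:
        if cue.tc_in <= frame_tc < cue.tc_out:
            return cue
    return None


def first_position(edited, cues, text):
    """Position in the edited sequence at which a cue first appears."""
    for pos, tc in enumerate(edited):
        cue = active_cue(cues, tc)
        if cue is not None and cue.text == text:
            return pos
    return None


# Ten seconds of video; each entry is the timecode carried by that frame.
start = timecode_to_frames("10:00:00:00")
video = list(range(start, start + 10 * FPS))

cues = [
    Cue(timecode_to_frames("10:00:01:00"), timecode_to_frames("10:00:03:00"), "First"),
    Cue(timecode_to_frames("10:00:06:00"), timecode_to_frames("10:00:08:00"), "Second"),
]

# An edit removes three seconds of video; the timecodes go with the frames.
cut_from = timecode_to_frames("10:00:02:00")
cut_to = timecode_to_frames("10:00:05:00")
edited = [tc for tc in video if not (cut_from <= tc < cut_to)]

# The cue list is untouched, yet each cue still lands on the right frames.
frames_showing_first = sum(
    1 for tc in edited
    if (cue := active_cue(cues, tc)) is not None and cue.text == "First"
)
```

The cue data is never rewritten here: the subtitle stream simply follows
whatever timecode arrives with the video.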
 
> 1 seems a TT requirement, whereas 2 does not. 2 is more of a 
> requirement on the player. Right ?

The requirement (for me) on TT is that it contains information that allows
it to be synchronised with an external 'syncMaster'.

> If that is so, then considering 1, I prefer putting the synchronization 
> in a file defining the composition of streams, rather than having it 
> specified in the subtitles stream. So I'd vote for the SMIL2.0-like
> solution (with adjusted/clarified semantics if needed)

Absolutely. SMIL is IMHO a valid direction to go in, but it currently
suffers from 'tunnel vision'. SMIL approaches these issues from the
perspective of 'how to co-ordinate the **presentation** of multimedia
streams'. It does not appear that preserving the synchronisation
relationships between streams throughout an editing or layup process was an
**initial** consideration, but it seems that additions in SMIL2.0 **may**
allow this to be achieved. I certainly believe clarification is required.

I find myself leaning towards a view that TT is more of a 'profile' (if
that is the correct term) describing how to use XML, CSS and SMIL for TT.

> Now, just a word about playing a movie according to an edit list. I 
> question the relevance of requirement 2.

Requirement 2 is what creates requirement 1. The process of editing AVT
material, a cycle of creation, revision and review, means that a simple
manner of preserving the sync relationship between streams is desirable.
This is the root of my dislike of a 'begin' expressed relative to the start
of the material: it is unwieldy in the editing process.
I can live with duration, as it has associated implications:
- for text streams, ensuring readability.
- for video description and dubbing vocal tracks, it makes no sense to cut
  short a description.
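
To make the 'unwieldy' point concrete, here is a minimal sketch of what a
tool has to do to cues whose 'begin' is relative to the start of the
material when a section of video is cut out. The tuple layout, the function
name and the policy of dropping cues that overlap the cut are assumptions
for this example only; a real tool would need a considered policy for
overlapping cues.

```python
def retime_after_cut(cues, cut_start, cut_len):
    """Re-time start-relative cues after removing cut_len frames at cut_start.

    cues: list of (begin, dur, text), with begin and dur in frames and
    begin measured from the start of the material.
    """
    cut_end = cut_start + cut_len
    retimed = []
    for begin, dur, text in cues:
        if begin + dur <= cut_start:
            # Entirely before the cut: unchanged.
            retimed.append((begin, dur, text))
        elif begin >= cut_end:
            # Entirely after the cut: every such cue must be shifted earlier.
            retimed.append((begin - cut_len, dur, text))
        # Cues overlapping the cut are dropped here (an assumed policy).
    return retimed


cues = [(0, 40, "A"), (60, 30, "C"), (150, 50, "B")]
result = retime_after_cut(cues, cut_start=50, cut_len=75)
```

Every cue after the edit point is rewritten on each revision, while the
durations survive unchanged, which is why duration is easier to live with
than a start-relative 'begin'.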

> Given that all video encodings I know use I (key or intra-coded) frames
> and non-I (frames you cannot start decoding at, you have to go back to
> the previous I frame), I have doubts about the feasibility, with current
> machines, of playing a stream according to an edit list that is not
> aligned with I frames. Since cuts would statistically not be aligned
> with I frames, a new cut set would require partial reencoding of the
> video. So the automatic adjustment of the subtitles stream seems
> reasonable. The same adjustment may be needed for the audio streams.

In the broadcast environment, the majority of audio/video is still stored
and manipulated in an uncompressed format, allowing edits to occur at any
frame boundary.
FYI, the problems associated with compressed streams and subtitles originate
from the requirement to pre-send subtitles (due to bandwidth limitations).
Without a priori knowledge of when an edit is going to occur this can (and
sometimes does) cause artifacts in the resultant presentation. Be aware that
in many circumstances in broadcast an edit list is not available; examples
are newsflashes, advert insertion and local censorship. Whilst captions are
generally pre-burnt into the material, and thus are intrinsically
synchronised, subtitles (for language translation) are typically inserted
over an incoming broadcast from another region, and must follow that
incoming broadcast.

regards 
John Birch

The views and opinions expressed are the author's own and do not necessarily
reflect the views and opinions of Screen Subtitling Systems Limited.
Received on Thursday, 20 February 2003 05:49:27 UTC