RE: TT and subtitling/captioning - temporal flow of content

From: <Johnb@screen.subtitling.com>
Date: Mon, 11 Aug 2003 11:16:18 +0100
Message-ID: <11E58A66B922D511AFB600A0244A722E9EE572@NTMAIL>
To: ehodge@real.com
Cc: public-tt@w3.org
Well described, thanks.  I'm curious what happens when the read-interval
plus the add-interval exceed the desired duration of a subtitle, e.g., two
people are talking and the first one says three lines of text very quickly,
in say 3 seconds, followed immediately by person 2 saying something.  If the
add-interval is 0.5 seconds and the read-interval is 4 seconds, then the
display of subsequent subtitle(s) will either (a) fall behind by 1.5 seconds,
or (b) be cut off.

[JB> ] Well, the add-interval and read-interval are intended as desired,
not mandatory, timings. So the answer would be that the UA would attempt to
meet the timing constraints but, if unable to, would reduce the time
intervals to maintain the display of the content, preferably by maintaining
the read-interval at the expense of the add-interval. This is because, given
the nature of what subtitling/captioning is for, a faster display is
preferable to dropping content. For video, the converse is probably true: if
you get behind, you drop frames.
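As a rough sketch of the fallback policy just described (keep the read-interval, give up add-interval first), here is a hypothetical Python helper. The name `fit_intervals`, the assumption that the first unit appears at the start of the available window, and the fallback used when even the read-interval does not fit are illustrative choices, not part of any TT-AF draft.

```python
def fit_intervals(units: int, add_interval: float, read_interval: float,
                  available: float) -> tuple[float, float]:
    """Shrink desired (target) intervals so the content fits `available` seconds.

    Assumed timeline: the first unit appears at t=0, each later unit appears
    one add-interval after the previous one, and the last unit then hangs for
    the read-interval. Total needed = (units - 1) * add + read.
    """
    if units < 1:
        raise ValueError("need at least one unit of content")
    needed = (units - 1) * add_interval + read_interval
    if needed <= available:
        return add_interval, read_interval  # desired timings fit as they are
    # Prefer keeping the read-interval: take the time out of the add-interval.
    if units > 1 and available >= read_interval:
        return (available - read_interval) / (units - 1), read_interval
    # Even with a zero add-interval the read-interval does not fit:
    # show everything at once and let it hang for whatever time there is.
    return 0.0, max(available, 0.0)
```

With Erik's numbers (three lines, 0.5 s add-interval, 4 s read-interval, 3 s available), `fit_intervals(3, 0.5, 4.0, 3.0)` gives `(0.0, 3.0)`: all three lines go up together and hang for the whole 3 seconds, so nothing is dropped and nothing falls behind.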

[JB> ] Your example above is a little contradictory, since you are
specifying exact timings - i.e. if lip sync is important, the file would need
explicit timing for each utterance (or at least the start of each sentence).
These timings would relate to display events (region clears, line shifts,
etc.) and temporal flow would not be required or appropriate.
As I see it, the real use of a temporal flow with text is for things like
narration - and off screen speakers. For example a nature program often has
a narrative track that runs alongside the video of the 'pretty animals'. It
would be very nice if the text content of that narrative could be provided
as 'bulk' content, with a start narrative - end narrative marking in the
time domain - and the definition of a display region in the style domain.
Temporal-flow would then describe to the UA how to fit the text in over
time. As I see it, this would be useful in other domains as well - e.g.
public information systems.
If by your example you mean what a subtitler would do in the described
situation, it is likely that the subtitler would first establish the total
scene duration, then adjust the amount of content (by truncation or
paraphrasing) and/or the timing to fit. In a sense that's what I want the UA
to do - or at least the timing aspect.
For those interested, the following reference is a thoroughly comprehensive
article about translation subtitling, much of which is also applicable to
same language subtitling (captioning). Specifically relevant to this
discussion is section 3. Constraints and technical aspects.
John Birch 

The views and opinions expressed are the author's own and do not necessarily
reflect the views and opinions of Screen Subtitling Systems Limited. 

Part of the 'problem' with using TT-AF for subtitling is that there are
existing distribution formats for subtitling / captioning. There are also
existing standards (or at least strong conventions) as to how
captions/subtitles are displayed. In order to be a successful conveyance for
subtitle/caption information, TT-AF must be able to encode the current
author intended display effects - prior to the transfer of the carried
content into a distribution format. I do not believe that the current style
standards proposed for TT have the richness required to do this. It **may**
be possible within the TT-AF to explicitly define all of the timing, and
define multiple style rules in such a manner as to achieve some of the
effects desired - but it will be no improvement over the existing
'proprietary' standards. In fact it is likely to be considerably harder - as
these standards intrinsically support the appropriate display concepts. Two
important considerations are:
1) Subtitles/captions are designed to fit within a fixed-size region. It is
truer to say that the subtitle fits the block than that the block fits the
content. This is a conceptual difference from most web styling concepts. In
subtitling/captioning, the region can only be of limited dimensions, font
sizes are restricted, etc.
2) The entire process of subtitling/captioning is the control of temporal
overflow, basically by:
a) reducing the amount of content (word substitution, excision of irrelevant
or superfluous text, etc.), and
b) spreading the content in the time direction, using an understanding of
the reading speed of the target audience and of where it is appropriate to
break the text.
There are a number of ways of typically displaying subtitle/captions - I'll
outline some of them descriptively below:
(Please note this is not an exhaustive list)
normal 'pop' modes:
Typically, subtitles are displayed 'in toto' for a reading interval, then a
short period with no subtitle visible occurs, then the next subtitle is
shown. The subtitle region will sometimes vary in size to suit the amount of
text displayed, and typically the spacing of the subtitles is kept fairly
regular (it must be remembered this is a human-edited process, so there is
a degree of variation; part of the art of good subtitling is maintaining
the reading 'flow').
So you might have a mixture of two- and three-line subtitles (typically more
two-line than three-line), on screen for 4 to 5 seconds, spaced by 0.5 second
gaps.
line-by-line modes:
The subtitle region may be filled a line at a time - each line added,
possibly with existing content moving to make space, after successive time
intervals, until it reaches a fill limit (e.g. three lines). At this point,
after a reading interval for the last line the region may clear and the
process restarts.
Alternatively - once a region is full, the top line may be removed and all
the content shifts up - then the new line(s) insert underneath (assuming
western writing mode...)
This would continue until a significant pause in the subtitling - when the
subtitle clears.
snake modes
A variation on the line by line ideas - where the added content is in words
or fragments. Typically the snake fills the line, then the region acts as if
in line-by-line mode (i.e. the lines move up).
Fundamentally however, this issue comes down to a couple of questions:
a) Should / will TT-AF support temporal flow - i.e. a relaxed (non-explicit)
mechanism for placing text content into a region over time.
b) If a) is yes - then how is the concept best supported. My personal view
is that what should be developed are a set of attributes/elements that allow
the definition of temporal-overflow. Some candidates might include:
fill-direction - regardless of writing mode, in subtitling/captioning
regions are filled from different directions depending on where they are on
the screen. E.g. a top-of-screen subtitle will use the uppermost line first,
then the second, etc. Conversely, a bottom-of-screen subtitle will use the
lowest line first, then the bottom two lines, etc. This is to minimise the
intrusion of the subtitle into the central picture area. The UA would need a
'hint' in order to decide which direction is appropriate.

fill-mode - basically the size of content used when filling a region - e.g.
all | line | word | fragment.
region-full-clear - is the region cleared when it fills - or does content
shift to make space - and by what extent (none | all | line | word |
add-interval - A desired (target) interval between additions (auto | value)
read-interval - The desired (target) interval for which the last added
content must 'hang' to allow reading.
tidemark - A subtle wrinkle - you may wish to nominally have just two-line
subtitles, but allow three-liners if the amount of content demands it. The
tidemark would define when to typically consider a clear-down in pop mode,
but might be overridden by the content / time demands.
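To make the candidate attributes more concrete, here is a hypothetical Python sketch: a dataclass that simply groups them (field names adapted from the list above, defaults assumed), plus two helpers showing how a UA might act on fill-direction and tidemark. The tidemark rule used here (keep the nominal line count unless that leaves less than the read-interval per display) is only one possible interpretation.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemporalFlow:
    """Hypothetical grouping of the candidate temporal-flow attributes."""
    fill_direction: str = "bottom"        # "top" | "bottom"
    fill_mode: str = "line"               # all | line | word | fragment
    region_full_clear: str = "all"        # none | all | line | word ...
    add_interval: Optional[float] = None  # None stands in for "auto"
    read_interval: float = 3.0            # seconds the last content must hang
    tidemark: int = 2                     # nominal lines per display


def place_rows(cfg: TemporalFlow, n_lines: int, region_rows: int) -> list[int]:
    """Region rows (0 = top row) occupied by n_lines of content."""
    if n_lines > region_rows:
        raise ValueError("content does not fit the region")
    if cfg.fill_direction == "top":
        return list(range(n_lines))                         # uppermost row first
    # Any other value is treated as bottom: the lowest row is used first.
    return list(range(region_rows - n_lines, region_rows))


def lines_per_display(cfg: TemporalFlow, total_lines: int, duration: float,
                      max_lines: int) -> int:
    """Use the tidemark unless that leaves less than read_interval per display."""
    for lines in range(cfg.tidemark, max_lines + 1):
        displays = math.ceil(total_lines / lines)
        if duration / displays >= cfg.read_interval:
            return lines
    return max_lines  # too little time even at max_lines; content must be cut
```

For example, ten lines over 25 seconds stay as two-line displays (five displays of 5 s each), but the same ten lines over 12 seconds would give only 2.4 s per two-line display, so the tidemark is overridden and three-liners are used (four displays of 3 s).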
Of course these concepts are not just limited to TT-AF for subtitling /
captioning - but have application in many other areas....
John Birch 


-----Original Message----- 
From: Erik Hodge [mailto:ehodge@real.com] 
Sent: 07 August 2003 17:45 
To: Johnb@screen.subtitling.com; glenn@xfsi.com 
Cc: public-tt@w3.org 
Subject: RE: TT and subtitling/captioning - temporal flow of content

3GPP Timed Text uses an overall duration for display of a block of text
along with a scroll-in + delay + scroll-out.  The (optional) scroll-in and
(optional) scroll-out do not each have an explicit duration; rather, their
durations are calculated as the text's duration minus the delay.  This, I
think, would work for what you need, although it sounds like you'd want
possibly multiple delay periods based on the number of lines of text (tl)
and the number of lines of display (dl).  If there was a total delay time
(d), then the number of delay periods (pd), spread evenly throughout the
total duration, would be tl/dl rounded up to the nearest whole number, and
the delay of each would be d/pd.
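Erik's calculation fits in a few lines of Python, as a sketch: the number of delay periods is tl/dl rounded up, and the total delay d is shared equally between them (d divided by the number of periods). The function name `delay_periods` is made up for illustration.

```python
import math


def delay_periods(tl: int, dl: int, d: float) -> tuple[int, float]:
    """Return (pd, delay per period) for tl lines of text in a dl-line display."""
    pd = math.ceil(tl / dl)  # tl/dl rounded up to a whole number of periods
    return pd, d / pd        # the total delay d is shared evenly
```

For five lines of text in a two-line display with 6 seconds of total delay, `delay_periods(5, 2, 6.0)` gives three delay periods of 2 seconds each.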

        - Erik

At 05:54 PM 8/7/2003 +0100, Johnb@screen.subtitling.com wrote: 


You wrote:

I'm afraid I'm still not following your description. Could you try to put
together an example of what you mean using some of the vocabulary we have
been describing? If you could create some images of how it would look over
time, then I could understand better. 


[JB> ]  Ok - tall order - but I'll try..... 

Starting with a piece of text from which I have deliberately removed the
line breaks etc. Note the time constraint: in-cue before, out-cue after. 


Ladies and gentlemen, Ladies and gentlemen! I want to congratulate each and
every one of you for making this one of the greatest years in the history of
the Nakatomi Corporation. On behalf of the Chief Executive Officer, Mr Ozu,
and the Board of Directors, we thank you one and all and wish you a merry
Christmas and a happy New Year! 


Duration of entire section is approximately 25 seconds. 

Now this is a subtitle/caption to be displayed (using Teletext) in a two-row
subtitle/caption region. Each Teletext row only holds 37 active characters
in double-height white. We can't grow the region. 

So what my ideal UA would do is flow this text into the region according to
certain rules. 

Rule 1 - content should be displayed long enough to be read. 

The implication is that the last added content must 'hang' for a period. We
should work backwards from the out-cue when determining the interim timings. 

Let's posit a read time of 3 seconds for a two-line subtitle. 

From the content alone and the encompassing period we can work out: 

Maximum of 37 x 2 characters displayed per refresh of subtitle/caption = 74.

The above text is 62 words, 334 characters including spaces (or so MS Word
tells me), so 334 / 74 ≈ 4.5 refreshes of the region are needed to display
all the content. We can't have half a refresh, so that means at least 5
unique display occurrences of the region; breaking lines at sense boundaries
rather than packing them tightly can push this to 6. 

25 seconds divided by five or six gives us approximately 4 to 5 seconds per
display, which fits the reading time nicely. 
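The arithmetic above can be checked with a short Python sketch. It assumes plain greedy word wrapping into 37-character rows, which packs rows more tightly than a subtitler choosing sense-based line breaks would, so the refresh count it produces is a lower bound rather than an editorial result.

```python
import math

SPEECH = (
    "Ladies and gentlemen, Ladies and gentlemen! I want to congratulate each and "
    "every one of you for making this one of the greatest years in the history of "
    "the Nakatomi Corporation. On behalf of the Chief Executive Officer, Mr Ozu, "
    "and the Board of Directors, we thank you one and all and wish you a merry "
    "Christmas and a happy New Year!"
)
WIDTH, ROWS, DURATION = 37, 2, 25.0  # Teletext row width, region rows, seconds


def wrap(text: str, width: int) -> list[str]:
    """Greedy word wrap: pack as many whole words as fit into each row."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        if current and len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            if current:
                lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


chars = len(SPEECH)                           # 334 characters including spaces
estimate = math.ceil(chars / (WIDTH * ROWS))  # 334 / 74 is about 4.5, so 5
rows = wrap(SPEECH, WIDTH)                    # 10 rows of at most 37 characters
refreshes = math.ceil(len(rows) / ROWS)       # 5 two-row displays
per_display = DURATION / refreshes            # 5.0 seconds per display
print(chars, estimate, len(rows), refreshes, per_display)
```

Both the character-based estimate and the actual word wrap give five refreshes here; a subtitler's sense-based breaks could only add rows, never remove them.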

We probably want control over the mark-space ratio (i.e. the on-air/off-air
timing for the region); typically a small gap is left between displays to
'notify' the reader that the content has changed. 
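Working backwards from the out-cue with a fixed gap between displays can be sketched in Python as below. The on-air time per display is whatever is left after the gaps, split evenly, so the mark-space ratio is on-air time to gap. The name `schedule` and the choice to raise an error when a display would be shorter than the read-interval are assumptions.

```python
def schedule(in_cue: float, out_cue: float, displays: int, gap: float,
             read_interval: float) -> list[tuple[float, float]]:
    """Return (on, off) times for each display, computed back from out_cue.

    The span is split into `displays` equal on-air periods separated by
    `gap` seconds of empty region; the last display hangs until out_cue.
    """
    on_air = (out_cue - in_cue - (displays - 1) * gap) / displays
    if on_air < read_interval:
        raise ValueError("each display would be shorter than the read-interval")
    times = []
    for i in range(displays):
        # Display i ends (displays - 1 - i) whole on-air + gap periods before out_cue.
        off = out_cue - (displays - 1 - i) * (on_air + gap)
        times.append((off - on_air, off))
    return times
```

For the 25 second example with five displays and 0.5 second gaps, `schedule(0.0, 25.0, 5, 0.5, 3.0)` gives five displays of 4.6 seconds each, the first starting at the in-cue and the last ending exactly at the out-cue.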

Ok that roughly covers the temporal flow..... but there are other aspects
concerned with how the content is put into the region. 

The above assumes that the content is all presented simultaneously as a full
region... there are a number of alternative ways of filling and clearing
the region throughout the 25 seconds, e.g. 



character by character. 

Further I have assumed that the region is cleared and refilled (pop mode),
but it is equally valid to consider cases where new content displaces
existing content (i.e. pushes it out - push mode). 


John Birch 



Tackling just the temporal flow issue - I'm still digesting the style
separation feedback..... 

A second question.... 

It would be desirable for TT (at least IMHO) to include mechanisms for
describing the temporal breaking of content. 

What I am thinking of is a document that does not explicitly describe the
timing for all of the content, but rather describes that X amount of content
fits into a box of size Y over a time period of Z. 

Now, if content X is too large for box Y, how does the content get
over(?)flowed, in a 'temporal sense', through the box?  

I'm not sure I'm following your scenario here. Are you saying you want
individual characters, words, lines, etc. to appear in box Y over time, and
do so without explicitly timing each unit? 

[JB> ] That's exactly it. No explicit timing - but an overall timing. For
example timing is specified for a paragraph of text (multiple lines) to be
'rendered' into a nominally single line region over that time period. 

If so, I can see some possible problems, such as (1) needing to specify the
granularity of content to be timed (i.e., character, word, etc.); (2) which
would entail the need to formally specify how to subdivide content lacking
markup into such units. 

[JB> ] Yes - it would - but this is what I see as part of the essence of
timed text - a description of the behaviour of text over time. 
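One way to formalise the subdivision Glenn mentions is sketched in Python below, under assumed rules: characters as they stand, words split on whitespace, sentences split after '.', '!' or '?' followed by whitespace. The function name and granularity names are hypothetical, and the sentence rule is deliberately naive.

```python
import re


def split_units(text: str, granularity: str) -> list[str]:
    """Split unmarked text into display units using simple, assumed rules."""
    if granularity == "character":
        return list(text)              # every character, spaces included
    if granularity == "word":
        return text.split()            # whitespace-separated words
    if granularity == "sentence":
        # Break after '.', '!' or '?' when followed by whitespace (naive rule).
        return re.split(r"(?<=[.!?])\s+", text.strip())
    raise ValueError(f"unknown granularity: {granularity}")
```

The naive sentence rule splits "Thank you, Mr. Ozu. Merry Christmas!" into three pieces, breaking after the abbreviation "Mr.", which illustrates why such unit rules would need to be specified formally rather than left to each UA.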

While this might make the content of a TT-AF file smaller, 

[JB> ] This isn't a file-size issue; rather, it's a usability issue. By
being able to specify how you want the user agent to react in situations of
overflow, by spreading the text temporally (as well as using the CSS scroll /
marquee concepts), I see the following advantages: 

It allows a faster authoring of content. 

It also potentially allows the creation of style templates that work more
universally for text - they need not be so tied to specific text. 

A user agent that is able to take the role of distributing text over time
would produce more consistent results. 

The translation of one language to another need not involve a 'knife and
fork' re-edit of the file contents. 

It would also be possible to do this by animating the visibility property of
individual units explicitly, making decisions about what constitutes units at
authoring time, e.g., 

[JB> ] Snip 'knife and fork' explicitly timed example. 

Yes, but this example has explicit timing. If the text is modified in length,
you have to modify the timing. Different language (or reading level)
instances of a given text content will differ in length, yet in a
subtitling scenario (and many others, I suspect) they will be constrained
to display within the same specific display period, which cannot be
extended. Ideally TT-AF would allow the modification (or substitution) of
content without the explicit requirement to adjust the number and timing of
multiple cue elements. 
Received on Monday, 11 August 2003 06:05:57 UTC
