RE: TT and subtitling/captioning - temporal flow of content from Erik Hodge on 2003-08-08 (public-tt@w3.org from August 2003)

From: Erik Hodge <ehodge@real.com>
Date: Fri, 08 Aug 2003 10:05:47 -0700
To: Johnb@screen.subtitling.com
Cc: public-tt@w3.org
Message-Id: <5.1.0.14.2.20030808100113.01990b90@mailone.real.com>
Well described, thanks.  I'm curious what happens when the read-interval 
plus the add-interval exceed the desired duration of a subtitle, e.g., two 
people are talking and the first one says three lines of text very quickly, 
in say 3 seconds, followed immediately by person 2 saying something.  If the 
add-interval is 0.5 seconds and the read-interval is 4 seconds, then the 
display of subsequent subtitle(s) will either (a) fall behind by 1.5 seconds, or 
(b) be cut off.

Thanks,

         - Erik
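Erik's arithmetic can be checked with a minimal Python sketch. The function name is illustrative. It assumes the next subtitle cannot start until the previous one has hung for its full read-interval plus the add-interval.

```python
def schedule_overrun(available: float, read_interval: float, add_interval: float) -> float:
    """Return how many seconds the next subtitle would be delayed (0.0 if it fits).

    available     -- time before the next subtitle is due (seconds)
    read_interval -- minimum 'hang' time for the last content (seconds)
    add_interval  -- target gap before the next addition (seconds)
    """
    needed = read_interval + add_interval
    return max(0.0, needed - available)


# Person 1 speaks three lines in 3 s; person 2 follows immediately.
lag = schedule_overrun(available=3.0, read_interval=4.0, add_interval=0.5)
print(f"next subtitle falls behind by {lag:.1f} s")
```

Any positive result means the UA has to choose between Erik's options: (a) let the following subtitles drift late, or (b) cut the current one short.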

At 10:44 AM 8/8/2003 +0100, Johnb@screen.subtitling.com wrote:
>Erik,
>
>Part of the 'problem' with using TT-AF for subtitling is that there are 
>existing distribution formats for subtitling / captioning. There are also 
>existing standards (or at least strong conventions) as to how 
>captions/subtitles are displayed. In order to be a successful conveyance 
>for subtitle/caption information, TT-AF must be able to encode the current 
>author-intended display effects - prior to the transfer of the carried 
>content into a distribution format. I do not believe that the current 
>style standards proposed for TT have the richness required to do this. It 
>**may** be possible within the TT-AF to explicitly define all of the 
>timing, and define multiple style rules in such a manner as to achieve 
>some of the effects desired - but it will be no improvement over the 
>existing 'proprietary' standards. In fact it is likely to be considerably 
>harder - as these standards intrinsically support the appropriate display 
>concepts. Two important considerations are:
>
>1) Subtitles/captions are designed to fit within a fixed-size region. It 
>is truer to say that the subtitle fits the block - rather than the block 
>fits the content. This is a conceptual difference from most web styling 
>concepts. In subtitling/captioning, the region can only be of limited 
>dimensions, font sizes are restricted, etc.
>
>2) The entire process of subtitling/captioning is the control of temporal 
>overflow - basically by
>a) reducing the amount of content (word substitution, excision of 
>irrelevant or superfluous text, etc)
>and b) by spreading the content in the time direction - using an 
>understanding of the reading speed of the target audience and an 
>understanding of where it is appropriate to break the text.
>
>There are a number of ways of typically displaying subtitle/captions - 
>I'll outline some of them descriptively below:
>(Please note this is not an exhaustive list)
>
>normal 'pop' modes:
>
>Typically - subtitles are displayed 'in toto' for a reading interval, then 
>a short period with no subtitle visible follows, then the next subtitle is 
>shown. The subtitle region will sometimes vary in size to suit the amount 
>of text displayed - and typically the spacing of the subtitles is kept 
>fairly regular (it must be remembered this is a human edited process - so 
>there is a degree of variation - part of the art of good subtitling is 
>maintaining the reading 'flow').
>
>So you might have a mixture of two and three line subtitles (typically 
>more two line than three), on screen for 4 to 5 seconds, spaced by 0.5 
>second intervals.
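A minimal sketch of the pop-mode timing described above, assuming a fixed 4.5-second reading interval (within the 4 to 5 second range) and a fixed 0.5-second blank gap. In practice a subtitler varies both by hand to keep the reading flow; `Cue` and `pop_schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def pop_schedule(subtitles: list[str], start: float,
                 read_interval: float = 4.5, gap: float = 0.5) -> list[Cue]:
    """Show each subtitle 'in toto' for read_interval, then leave the region blank for gap."""
    cues = []
    t = start
    for text in subtitles:
        cues.append(Cue(t, t + read_interval, text))
        t += read_interval + gap  # blank period before the next subtitle
    return cues


for cue in pop_schedule(["First subtitle", "Second subtitle", "Third subtitle"], start=10.0):
    print(f"{cue.start:6.2f} -> {cue.end:6.2f}  {cue.text}")
```

The mixture of two- and three-line subtitles could be approximated by making `read_interval` depend on the line count, though that rule is an assumption, not part of the description above.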
>
>line-by-line modes:
>The subtitle region may be filled a line at a time - each line added, 
>possibly with existing content moving to make space, after successive time 
>intervals, until it reaches a fill limit (e.g. three lines). At this 
>point, after a reading interval for the last line the region may clear and 
>the process restarts.
>
>Alternatively - once a region is full, the top line may be removed and all 
>the content shifts up - then the new line(s) insert underneath (assuming 
>western writing mode...)
>This would continue until a significant pause in the subtitling - when the 
>subtitle clears.
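Both line-by-line variants can be sketched as a sequence of region states, assuming a three-line fill limit and western (top-to-bottom) writing. Timing, including the clear after a significant pause, is left out; `roll_up` is a hypothetical switch standing in for the choice between clearing a full region and shifting its content up.

```python
def line_by_line(lines: list[str], capacity: int = 3, roll_up: bool = False) -> list[list[str]]:
    """Return the region contents after each line is added.

    roll_up=False: a full region clears before the next line is added.
    roll_up=True:  a full region drops its top line and the rest shift up.
    """
    region: list[str] = []
    states: list[list[str]] = []
    for line in lines:
        if len(region) == capacity:
            if roll_up:
                region = region[1:]  # remove the top line; remaining lines move up
            else:
                region = []          # clear the region and restart
        region.append(line)          # new line goes underneath
        states.append(list(region))
    return states


print(line_by_line(["one", "two", "three", "four"]))
print(line_by_line(["one", "two", "three", "four"], roll_up=True))
```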
>
>snake modes
>A variation on the line-by-line ideas - where the added content is in 
>words or fragments. Typically the snake fills the line, then the region 
>acts as if in line-by-line mode (i.e. the lines move up).
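Snake mode can be sketched the same way, one word at a time. This assumes a fixed row width in characters (37 is the Teletext figure used later in the thread) and that no single word is wider than a row; `snake` is a hypothetical name.

```python
def snake(words: list[str], width: int = 37, rows: int = 2) -> list[list[str]]:
    """Return the region contents (a list of rows) after each word is added."""
    region: list[str] = [""]
    states: list[list[str]] = []
    for word in words:
        current = region[-1]
        candidate = word if not current else current + " " + word
        if len(candidate) <= width:
            region[-1] = candidate    # the snake grows along the bottom row
        else:
            region.append(word)       # row full: start a new line
            if len(region) > rows:
                region.pop(0)         # region full: lines move up (line-by-line behaviour)
        states.append(list(region))
    return states


for state in snake("I want to congratulate each and every one of you".split(), width=20):
    print(state)
```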
>
>
>=====================================================
>
>Fundamentally however, this issue comes down to a couple of questions:
>
>a) Should / will TT-AF support temporal flow - i.e. a relaxed 
>(non-explicit) mechanism for placing text content into a region over time.
>b) If a) is yes - then how is the concept best supported? My personal view 
>is that what should be developed is a set of attributes/elements that 
>allow the definition of temporal overflow. Some candidates might include:
>
>fill-direction - regardless of writing mode - in subtitling/captioning - 
>regions are filled from different directions depending on where they are 
>on the screen. E.g a top of screen subtitle will use the uppermost line 
>first - then the second etc... Conversely a bottom of screen subtitle will 
>use the lowest line first  - then the bottom two lines etc. This is to 
>minimise the intrusion of the subtitle into the central picture area. The 
>UA would need a 'hint' in order to decide which direction is appropriate.
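A minimal sketch of how a UA might act on such a hint. The `anchor` parameter is a hypothetical stand-in for the proposed fill-direction hint; it only decides which rows of the region the content occupies, and keeps the text in reading order within them.

```python
def rows_for_content(n_lines: int, region_rows: int, anchor: str) -> list[int]:
    """Row indices (0 = top row of the region) occupied by n_lines of content.

    anchor='top':    fill from the uppermost row downwards.
    anchor='bottom': fill from the lowest row upwards, so short subtitles stay
                     at the bottom edge, away from the central picture area.
    """
    if not 0 <= n_lines <= region_rows:
        raise ValueError("content does not fit the region")
    if anchor == "top":
        return list(range(n_lines))
    if anchor == "bottom":
        return list(range(region_rows - n_lines, region_rows))
    raise ValueError(f"unknown anchor: {anchor!r}")


print(rows_for_content(1, 3, "bottom"), rows_for_content(2, 3, "bottom"), rows_for_content(1, 3, "top"))
```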
>
>fill-mode - basically the size of content used when filling a region - 
>e.g. all | line | word | fragment.
>region-full-clear - is the region cleared when it fills - or does content 
>shift to make space - and by what extent (none | all | line | word | fragment)
>add-interval - A desired (target) interval between additions (auto | value)
>read-interval - The desired (target) reading interval - how long the last 
>content must 'hang' to allow reading.
>tidemark - A subtle wrinkle - you may wish to nominally have just two line 
>subtitles - but allow three liners if the amount of content demands it. 
>The tidemark would define when to typically consider a clear down in pop 
>mode - but might be overridden by the content / time demands.
>Of course these concepts are not just limited to TT-AF for subtitling / 
>captioning - but have application in many other areas....
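One way to picture the candidate set is as a single style object. This is a sketch only: the field names mirror the proposals above, but the types, the defaults, the `None`-means-auto convention, the extra `max_lines` limit and the tidemark rule in `lines_per_subtitle` are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class TemporalFlowStyle:
    """Hypothetical bundle of the candidate temporal-overflow properties."""
    fill_direction: Literal["top", "bottom"] = "bottom"
    fill_mode: Literal["all", "line", "word", "fragment"] = "all"
    region_full_clear: Literal["none", "all", "line", "word", "fragment"] = "all"
    add_interval: Optional[float] = None   # seconds between additions; None = auto
    read_interval: float = 3.0             # seconds the last content must hang
    tidemark: int = 2                      # nominal lines before a clear down
    max_lines: int = 3                     # assumed hard limit of the region


def lines_per_subtitle(style: TemporalFlowStyle, pending_lines: int, slots_left: int) -> int:
    """Clear down at the tidemark when time allows; otherwise let subtitles
    grow to max_lines so the pending content still fits the remaining slots."""
    if slots_left <= 0:
        raise ValueError("no time left to display the content")
    if math.ceil(pending_lines / style.tidemark) <= slots_left:
        return style.tidemark
    return style.max_lines


style = TemporalFlowStyle()
print(lines_per_subtitle(style, pending_lines=4, slots_left=2))
print(lines_per_subtitle(style, pending_lines=6, slots_left=2))
```

With these defaults, four pending lines in two remaining slots give two-liners, while six lines in two slots push the UA to three-liners, which is the tidemark being overridden by the content/time demands.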
>
>regards
>John Birch
>
>The views and opinions expressed are the author's own and do not necessarily
>reflect the views and opinions of Screen Subtitling Systems Limited.
>-----Original Message-----
>From: Erik Hodge [mailto:ehodge@real.com]
>Sent: 07 August 2003 17:45
>To: Johnb@screen.subtitling.com; glenn@xfsi.com
>Cc: public-tt@w3.org
>Subject: RE: TT and subtitling/captioning - temporal flow of content
>
>3GPP Timed Text uses an overall duration for display of a block of text 
>along with a scroll-in + delay + scroll-out.  The (optional) scroll-in and 
>(optional) scroll-out do not each have an explicit duration; rather, their 
>durations are calculated from the text's duration minus the delay.  This, 
>I think, would work for what you need, although it sounds like you'd want 
>possibly multiple delay periods based on the number of lines of text (tl) 
>and the number of lines of display (dl).  If there was a total delay time 
>(d) then the number of delay periods (pd), spread evenly throughout the 
>total duration, would be tl/dl rounded up to the nearest whole number, and 
>the delay of each would be d/pd.
>
>         - Erik
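Erik's delay arithmetic as a minimal sketch: the number of periods pd is tl/dl rounded up, and the total delay d is divided evenly across those periods, so each lasts d divided by pd. This models only the arithmetic, not the 3GPP Timed Text sample format; the function name is hypothetical.

```python
import math


def delay_periods(text_lines: int, display_lines: int, total_delay: float) -> tuple[int, float]:
    """Return (pd, delay per period) for evenly spread delay periods.

    text_lines = tl, display_lines = dl, total_delay = d (seconds).
    pd = ceil(tl / dl); each period lasts d / pd.
    """
    if display_lines <= 0:
        raise ValueError("the display must have at least one line")
    pd = math.ceil(text_lines / display_lines)
    return pd, total_delay / pd


print(delay_periods(text_lines=5, display_lines=2, total_delay=12.0))
```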
>
>At 05:54 PM 8/7/2003 +0100, Johnb@screen.subtitling.com wrote:
>>Glenn,
>>
>>You wrote:
>>
>>I'm afraid I'm still not following your description. Could you try to put 
>>together an example of what you mean using some of the vocabulary we have 
>>been describing? If you could create some images of how it would look 
>>over time, then I could understand better.
>>G.
>>[JB> ]  Ok - tall order - but I'll try.....
>>
>>Starting with a piece of text from which I have deliberately removed the 
>>line breaks etc. Note the time constraint, in-cue before out-cue after.
>>
>>00:02:43.70
>>Ladies and gentlemen, Ladies and gentlemen! I want to congratulate each 
>>and every one of you for making this one of the greatest years in the 
>>history of the Nakatomi Corporation. On behalf of the Chief Executive 
>>Officer, Mr Ozu, and the Board of Directors, we thank you one and all and 
>>wish you a merry Christmas and a happy New Year!
>>00:03:08.63
>>
>>Duration of entire section is approximately 25 seconds.
>>
>>Now this is a subtitle/caption to be displayed (using Teletext) on a two 
>>row subtitle/caption region. Each Teletext row only holds 37 active 
>>characters in double height white. We can't grow the region.
>>
>>So what my ideal UA would do is flow this text into the region according 
>>to certain rules.
>>
>>Rule 1 - content should be displayed long enough to be read.
>>
>>Implication is that last added content must 'hang' for a period. We 
>>should work backwards from the outcue when determining the interim timings.
>>
>>Let's posit a read time of 3 seconds for a two line subtitle.
>>
>>From the content alone and the encompassing period we can work out the following:
>>
>>Maximum of 37 X 2 characters displayed per refresh of subtitle/caption = 
>>74 characters.
>>
>>Above text is 62 words, 334 characters including spaces (or so MS Word 
>>tells me)
>>
>>so 334 / 74 = 4.5 refreshes of the region to display all the content. We 
>>can't have half a refresh - so at least 5 unique display occurrences of the region.
>>
>>25 seconds divided by five gives us approximately 5 seconds per display - 
>>which fits the reading time nicely.
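The figures can be checked with a short sketch. It assumes a greedy word wrap (as many whole words per 37-character row as fit), which a human subtitler would not apply blindly, and treats the in-cue and out-cue above as seconds. For this text the character bound is 334 / 74 ≈ 4.5, i.e. at least 5 refreshes, and the greedy wrap needs 10 rows, i.e. 5 two-row displays of just under 5 seconds each.

```python
import math

TEXT = ("Ladies and gentlemen, Ladies and gentlemen! I want to congratulate each "
        "and every one of you for making this one of the greatest years in the "
        "history of the Nakatomi Corporation. On behalf of the Chief Executive "
        "Officer, Mr Ozu, and the Board of Directors, we thank you one and all and "
        "wish you a merry Christmas and a happy New Year!")

ROW_WIDTH = 37                                      # active Teletext characters per row
ROWS = 2                                            # the region cannot grow
IN_CUE, OUT_CUE = 2 * 60 + 43.70, 3 * 60 + 8.63     # seconds


def wrap(text: str, width: int) -> list[str]:
    """Greedy word wrap: fill each row with as many whole words as fit."""
    rows, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= width or not current:
            current = candidate                     # word fits (or is alone on the row)
        else:
            rows.append(current)
            current = word
    if current:
        rows.append(current)
    return rows


rows = wrap(TEXT, ROW_WIDTH)
lower_bound = math.ceil(len(TEXT) / (ROW_WIDTH * ROWS))  # ignores word boundaries
displays = math.ceil(len(rows) / ROWS)
per_display = (OUT_CUE - IN_CUE) / displays
print(len(TEXT.split()), len(TEXT), lower_bound, len(rows), displays, round(per_display, 2))
```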
>>
>>We probably want control over the mark-space ratio (i.e. the on-air / 
>>off-air timing for the region) - typically, to 'notify' the reader that the 
>>content has changed, a small gap is left between displays.
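Rule 1 and the mark-space idea can be combined in a minimal sketch that works backwards from the out-cue, so the last display is guaranteed its full hang time. The equal split, the function name and the 0.5-second gap in the usage line are assumptions.

```python
def mark_space_schedule(in_cue: float, out_cue: float, displays: int,
                        gap: float) -> list[tuple[float, float]]:
    """Split [in_cue, out_cue] into `displays` equal on-air periods separated by
    `gap` off-air periods, working backwards so the last display ends at the out-cue."""
    on_air = (out_cue - in_cue - (displays - 1) * gap) / displays
    if on_air <= 0:
        raise ValueError("gap too large for the available time")
    schedule = []
    end = out_cue
    for _ in range(displays):
        start = end - on_air
        schedule.append((round(start, 3), round(end, 3)))
        end = start - gap          # off-air gap before this display
    return schedule[::-1]          # back to chronological order


for start, end in mark_space_schedule(163.70, 188.63, displays=5, gap=0.5):
    print(f"{start:7.2f} -> {end:7.2f}")
```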
>>
>>Ok that roughly covers the temporal flow..... but there are other aspects 
>>concerned with how the content is put into the region.
>>
>>The above assumes that the content is all presented simultaneously as a 
>>full region... there are a number of alternative ways of filling and 
>>clearing the region throughout the 25 seconds. e.g.
>>
>>line-by-line.
>>word-by-word.
>>character by character.
>>
>>Further I have assumed that the region is cleared and refilled (pop 
>>mode), but it is equally valid to consider cases where new content 
>>displaces existing content (i.e. pushes it out - push mode).
>>
>>regards
>>John Birch
>>
>>The views and opinions expressed are the author's own and do not necessarily
>>reflect the views and opinions of Screen Subtitling Systems Limited.
>>Glenn,
>>Tackling just the temporal flow issue - I'm still digesting the style 
>>separation feedback.....
>>A second question....
>>It would be desirable for TT (at least IMHO) to include mechanisms for 
>>describing the temporal breaking of content.
>>What I am thinking of is a document that does not describe explicitly the 
>>timing for all of the content
>>- but rather describes that X amount of content fits into a box of size Y 
>>over a time period of Z.
>>Now if the content X is too large for box Y - how does the content get 
>>over(?)flowed in a 'temporal sense' through the box.
>>I'm not sure I'm following your scenario here. Are you saying you want 
>>individual characters, words, lines, etc. to appear in box Y over time, 
>>and do so without explicitly timing each unit?
>>[JB> ] That's exactly it. No explicit timing - but an overall timing. For 
>>example timing is specified for a paragraph of text (multiple lines) to 
>>be 'rendered' into a nominally single line region over that time period.
>>If so, I can see some possible problems, such as (1) needing to specify 
>>the granularity of content to be timed (i.e., character, word, etc.); (2) 
>>which would entail the need to formally specify how to subdivide content 
>>lacking markup into such units.
>>[JB> ] Yes - it would - but this is what I see as part of the essence of 
>>timed text - a description of the behaviour of text over time.
>>While this might make the content of a TT-AF file smaller,
>>[JB> ] This isn't a size of file issue - rather it's a usability issue. 
>>By being able to specify how you want the user agent to react in 
>>situations of overflow - by spreading the text temporally (as an alternative 
>>to, or as well as, the CSS scroll / marquee concepts) - I see the following advantages:
>>It allows a faster authoring of content.
>>It also potentially allows the creation of style templates that work more 
>>universally for text - they need not be so tied to specific text.
>>A user agent that is able to take the role of distributing text over time 
>>would produce more consistent results.
>>The translation of one language to another need not involve a 'knife and 
>>fork' re-edit of the file contents.
>>it would also be possible to do this by animating the visibility property 
>>of individual units explicitly, making decisions about what constitute 
>>units at authoring time, e.g.,
>>[JB> ] Snip 'knife and fork' explicitly timed example.
>>Yes but this example has explicit timing. If the text is modified in 
>>length - you have to modify the timing. Different language (or reading 
>>level) instances of a given text content will differ in length, yet in a 
>>subtitling scenario - and many others I suspect - they will be 
>>constrained to display within the same specific display period that 
>>cannot be extended. Ideally TT-AF would allow the modification (or 
>>substitution) of content without the explicit requirement to adjust the 
>>number of, and timing of multiple cue elements.
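A minimal sketch of the kind of rule a UA could apply so that substituted or translated text is re-timed without touching individual cues. Sharing the fixed display period in proportion to character count is an assumed rule chosen for illustration, not one proposed in the thread; a real UA would also have to enforce a minimum read-interval.

```python
def proportional_cues(units: list[str], start: float,
                      end: float) -> list[tuple[float, float, str]]:
    """Give each unit of text a share of the fixed [start, end] period in
    proportion to its length, so substituted text is re-timed automatically."""
    total = sum(len(u) for u in units)
    if total == 0:
        raise ValueError("no content to display")
    cues, t = [], start
    for i, unit in enumerate(units):
        share = (end - start) * len(unit) / total
        stop = end if i == len(units) - 1 else t + share  # last cue ends exactly at the out-cue
        cues.append((t, stop, unit))
        t = stop
    return cues


# Two versions of different length, constrained to the same display period.
short = ["Ladies and gentlemen!", "Merry Christmas and a happy New Year!"]
longer = ["Ladies and gentlemen, Ladies and gentlemen!",
          "We wish you a merry Christmas and a happy New Year!"]
for version in (short, longer):
    for s, e, text in proportional_cues(version, 163.70, 188.63):
        print(f"{s:7.2f} -> {e:7.2f}  {text}")
```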
Received on Friday, 8 August 2003 13:01:52 UTC