- From: <Johnb@screen.subtitling.com>
- Date: Fri, 8 Aug 2003 10:44:22 +0100
- To: ehodge@real.com
- Cc: public-tt@w3.org
- Message-ID: <11E58A66B922D511AFB600A0244A722E9EE568@NTMAIL>
Erik,

Part of the 'problem' with using TT-AF for subtitling is that there are existing distribution formats for subtitling / captioning. There are also existing standards (or at least strong conventions) as to how captions/subtitles are displayed. In order to be a successful conveyance for subtitle/caption information, TT-AF must be able to encode the current author-intended display effects - prior to the transfer of the carried content into a distribution format. I do not believe that the current style standards proposed for TT have the richness required to do this.

It **may** be possible within the TT-AF to explicitly define all of the timing, and define multiple style rules in such a manner as to achieve some of the effects desired - but it will be no improvement over the existing 'proprietary' standards. In fact it is likely to be considerably harder - as these standards intrinsically support the appropriate display concepts.

Two important considerations are:

1) Subtitles/captions are designed to fit within a fixed size region. It is truer to say that the subtitle fits the block - rather than the block fits the content. This is a conceptual difference from most web styling concepts. In subtitling/captioning - the region can only be of limited dimensions - font sizes are restricted, etc.

2) The entire process of subtitling/captioning is the control of temporal overflow - basically by a) reducing the amount of content (word substitution, excision of irrelevant or superfluous text, etc.) and b) spreading the content in the time direction - using an understanding of the reading speed of the target audience and an understanding of where it is appropriate to break the text.
There are a number of typical ways of displaying subtitles/captions - I'll outline some of them descriptively below. (Please note this is not an exhaustive list.)

normal 'pop' modes: Typically - subtitles are displayed 'in toto' for a reading interval, then a short period with no subtitle visible occurs, then the next subtitle is shown. The subtitle region will sometimes vary in size to suit the amount of text displayed - and typically the spacing of the subtitles is kept fairly regular (it must be remembered this is a human edited process - so there is a degree of variation - part of the art of good subtitling is maintaining the reading 'flow'). So you might have a mixture of two and three line subtitles (typically more two line than three), on screen for 4 to 5 seconds, spaced by 0.5 second intervals.

line-by-line modes: The subtitle region may be filled a line at a time - each line added, possibly with existing content moving to make space, after successive time intervals, until it reaches a fill limit (e.g. three lines). At this point, after a reading interval for the last line, the region may clear and the process restarts. Alternatively - once a region is full, the top line may be removed and all the content shifts up - then the new line(s) insert underneath (assuming western writing mode...). This would continue until a significant pause in the subtitling - when the subtitle clears.

snake modes: A variation on the line-by-line ideas - where the added content is in words or fragments. Typically the snake fills the line, then the region acts as if in line-by-line mode (i.e. the lines move up).

=====================================================

Fundamentally however, this issue comes down to a couple of questions:

a) Should / will TT-AF support temporal flow - i.e. a relaxed (non-explicit) mechanism for placing text content into a region over time?

b) If a) is yes - then how is the concept best supported?
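As a rough illustration of the line-by-line mode described above, the following Python sketch tracks what a region shows as each new line arrives. The function name `line_by_line` and its parameters are invented for this note and are not part of any TT-AF proposal; timing is deliberately left out, and only the two behaviours named in the text (clear and restart, or drop the top line and shift up) are modelled.

```python
from typing import List


def line_by_line(lines: List[str], max_lines: int = 3,
                 roll_up: bool = True) -> List[List[str]]:
    """Return the region contents after each line has been added.

    max_lines is the fill limit of the region. When the region is full,
    roll_up=True removes the top line so the rest shifts up, while
    roll_up=False clears the region and starts again.
    """
    region: List[str] = []
    states: List[List[str]] = []
    for line in lines:
        if len(region) == max_lines:
            # Region is full: either shift content up or clear it.
            region = region[1:] if roll_up else []
        region = region + [line]  # new line goes underneath
        states.append(region)
    return states


if __name__ == "__main__":
    for state in line_by_line(["line 1", "line 2", "line 3", "line 4"]):
        print(state)
```

A snake mode could be approximated the same way by feeding words rather than whole lines, but that is not shown here.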
My personal view is that what should be developed is a set of attributes/elements that allow the definition of temporal overflow. Some candidates might include:

fill-direction - regardless of writing mode - in subtitling/captioning - regions are filled from different directions depending on where they are on the screen. E.g. a top of screen subtitle will use the uppermost line first - then the second etc... Conversely a bottom of screen subtitle will use the lowest line first - then the bottom two lines etc. This is to minimise the intrusion of the subtitle into the central picture area. The UA would need a 'hint' in order to decide which direction is appropriate.

fill-mode - basically the size of content used when filling a region - e.g. all | line | word | fragment.

region-full-clear - is the region cleared when it fills - or does content shift to make space - and by what extent (none | all | line | word | fragment).

add-interval - A desired (target) interval between additions (auto | value).

read-interval - The desired (target) read interval - how long the last content must 'hang' to allow reading.

tidemark - A subtle wrinkle - you may wish to nominally have just two line subtitles - but allow three liners if the amount of content demands it. The tidemark would define when to typically consider a clear down in pop mode - but might be overridden by the content / time demands.

Of course these concepts are not just limited to TT-AF for subtitling / captioning - but have application in many other areas....

regards
John Birch

The views and opinions expressed are the author's own and do not necessarily reflect the views and opinions of Screen Subtitling Systems Limited.
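One way to picture the candidate attributes listed in the message above is as a small settings record. The Python sketch below is only that: the class and field names, the default values, and the two `FillDirection` values are assumptions made for illustration, since the message does not name them. Only the value lists for fill-mode and region-full-clear are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FillDirection(Enum):
    # Hypothetical value names; the message only describes the behaviour.
    TOP_FIRST = "top-first"        # top of screen: uppermost line used first
    BOTTOM_FIRST = "bottom-first"  # bottom of screen: lowest line used first


class Unit(Enum):
    NONE = "none"
    ALL = "all"
    LINE = "line"
    WORD = "word"
    FRAGMENT = "fragment"


@dataclass(frozen=True)
class TemporalFlow:
    """Illustrative container for the proposed temporal-overflow hints."""
    fill_direction: FillDirection = FillDirection.BOTTOM_FIRST
    fill_mode: Unit = Unit.ALL             # all | line | word | fragment
    region_full_clear: Unit = Unit.ALL     # none | all | line | word | fragment
    add_interval: Optional[float] = None   # seconds; None stands for 'auto'
    read_interval: float = 3.0             # seconds the last content must hang
    tidemark: int = 2                      # nominal line count before a clear

    def __post_init__(self) -> None:
        if self.fill_mode is Unit.NONE:
            raise ValueError("fill-mode must be all, line, word or fragment")
        if self.tidemark < 1:
            raise ValueError("tidemark must be at least one line")


# Two assumed presets: a pop-on style and a line-by-line roll-up style.
POP = TemporalFlow()
ROLL_UP = TemporalFlow(fill_mode=Unit.LINE, region_full_clear=Unit.LINE,
                       tidemark=3)
```

Keeping the record immutable reflects the idea that these are authoring hints handed to a user agent, not state the user agent changes while rendering.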
-----Original Message-----
From: Erik Hodge [mailto:ehodge@real.com]
Sent: 07 August 2003 17:45
To: Johnb@screen.subtitling.com; glenn@xfsi.com
Cc: public-tt@w3.org
Subject: RE: TT and subtitling/captioning - temporal flow of content

3GPP Timed Text uses an overall duration for display of a block of text along with a scroll-in + delay + scroll-out. The (optional) scroll-in and (optional) scroll-out do not each have an explicit duration; rather, their durations are calculated using the text's duration minus the delay. This, I think, would work for what you need, although it sounds like you'd possibly want multiple delay periods based on the number of lines of text (tl) and the number of lines of display (dl). If there were a total delay time (d), then the number of delay periods (pd), spread evenly throughout the total duration, would be tl/dl rounded up to the nearest whole number, and the delay of each would be d/pd.

- Erik

At 05:54 PM 8/7/2003 +0100, Johnb@screen.subtitling.com wrote:

Glenn,

You wrote:

I'm afraid I'm still not following your description. Could you try to put together an example of what you mean using some of the vocabulary we have been describing? If you could create some images of how it would look over time, then I could understand better. G.

[JB> ] Ok - tall order - but I'll try.....

Starting with a piece of text from which I have deliberately removed the line breaks etc. Note the time constraint: in-cue before, out-cue after.

00:02:43.70
Ladies and gentlemen, Ladies and gentlemen! I want to congratulate each and every one of you for making this one of the greatest years in the history of the Nakatomi Corporation. On behalf of the Chief Executive Officer, Mr Ozu, and the Board of Directors, we thank you one and all and wish you a merry Christmas and a happy New Year!
00:03:08.63

Duration of entire section is approximately 25 seconds. Now this is a subtitle/caption to be displayed (using Teletext) on a two row subtitle/caption region.
Each Teletext row only holds 37 active characters in double height white. We can't grow the region. So what my ideal UA would do is flow this text into the region according to certain rules.

Rule 1 - content should be displayed long enough to be read. The implication is that the last added content must 'hang' for a period. We should work backwards from the out-cue when determining the interim timings. Let's posit a read time of 3 seconds for a two line subtitle.

From the content alone and the encompassing period we can work out the following. Maximum of 37 x 2 characters displayed per refresh of subtitle/caption = 74 characters. The above text is 62 words, 334 characters including spaces (or so MS Word tells me), so 334 / 74 = 4.5 refreshes of the region to display all the content. We can't have half a refresh - so 5 unique display occurrences of the region. 25 seconds divided by five gives us 5 seconds per display - which fits the reading time nicely. We probably want control over the mark space ratio (i.e. the on-air - off-air timing for the region) - typically to 'notify' the reader that the content has changed a small gap is left between displays.

Ok that roughly covers the temporal flow..... but there are other aspects concerned with how the content is put into the region. The above assumes that the content is all presented simultaneously as a full region... there are a number of alternative ways of filling and clearing the region throughout the 25 seconds, e.g. line-by-line, word-by-word, character-by-character. Further I have assumed that the region is cleared and refilled (pop mode), but it is equally valid to consider cases where new content displaces existing content (i.e. pushes it out - push mode).

regards
John Birch

The views and opinions expressed are the author's own and do not necessarily reflect the views and opinions of Screen Subtitling Systems Limited.

Glenn,

Tackling just the temporal flow issue - I'm still digesting the style separation feedback.....
A second question....

It would be desirable for TT (at least IMHO) to include mechanisms for describing the temporal breaking of content. What I am thinking of is a document that does not describe explicitly the timing for all of the content - but rather describes that X amount of content fits into a box of size Y over a time period of Z. Now if the content X is too large for box Y - how does the content get over(?)flowed in a 'temporal sense' through the box?

I'm not sure I'm following your scenario here. Are you saying you want individual characters, words, lines, etc. to appear in box Y over time, and do so without explicitly timing each unit?

[JB> ] That's exactly it. No explicit timing - but an overall timing. For example timing is specified for a paragraph of text (multiple lines) to be 'rendered' into a nominally single line region over that time period.

If so, I can see some possible problems, such as (1) needing to specify the granularity of content to be timed (i.e., character, word, etc.); (2) which would entail the need to formally specify how to subdivide content lacking markup into such units.

[JB> ] Yes - it would - but this is what I see as part of the essence of timed text - a description of the behaviour of text over time.

While this might make the content of a TT-AF file smaller,

[JB> ] This isn't a size of file issue - rather it's a usability issue. By being able to specify how you want the user agent to react in situations of overflow - by spreading the text temporally cf (as well as) the CSS scroll / marquee concepts, I see the following advantages:

- It allows a faster authoring of content.
- It also potentially allows the creation of style templates that work more universally for text - they need not be so tied to specific text.
- A user agent that is able to take the role of distributing text over time would produce more consistent results.
- The translation of one language to another need not involve a 'knife and fork' re-edit of the file contents.

it would also be possible to do this by animating the visibility property of individual units explicitly, making decisions about what constitute units at authoring time, e.g.,

[JB> ] Snip 'knife and fork' explicitly timed example. Yes but this example has explicit timing. If the text is modified in length - you have to modify the timing. Different language (or reading level) instances of a given text content will differ in length, yet in a subtitling scenario - and many others I suspect - they will be constrained to display within the same specific display period that cannot be extended. Ideally TT-AF would allow the modification (or substitution) of content without the explicit requirement to adjust the number of, and timing of, multiple cue elements.
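The behaviour argued for in this thread, where a user agent is given only an overall in-cue and out-cue and works out the intermediate displays itself, might look something like the Python sketch below. It is a simplification under stated assumptions: the function name `flow`, the equal-length time slots, and the fixed 0.5 second gap are choices made for this illustration (the thread suggests working backwards from the out-cue with a minimum reading time, which is not modelled), and line breaking uses Python's `textwrap` rather than any subtitling-aware rules. The 37-character row, the two-row region and the cue times are taken from the Nakatomi example in the thread.

```python
import textwrap
from typing import List, Tuple

Cue = Tuple[float, float, List[str]]  # (start seconds, end seconds, rows shown)


def flow(text: str, in_cue: float, out_cue: float, row_width: int = 37,
         rows: int = 2, gap: float = 0.5) -> List[Cue]:
    """Spread text over [in_cue, out_cue] as pop-mode displays of `rows` rows."""
    lines = textwrap.wrap(text, width=row_width)  # break at word boundaries
    blocks = [lines[i:i + rows] for i in range(0, len(lines), rows)]
    if not blocks:
        return []
    slot = (out_cue - in_cue) / len(blocks)       # equal share of the period
    if len(blocks) > 1 and slot <= gap:
        raise ValueError("too much text for the available display period")
    cues: List[Cue] = []
    for i, block in enumerate(blocks):
        start = in_cue + i * slot
        is_last = i == len(blocks) - 1
        # Leave a short blank gap between displays; the last one runs to out_cue.
        end = out_cue if is_last else start + slot - gap
        cues.append((start, end, block))
    return cues


NAKATOMI = (
    "Ladies and gentlemen, Ladies and gentlemen! I want to congratulate "
    "each and every one of you for making this one of the greatest years "
    "in the history of the Nakatomi Corporation. On behalf of the Chief "
    "Executive Officer, Mr Ozu, and the Board of Directors, we thank you "
    "one and all and wish you a merry Christmas and a happy New Year!"
)
IN_CUE = 2 * 60 + 43.70   # 00:02:43.70
OUT_CUE = 3 * 60 + 8.63   # 00:03:08.63

if __name__ == "__main__":
    for start, end, block in flow(NAKATOMI, IN_CUE, OUT_CUE):
        print(f"{start:7.2f} -> {end:7.2f}  {' / '.join(block)}")
```

Calling `flow` with a shorter or longer text (a different language or reading level, say) and the same two cue times yields a different number of displays without anyone editing individual timings, which is the point being made about substitution of content.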
Received on Friday, 8 August 2003 05:40:30 UTC