RE: [tt-af-1-0-req] Some (late) comments on the requirements from Johnb@screen.subtitling.com on 2004-01-20 (public-tt@w3.org from January 2004)

From: <Johnb@screen.subtitling.com>
Date: Tue, 20 Jan 2004 09:55:37 -0000
To: luke-jr@cox.net
Cc: public-tt@w3.org
Message-ID: <11E58A66B922D511AFB600A0244A722E9EE6F2@NTMAIL>
Hi Luke,

Sorry - I should have made it clear that I work for a broadcast subtitle
equipment company, and that my comments were intended to represent the
**basic** requirements for captions/subtitles.
Are you from a 'fansub' background?

Comments inline.

> On Monday 19 January 2004 02:56 pm, Johnb@screen.subtitling.com wrote:
> > Text content.
> >
> > Timing accurate to frame/field. (including synchronisation to video
> > frames)
> > Note this is not the same as duration or offset from start - because
> > captioned video material may be discontinuous (e.g. Ad breaks)

> In this case, the video file that goes with the timed text will also be
> paused. Nothing says the video/TT player cannot in reality have 3 minutes
> or so at a certain point.
> Not sure, but I think that SMIL may be what you want to use for stuff
> like this.

No... I definitely don't want to use SMIL for this.
SMIL does not work well for what I want to use TT AF for.

You say "the video file that goes with the timed text will also be
paused". In a broadcast environment this is not possible. You CANNOT stop
the video. E.g. it is coming off a server, through an MPEG encoder and
up to a satellite.

You just sit on the wire and watch the timecode incrementing.....
An automation system tells you what the current program is.
When you see a timecode that matches a subtitle in your file,
then you insert it onto the wire in the appropriate format.
If the timecode jumps to one outside the program (by convention
programs are timecoded starting at 10:00:00:00 (HH:MM:SS:FF) and
adverts are timecoded from 00:00:00:00), then you just go dumb
until the program comes back.
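
The behaviour described above can be sketched roughly as follows. This is only
an illustration, not how any particular inserter is implemented: the function
names, the sample subtitle data, and the exact-match lookup are all assumptions,
and the "hour >= 10 means program" test is just the convention mentioned above.

```python
from typing import Dict, Optional, Tuple

# Assumption: programs are timecoded from 10:00:00:00, adverts from 00:00:00:00.
PROGRAM_START_HOUR = 10


def parse_timecode(tc: str) -> Tuple[int, int, int, int]:
    """Split an HH:MM:SS:FF timecode into four integers."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return (hh, mm, ss, ff)


def in_program(tc: str) -> bool:
    """True when the timecode falls inside the program range."""
    return parse_timecode(tc)[0] >= PROGRAM_START_HOUR


def subtitle_for(tc: str, subtitles: Dict[str, str]) -> Optional[str]:
    """Return the subtitle to insert at this timecode, or None to stay silent."""
    if not in_program(tc):
        return None  # timecode is outside the program: go dumb
    return subtitles.get(tc)  # None unless a subtitle starts on this frame


# Hypothetical subtitle file, keyed by the timecode at which each one starts.
subtitles = {"10:00:05:12": "Hello.", "10:03:10:00": "Welcome back."}

# Timecodes as they might be seen on the wire, including a jump into an ad break.
wire = ["10:00:05:11", "10:00:05:12", "00:00:05:12", "10:03:10:00"]
inserted = [(tc, subtitle_for(tc, subtitles)) for tc in wire]
```

Nothing here pauses or seeks the video; the inserter only reacts to whatever
timecode arrives, which is why offsets from the start of playback are not enough.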

> > Basic colours (for text - 16 colour model is sufficient)
> 16 colours per dialogue may be sufficient, but overall it is unlikely to
> be enough. Usually when I subtitle something, I use different colour
> schemes to represent who is speaking. Timed text (not just subtitles)
> also would involve displaying, for example, the title of a movie or
> similar which could very well need complex effects including colour
> fading.

Broadcast subtitling also uses colour for dialogue, e.g. UK Teletext.
Teletext is limited to a basic colour set.
Speakers are allocated a colour, but often colours are re-used 
if a character no longer appears in the program.

Extra colours would be useful for more creative purposes, but
they are not IMHO essential. There is also an issue about how many colours
could be easily distinguished from each other. Certain colours
do not show well on video, and it is more difficult to distinguish
quickly between colours that vary by intensity but not hue.

> > Font selection (Fonts for captions are quite restricted due to
> > resolution and interlace issues)
> CSS, at least, seems to have font classes which could fit a number of 
> different fonts (times, fantasy, etc).

I like the CSS font mechanism; it is focussed on the intention of the
author.

> > Background colour and box styles <SNIP>

> There are also usages where one would only want a combination of an
> outline, a shadow, and/or a box. I often use a combination of custom
> outline and text colours to indicate who is speaking.

Broadcast subtitling is very restricted in the features available since it
is rarely burnt-in. DVD subtitling is much less restricted (though tends to
follow broadcast conventions).

> > Italics selection (e.g. italics are used to represent lyrics, other
> > stress or intonation)

> Same for bold, font sizes, under/over/strikeout lines.

Again in broadcast, changing font size on a line is uncommon (in most cases
impossible).
Underline, overline or strikeout are never (in my experience) used.

> > Typically underlining and blinking are NOT used.
> But should be fairly simple to support, in case they were to 
> ever be desired.

Oh... don't get me wrong, I'm not saying these shouldn't be part of TTAF,
simply that they would not be necessary for most captioning / subtitling.

> > Transparency is used for background. I have never seen reversed text
> > (i.e. background solid with transparent text).

> I have on occasion seen semi-transparent foregrounds and rarely (but it
> does exist) seen completely transparent foregrounds. One such example of
> semitransparent foregrounds can be seen in the openings of most (all?) of
> the opening for the .hack//LIMINALITY anime series by Bandai.

The use of transparency for foregrounds seems contrary to one aspect of a
caption or subtitle, which is to remain readable :-) I guess it's not so
important for the title or credits :-)

> > Positioning can be quite complex. Captions can steal into the safe
> > area... often non speech characters (speaker change marks '-' or music
> > marks '#') will be positioned outside of the 'safe area', this gives
> > more space for the caption text.
> > Captions may be centred, left or right aligned, and this may vary from
> > caption to caption often to match the speakers on-screen position.

> Timed text may also be very position specific such as to overlap a visual 
> area, such as might be the case for a movie subtitled in another language.

This is an excellent point; I have occasionally seen this used in broadcast
captioning.
There is another wrinkle on this one.
There are cases where subtitles (no text - just a background) are used for
censorship.
By careful positioning they are used to cover 'offending' body parts!
This allows a program to be broadcast 'in the clear' to, for example, a
cable head end, with local subtitle insertion used to apply the
censorship patches.

> > Vertical text has yet more rigorous demands, but is typically produced
> > as graphics that are burnt over video.

> A complete TT format should remove any need for text to be burnt into
> video. If that is not to be the goal, the format would be better suited
> as only a captioning/subtitling format, and not timed text. Timed text is
> very broad and covers much more than just captions.

TT AF format does not remove the need or desire for burnt-in subtitling.
How do you retro-fit a TTAF decoder to 100 million TV sets?
TTAF is not primarily a distribution format (tho it could be).
In truth, I suspect TTAF may be too top heavy for captions/subtitles,
though I am reserving judgement until I see the Specification draft.
But this is a difficult balance to achieve. TTAF needs to be applicable
to broadcast captioning / subtitling (surely a major target area),
multimedia, and generic text over time (e.g. scrolling text displays,
timetables, teletext magazine services etc.)

> > Finally - although a distinction is (rightly) made concerning captions
> > and subtitles, in terms of the system requirements for their display
> > there is very little difference. Captions may use more features for
> > display than subtitles, as captions carry non speech information as
> > text and this may be rendered using colour or styles that would not
> > normally be used for speech related text display.

> Though not perhaps the technical definition of subtitles, they are usually
> considered to include overlaying translations of visual items making them
> much more complex than captions.

Actually that is more of a definition of 'description'.

According to SMPTE:
subtitles are translation of dialogue
captions are dialogue and sound effects in the same language (no
translation) (i.e. for hearing challenged)
description is text substitute for the visual items (i.e. for visually
challenged)

Quite where this leaves a text service that is both a translation and
intended for the hearing challenged I don't know... it falls between two
SMPTE stools :-)
E.g. Opera.

In the UK, Europe and Asia, 'subtitle' is used as a generic term for any
text appearing over video.
The term subtitle is then qualified if necessary to make it clear if the
service is intended for the hearing challenged.

Description services are very rare in broadcast - though description is
sometimes included in services primarily intended for the hearing
challenged.
Far more common (though still infrequent) is the use of Audio description.
This is an additional soundtrack that may be selected for use by the
visually challenged.

> > One hopes that the TT AF is simple enough to not need modules or
> > optional parts...

> Timed text is hardly simple. There are many effects that can be applied to
> text, such as fading, stretching, and dissolving. To handle any kind of
> effect, there would need to be some part of the format allowing people to
> define any new effects that might be used in the future.

Albert Einstein had it right.
"Things should be as simple as possible, but not simpler."

I think you are in a way right.... core TTAF should be really simple, 
with an extension mechanism to handle more complex concepts.

> > > This requirement only restricts the element and attribute names of the
> > > TT AF to ASCII, since R100 (use of XML) already ensured that all text
> > > content can be written in ASCII. So why not say explicitly
> > > that this item is about element and attribute names?

> > I read this as meaning that any character can be represented, but by
> > using only the ASCII characters for that representation. E.g. Cyrillic
> > characters may be edited into TTAF by typing in a Unicode codepoint in
> > an ASCII form.
> > However - I don't read this as meaning that this method is the only
> > form of representation for characters not in the ASCII set !

> Not sure I understand this part, but I would hope it will be 
> simple enough to simply use UTF-8 for everything?

Yes... using UTF-8 or any other valid encoding would work.
The requirements simply say that if you want to write a TTAF document in
ASCII, that it should still be possible to include non ASCII characters in
that document (but represented presumably by escaped sequences of ASCII
characters).
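
As a rough illustration of the escaping idea, here is a small sketch using XML
numeric character references. It assumes TTAF is plain XML (per R100); the
element name `p` and the helper name `to_ascii_xml` are made up for the example
and are not taken from any TTAF draft.

```python
import xml.etree.ElementTree as ET


def to_ascii_xml(text: str) -> bytes:
    """Encode text as ASCII, writing each non-ASCII character as &#NNNN;.

    Note: this does not escape '&' or '<'; it only handles non-ASCII characters.
    """
    return text.encode("ascii", errors="xmlcharrefreplace")


cyrillic = "Привет"
ascii_form = to_ascii_xml(cyrillic)  # pure ASCII bytes, safe in an ASCII file

# Any XML parser turns the references back into the original characters.
element = ET.fromstring(b"<p>" + ascii_form + b"</p>")
round_tripped = element.text
```

The same document could equally be saved as UTF-8 with the characters written
directly; the escaped form only matters when the file itself must stay ASCII.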
Received on Tuesday, 20 January 2004 04:52:54 UTC