RE: Timed Text (TT) Authoring Format 1.0 Use Cases and Requirements - Comments! from Sean Hayes on 2003-05-20 (public-tt@w3.org from May 2003)

From: Sean Hayes <shayes@microsoft.com>
Date: Tue, 20 May 2003 11:50:35 +0100
To: <Johnb@screen.subtitling.com>, <public-tt@w3.org>
Message-ID: <112C6E8A1B25D34BB27D48D2FD2E96CF05787F39@TVP-MSG-02.europe.corp.microsoft.com>
Actually, I'm mostly on your side - I'm just trying to capture the essence of the debate.
 
Consider the title sequence to a Movie, for example the Pink Panther. It has a lot of crazy text fonts, images and animated characters. Some or all of this might be considered TT. From the graphic designers point of view he wants total control over the look of the final result, thus authoring preference, and 'normal' consumption preference in this case may be for bitmap (or a very sophisticated text layout engine).
However none of this is going to be much use to somebody that can't see it, so the user preference maybe an alternate rendering (speech/braille whatever). Thus both forms are required, but which is the preferred form is a matter of perspective.
 
Also consider the dialog of a movie, clearly here Hollywoods preferred rendering is the actors voices. However alternate renderings might be in foreign languages or as visual forms for those that can't hear/understand the "natural" soundtrack, again in the abstract the 'TT' has multiple forms, which is preferred depends on your point of view.
 
As language is very general purpose we can normally describe everything we need to - but sometimes it becomes a bit long winded, emoticons evolved for example as a notational shorthand. So a graphic of an excited dog barking or whatever, may convey the information better than the literal text description in some circumstances - for example for younger children. A dog's bark clearly is NOT text, whereas reference to it in an email certainly is.
However the long winded description will need to be in place in case the output cannot be visible (for example in a telephone prompt or braille writer)
 
Since 'text' is algorithmically tractable for a variety of purposes, I would agree that it is should generally be present for any 'stream' in a TT file, except possibly in very rare cases. However it may not always be the first choice for providing the user experience. 
 
Also we have to consider the overall system model(s) in which TT is anticipated, it may be that we delegate all of the alternate representations to SMIL for example - in which case you might be right, then again there may be very good reasons why we may not. That however is a technical discussion we have yet to have.

________________________________

From: Johnb@screen.subtitling.com [mailto:Johnb@screen.subtitling.com]
Sent: Mon 19/05/2003 17:43
To: public-tt@w3.org
Cc: Sean Hayes
Subject: RE: Timed Text (TT) Authoring Format 1.0 Use Cases and Requirements - Comments!



Sean, 

Comments interspersed: 

> -----Original Message----- 
> From: Sean Hayes [mailto:shayes@microsoft.com] 
> Sent: 16 May 2003 16:12 
> To: Glenn A. Adams; Johnb@screen.subtitling.com; public-tt@w3.org 
> Subject: RE: Timed Text (TT) Authoring Format 1.0 Use Cases and 
> Requirements - Comments! 

> Bitmaps are also useful to indicate non text (e.g. 'dog 
> bark', 'music'), 

How can 'dog bark' be non-text! It quite clearly **is** text. As is the word 'music'. It may not be **spoken** text - dialog in a film say - but it certainly is text.

There are systems that use bitmaps to carry text content. In some cases this is because of the non availability of a text glyph suitable for the purpose (e.g. musical note) or because the emphasis required for the content, e.g. colours or style is unavailable to a text representation. In other cases it may be because of a desire to 'fix' the presentation style so that the viewer cannot alter it. However, I would strongly suggest that where it is possible to do so - text is always represented as text - not in a pre-rendered form. For what reason? Simply because in some cases the display of the text content to the user may be via non-visual means (i.e. by computer speaking, Braille etc.) In those circumstances the presence of what is basically text content (and in your examples - content intended for deaf viewers) as bitmaps - renders that content unavailable.

Sorry to hammer this point - but I think it important.... There are IMHO few sound reasons for preferring a bitmap representation over a text one.

Some examples where a bitmap **may** be prefered over text: 

Company logos (but should be accompanied by text description). 
Unique or special symbols e.g. musical notes on a stave (again should be accompanied by text description). 

I might question as to whether these elements should be capable of being carried in a TTAF file at all. 

> in these cases it would also be wise to 
> include an "alt" text, but in such cases the graphic is 
> probably the preferred rendering. 

Why? 

Some good reasons why it might not be. 
a) User display is different resolution to anticipated. 
b) User display is different aspect ratio to anticipated. 
c) User display does not have the colour depth anticipated. 
d) User display does not have the capability to overlay bitmap data. (e.g. Braille display) 

> We have had a few 
> discussions on which should be considered the primary 
> rendering, and which a secondary. It is probable that TT will 
> have some author mechanism for expressing a preference order, 
> which can be overridden if it doesn't fit the capabilities of 
> the end user. 

Hmmm....  SMIL customTestAttributes anyone....   ;-) 


regards John Birch.
Received on Tuesday, 20 May 2003 06:50:39 UTC