Markup for speaker distinction

I may be the first to post to this list - I guess somebody has to be. Please
take this as a sign of my enthusiasm for the topic :-)

I have been following the comments regarding timed text on the previous
mailing lists for a while now.
I work for a company that produces software and hardware for the production
of subtitles for broadcast television, and am excited at the prospect of a
more universal standard for timed text. At present, within the broadcast
television arena, the standards used for timed text (captions/subtitles etc.)
are either proprietary or, if open, often interpreted in different ways.
Consequently there is no 'real' standardisation or compatibility across the
products from various vendors.

The requirements draft makes the following comment which concerns me:

	16. Use markup to clearly distinguish one speaker from another. This
	could be accomplished by a) using simple placement commands (<center>,
	<left>, <right>, etc.); or b) creating a persona for text which is
	spoken by each speaker using speaker="IDREF" attribute
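
For concreteness, option b) presumably means something like the sketch
below. The element names are my own invention for illustration; only the
speaker="IDREF" attribute comes from the draft:

	<!-- personas declared once, up front -->
	<persona xml:id="auctioneer"/>
	<persona xml:id="bidder"/>

	<!-- each text element refers back to its speaker -->
	<text begin="00:01:02.0" end="00:01:04.5" speaker="auctioneer">
	  Going once, going twice...
	</text>
	<text begin="00:01:05.0" end="00:01:06.0" speaker="bidder">
	  Two hundred!
	</text>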

In our industry (subtitling) there is considerable variation in the use of
presentation to impart extra meaning to the text. As a non-exhaustive list,
presentation style (font, size, colour and positioning) is used to denote
speakers, the location of the speaker, the mood of the speaker, sounds (i.e.
non-speech) e.g. CAR HORN!, the playing of music, etc.

The use of placement (or other styling) elements in the timed text standard
as a means of speaker identification (i.e. content type) should be
discouraged.
I would prefer to see some type of tagging mechanism used to identify the
content of the text, e.g. <Title>, <Speaker1>, <AudioDescription1>,
<Narrative>, <Sound> etc. The actual presentation 'style' of these elements
would be the responsibility of a separate mechanism that imposes style on
the timed text content, and may in part be player dependent. Positioning,
placement and all other issues of presentation would then be a matter of
interpreting the tags (if desired) against a corresponding 'style sheet'
(if one is provided). This fits well with the subtitling industry, where
the 'style' for the presentation of the subtitles is often set by the
broadcaster, not by the originator of the subtitles.
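
As a rough sketch of what I mean (all names here are invented for
illustration, not proposals for actual syntax), the timed text file would
carry only content classification:

	<Speaker1 begin="00:01:02.0" end="00:01:05.0">Hello, Brian.</Speaker1>
	<Speaker2 begin="00:01:05.5" end="00:01:07.0">Hello, John.</Speaker2>
	<Sound begin="00:01:08.0" end="00:01:09.0">CAR HORN!</Sound>

while a separate style sheet, supplied by the broadcaster rather than the
subtitle originator, maps each classification to a presentation:

	<style select="Speaker1" colour="yellow" halign="left"/>
	<style select="Speaker2" colour="cyan" halign="right"/>
	<style select="Sound" colour="white" font-style="italic" halign="center"/>

A player given no style sheet could still render the text legibly in some
default style, and two broadcasters could impose entirely different house
styles on the same subtitle file.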

However, there are forms of subtitling that are difficult to implement
without *some* mechanism for controlling (or suggesting) placement. One
form, called 'snake', adds words piecemeal to the ends of subtitle rows as
they are spoken. When a row fills, it moves up. This form of subtitling
treats the display in a similar manner to an 'old style' terminal. A
similar form of subtitling occurs in Teletext with add-on subtitles. In both
cases the previous partial subtitle must remain on screen when the next one
appears. In burnt-in (open) subtitling, by contrast, a new subtitle replaces
the previous one.

Snake and add-on subtitling may be achieved by issuing sequences of 'placed'
text elements, where each subsequent element replaces the previous one and
includes a larger proportion of the complete subtitle. However, this is
inefficient, and for rapid subtitle updates it may exceed the capabilities
of the transmission chain and display equipment. These forms of subtitling
might be better supported by a tagging mechanism that can identify text as
being part of a larger composite text element, where each fragment has its
own distinct presentation time.
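
To illustrate one possible shape (again, the element names are invented),
the brute-force approach resends the growing subtitle on every update:

	<text begin="00:00:01.0" end="00:00:01.5">Going</text>
	<text begin="00:00:01.5" end="00:00:02.0">Going once,</text>
	<text begin="00:00:02.0" end="00:00:03.0">Going once, going twice.</text>

whereas a fragment mechanism transmits each word group only once and marks
it as part of a composite:

	<composite end="00:00:03.0">
	  <fragment begin="00:00:01.0">Going</fragment>
	  <fragment begin="00:00:01.5">once,</fragment>
	  <fragment begin="00:00:02.0">going twice.</fragment>
	</composite>

The player accumulates the fragments on screen as their presentation times
arrive, scrolling rows as they fill, and clears them together when the
composite ends.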

I realise that there is a considerable range of applications for a timed
text standard, each with its own valid concerns. I sincerely hope these
comments are useful to the members of the tt@w3.org working group.


regards 

	John Birch
	Senior Software Engineer
	Screen Subtitling Systems

	The views and opinions expressed are the author's own and do not
	necessarily reflect the views and opinions of Screen Subtitling
	Systems Limited.

Received on Monday, 20 January 2003 10:12:35 UTC