- From: David Kirby <david.kirby@rd.bbc.co.uk>
- Date: Fri, 01 Feb 2002 14:35:38 +0000
- To: www-tt-tf@w3.org
At BBC R&D we're developing new techniques for preparing subtitles/captions, and the need for a timed text markup arose early on in this work. We're using an XML-based format that we came up with to carry not only timing data but also other information that we need to produce subtitles from timed text.

Our requirements, roughly speaking, fall into three categories:

a) timing data and other information about the original text that goes into the subtitles (there's nothing here that is specific to the final subtitles that are produced)
b) timing and other data defining the subtitles that are created from the text (of which there might be many variants)
c) other data that link the subtitles to the video

To elaborate, some in category (a) are:

1) the name of each speaker in the programme
2) the default text colour assigned to them
3) the text, marked with scenes and speakers' names
4) the timing information for each word, i.e. start and end times
5) any changes of text colour for a speaker from their default (should it be necessary to change the text colour temporarily to ensure the subtitle remains unambiguous)

in category (b) we have:

6) the words that are in each subtitle, line by line
7) the in- and out-time for each subtitle
8) the subtitle type and position
9) foreground and background colours, possibly changing word by word

...and in (c):

10) data linking to the video, for example, frame rate, timing reference point (this could be simply the start timecode of the video but there are one or two other timing issues here, especially with compressed bitstreams)
11) timings of shot changes in the video. I don't know about the US but in the UK, subtitle in/out times are anchored to nearby shot changes, so we need those timings too if we are to produce subtitles automatically from timed text.

The markup specification that we are using also includes other data but perhaps that's getting more specific to our application.
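As a rough illustration only, the three categories above could be modelled along the following lines. This is not the BBC format: every class, field name and default here is hypothetical, times are assumed to be seconds from the timing reference point, and the `tolerance` used when anchoring a time to a shot change is an invented parameter (the post says only that in/out times are anchored to "nearby" shot changes).

```python
from dataclasses import dataclass, field
from typing import List, Optional


# --- category (a): information about the original text ---
@dataclass
class Speaker:
    name: str                      # item 1
    default_colour: str            # item 2


@dataclass
class Word:
    text: str
    start: float                   # item 4: start time (assumed seconds)
    end: float                     # item 4: end time (assumed seconds)
    speaker: str                   # item 3: speaker name
    scene: int                     # item 3: scene marker
    colour: Optional[str] = None   # item 5: None means the speaker's default


# --- category (b): one subtitle variant derived from the text ---
@dataclass
class Subtitle:
    lines: List[List[Word]]        # item 6: words, line by line
    in_time: float                 # item 7
    out_time: float                # item 7
    kind: str = "block"            # item 8: hypothetical type label
    position: str = "bottom"       # item 8: hypothetical position label
    background: str = "black"      # item 9: simplified to one value


# --- category (c): data linking the subtitles to the video ---
@dataclass
class VideoLink:
    frame_rate: float              # item 10
    start_timecode: str            # item 10: timing reference point
    shot_changes: List[float] = field(default_factory=list)  # item 11


def snap_to_shot_change(t: float, shot_changes: List[float],
                        tolerance: float = 0.5) -> float:
    """Return the nearest shot change if it lies within `tolerance`
    seconds of `t`; otherwise return `t` unchanged."""
    nearest = min(shot_changes, key=lambda s: abs(s - t), default=None)
    if nearest is not None and abs(nearest - t) <= tolerance:
        return nearest
    return t
```

With such a model, an authoring tool could compute a provisional in-time from the word timings in (a) and then pass it through `snap_to_shot_change` using the shot-change list from (c) before writing out a subtitle in (b).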
However, the point is that for the production of subtitles/captions we need many other entries beyond the timing data for the subtitles themselves. I don't know if any combination of existing standards can achieve this.

On the question of standards that may be relevant, here are three:

EBU Tech. 3264-E "Specification of the EBU subtitling data exchange format"
This is the subtitle exchange format defined by the EBU (European Broadcasting Union). As mentioned already on the list, work at http://lithpc17.epfl.ch/stlml/ produced an XML markup for this format but that doesn't give the flexibility that we need with timed text.

"Digital Video Broadcast: Subtitle File Transfer Format" from the European Telecommunications Standards Institute (may still only be in draft form).
This describes the file format for transferring files containing DVB subtitles between preparation and transmission. It's an evolution of the EBU 3264 file format that is currently being standardised.

EIA-608
Already mentioned on the list. This defines the line-21 transmission format but it doesn't define a storage format for the captions themselves. As I understand it, the various file formats used to store line-21 captions are proprietary and closely guarded!

Our own requirements are focussed more on the authoring of subtitles/captions than on the markup of finished caption text. Although this brings in more requirements, a standardised format for timed text from which the different delivery formats (teletext/line-21/DVD etc.) can be produced would be much more useful.

David Kirby

--
David Kirby
Project Manager
BBC Research and Development
Kingswood Warren       Tel: +44 1737 839623
Tadworth, Surrey.      Fax: +44 1737 839665
KT20 6NP, UK.          email: david.kirby@rd.bbc.co.uk
Received on Friday, 1 February 2002 09:35:49 UTC