RE: [media] WHATWG started requirements collection for time-aligned text from Geoff Freed on 2010-04-23 (public-html-a11y@w3.org from April 2010)

From: Geoff Freed <geoff_freed@wgbh.org>
Date: Thu, 22 Apr 2010 20:07:26 -0400
To: Ian Hickson <ian@hixie.ch>, Sean Hayes <Sean.Hayes@microsoft.com>
CC: Dick Bulterman <Dick.Bulterman@cwi.nl>, Eric Carlson <eric.carlson@apple.com>, Frank Olivier <franko@microsoft.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <B3526F4AC3C3C64388BF661A8B2112A764C70ACF78@EXCHCCR.wgbh.org>
A few comments inline.
Geoff/WGBH

________________________________________
From: public-html-a11y-request@w3.org [public-html-a11y-request@w3.org] On Behalf Of Ian Hickson [ian@hixie.ch]
Sent: Thursday, April 22, 2010 6:16 PM
To: Sean Hayes
Cc: Dick Bulterman; Eric Carlson; Frank Olivier; HTML Accessibility Task Force
Subject: RE: [media] WHATWG started requirements collection for time-aligned text

On Thu, 22 Apr 2010, Sean Hayes wrote:
>
> This is indeed somewhat rare; however, in Europe colour, and in the US
> typography, is used to denote the speaker. If, as occasionally happens,
> multiple speakers are encoded in the same caption because they are
> speaking at the same time, then it may be necessary to have multiple
> styles for one cued caption:
>
> e.g.
>               Are you going?
>               Yes.                    No.
>
> It is however possible if the layout system is sufficiently flexible,
> that these could be implemented as separate captions with overlapping
> display times. Sometimes the second utterance may need to appear
> marginally later than the first, and this may be implemented by
> animating a display style.

Noted. (Examples of the above in real-world videos, or stills from
real-world videos doing this, would be helpful. Do you have any examples
of this?)

GF:
The example cited by Sean is a good illustration.  The situation may not occur often and, as such, it may be difficult to find an example to capture in a screenshot.  However, Ian's response brings up a point that I think needs addressing.  Limiting examples to situations that are found in real-world videos today somewhat defeats the purpose of using or developing new technology.  If we followed that rule, then frankly SRT would be all we'd need.  There are reasons for including "advanced" features in the text format(s) that the group will choose, all of which I summarized in the lengthy note I sent to the list about a month ago (http://lists.w3.org/Archives/Public/public-html-a11y/2010Mar/0277.html; paragraph #2 is especially pertinent).  Some of these features may not be used every day in the broadcast or Webcast world, but that doesn't mean they shouldn't be included in the examples you seek.



> Scrolling (horizontally as in ticker tape, or vertically as in US rollup
> style captions), can also be implemented using animated styles.

Is there a need for roll-up captions on the Web? I was looking at this the
other day and couldn't find anyone on the Web who actually did this. It
seems to be primarily used for live transcription -- how should we support
this case on the Web? Are there going to be external files with streaming
captions for live video, or would captions in those cases be in the same
stream as the video file?

GF:
There is very definitely a need for roll-up captions, if it's possible for us to include that capability.  A huge portion of the US broadcast schedule is captioned via roll-up captions.  The capability is vital for broadcasters, who will want to convert their roll-up captions quickly and easily for Web use without reformatting them in pop-on style.
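
For readers unfamiliar with the roll-up style discussed above, here is a minimal Python sketch of the display behaviour: a fixed-height window in which each new line pushes the oldest line off the top. It is only an illustration, not taken from any caption format or specification; the names `roll_up` and `rows` are hypothetical.

```python
from collections import deque

def roll_up(lines, rows=3):
    """Return the visible window after each new caption line arrives.

    `rows` is the (assumed) height of the roll-up window; once it is full,
    the oldest line scrolls off the top as each new line is appended.
    """
    window = deque(maxlen=rows)  # deque drops the oldest entry automatically
    frames = []
    for line in lines:
        window.append(line)
        frames.append(list(window))  # snapshot of what is on screen now
    return frames

# Usage: a two-row window showing four incoming lines.
for frame in roll_up(["HELLO", "AND WELCOME", "TO THE SHOW", "TONIGHT"], rows=2):
    print(frame)
```

A real renderer would also animate the scroll between snapshots; the sketch only records the state after each line arrives.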


> There is a technical use for inline styling and timing as well. Captions
> that are auto-converted from CEA-608 format (which is probably the
> largest single source of caption data), may need to use this because
> captions in that format are delivered two characters per field (4
> characters per frame), and may include real-time error correction (when
> the caption source was from a stenographer) altering characters inline.
> It is sometimes easier to replicate exactly what the 608 source does,
> rather than try to figure out what the 'right' caption was.

Can you elaborate on this? I'm not an expert in this field, so if you
could translate the above into English-for-dummies that would be great. :-)
What does it mean in terms of what the spec has to support?

GF:
Not to speak for Sean, but I'll try anyhow and he can correct me where I misrepresent him:  In many real-time captioned programs, where the captions are written on the spot by a stenographer, the stenographer will literally backspace over a misspelled word and re-spell it correctly.  Replicating that keystroke sequence will be faster than working out the corrected text when the data are converted to the new format.
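
The backspace-and-retype behaviour described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the character "\b" stands in for the caption stream's backspace control code, and `replay_keystrokes` is a hypothetical helper, not part of any 608 decoder.

```python
BACKSPACE = "\b"  # assumption: stand-in for the stream's backspace control code

def replay_keystrokes(stream):
    """Apply each character in order and return the text left on screen."""
    buffer = []
    for ch in stream:
        if ch == BACKSPACE:
            if buffer:          # ignore a backspace on an empty line
                buffer.pop()    # erase the last displayed character
        else:
            buffer.append(ch)
    return "".join(buffer)

# The stenographer types "recieve", backs up four characters, then retypes.
raw = "recieve" + BACKSPACE * 4 + "eive"
print(replay_keystrokes(raw))  # prints "receive"
```

Replaying the stream this way reproduces what the viewer saw, including the correction, without the converter having to decide which spelling was intended.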

In-line styles are also used in pop-on captions:  captions with one or two italicized words among plain text are not uncommon.


> Another use case to consider is vertical text, as is sometimes used in
> Japanese subtitling (see attachment DCP0361), and mixing this with
> horizontal text. In DCP 0367 there is an example of furigana (Ruby)
> text.

Vertical text and ruby are definitely on the list.


> Use case of relative timing:
>
> The advantage of relative timing is mainly seen at the editing stage: it
> is easier to simply change the duration of one caption, and have all the
> subsequent ones move up automatically, than it is to change the onset
> time of every caption. However, since most authoring will be
> tool-supported (as captioning anything other than a minute or two of
> video by hand is extremely tedious and error-prone), this may not be an
> absolute requirement for playback. Should <video> start to support
> fragment indexing or playlists, however, then this could become more
> useful.

Interesting. Noted. Thanks.
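
The editing advantage Sean describes for relative timing can be sketched in Python as follows. This is a minimal illustration, assuming each caption is stored as a duration in seconds plus its text and that captions play back to back; the function name `to_absolute` and the sample cues are hypothetical.

```python
def to_absolute(cues, start=0.0):
    """Convert (duration, text) cues into (begin, end, text) cues.

    Each caption is assumed to begin where the previous one ended.
    """
    out = []
    t = start
    for duration, text in cues:
        out.append((t, t + duration, text))
        t += duration
    return out

cues = [(2.0, "Are you going?"), (1.5, "Yes."), (3.0, "No.")]
print(to_absolute(cues))

# Lengthening the first caption shifts every later onset automatically;
# with absolute times, each later caption would need editing by hand.
cues[0] = (2.5, "Are you going?")
print(to_absolute(cues))
```

With absolute timing stored in the file, the same edit would require rewriting the onset of every following caption, which is the tedium an authoring tool would otherwise have to hide.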

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 23 April 2010 00:08:02 UTC