RE: [media] WHATWG started requirements collection for time-aligned text from Ian Hickson on 2010-04-22 (public-html-a11y@w3.org from April 2010)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 22 Apr 2010 22:16:07 +0000 (UTC)
To: Sean Hayes <Sean.Hayes@microsoft.com>
Cc: Dick Bulterman <Dick.Bulterman@cwi.nl>, Eric Carlson <eric.carlson@apple.com>, Frank Olivier <franko@microsoft.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <Pine.LNX.4.64.1004222210280.14147@ps20323.dreamhostps.com>

On Thu, 22 Apr 2010, Sean Hayes wrote:
>
> This is indeed somewhat rare; however, colour (in Europe) and 
> typography (in the US) are used to denote the speaker. If, as 
> occasionally happens, multiple speakers are encoded in the same caption 
> because they are speaking at the same time, then it may be necessary to 
> have multiple styles for one cued caption:
> 
> e.g.
> 		Are you going?
> 		Yes.			No.
> 
> It is, however, possible, if the layout system is sufficiently 
> flexible, that these could be implemented as separate captions with 
> overlapping display times. Sometimes the second utterance may need to 
> appear marginally later than the first, and this may be implemented by 
> animating a display style.

Noted. (Examples of the above in real-world videos, or stills from 
real-world videos doing this, would be helpful. Do you have any examples 
of this?)
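
To make the "separate captions with overlapping display times" option 
concrete, here is a minimal sketch. The Cue type, the active_at helper, 
the speaker labels and all of the times are hypothetical, made up for 
illustration; nothing here comes from an existing caption format.

```python
from typing import NamedTuple


class Cue(NamedTuple):
    start: float   # seconds, assumed value for illustration
    end: float     # seconds, assumed value for illustration
    speaker: str   # hypothetical label; a renderer could map it to a style
    text: str


def active_at(cues, t):
    """Return the cues whose display interval [start, end) contains t."""
    return [c for c in cues if c.start <= t < c.end]


cues = [
    Cue(10.0, 13.0, "A", "Are you going?"),
    Cue(11.0, 13.0, "B", "Yes."),
    Cue(11.2, 13.0, "C", "No."),  # appears marginally later than "Yes."
]

for t in (10.5, 11.1, 12.0):
    print(t, [c.text for c in active_at(cues, t)])
```

Each speaker gets its own cue and therefore its own style, at the cost 
of requiring the layout system to place simultaneous cues side by side.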


> Scrolling (horizontally, as in ticker tape, or vertically, as in US 
> roll-up style captions) can also be implemented using animated styles.

Is there a need for roll-up captions on the Web? I was looking at this the 
other day and couldn't find anyone on the Web who actually did this. It 
seems to be primarily used for live transcription -- how should we support 
this case on the Web? Are there going to be external files with streaming 
captions for live video, or would captions in those cases be in the same 
stream as the video file?


> There is a technical use for inline styling and timing as well. Captions 
> that are auto-converted from CEA-608 format (which is probably the 
> largest single source of caption data) may need to use this, because 
> captions in that format are delivered two characters per field (4 
> characters per frame) and may include real-time error correction (when 
> the caption source was a stenographer) that alters characters inline. 
> It is sometimes easier to replicate exactly what the 608 source does 
> than to try to figure out what the 'right' caption was.

Can you elaborate on this? I'm not an expert in this field, so if you 
could translate the above into English-for-dummies that would be great. :-)
What does it mean in terms of what the spec has to support?
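
One possible reading of the quoted point, as a rough sketch: text 
arrives two characters at a time, and a correction erases characters 
that are already on screen. This is a simplified model and not a 
CEA-608 decoder; the BACKSPACE marker and the replay function are 
hypothetical stand-ins, and real 608 control codes, parity and channels 
are ignored.

```python
BACKSPACE = "\b"  # hypothetical stand-in for a real-time correction command


def replay(pairs):
    """Yield the visible caption text after each two-character pair arrives."""
    shown = []
    for pair in pairs:
        for ch in pair:
            if ch == BACKSPACE:
                if shown:
                    shown.pop()  # erase the most recent visible character
            else:
                shown.append(ch)
        yield "".join(shown)


# The stenographer types "yu", erases it, then types "you".
pairs = ["Ar", "e ", "yu", "\b\b", "yo", "u ", "go", "in", "g?"]
states = list(replay(pairs))

for state in states:
    print(repr(state))
```

Replicating the source exactly would mean keeping every intermediate 
state with its own time; figuring out the 'right' caption would mean 
keeping only the final state.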


> Another use case to consider is vertical text, as is sometimes used in 
> Japanese subtitling (see attachment DCP0361), and mixing this with 
> horizontal text. In DCP 0367 there is an example of furigana (Ruby) 
> text.

Vertical text and ruby are definitely on the list.


> Use case of relative timing:
>
> The advantage of relative timing is mainly seen at the editing stage: it 
> is easier to simply change the duration of one caption, and have all the 
> subsequent ones move up automatically, than it is to change the onset 
> time of every caption. However, since most authoring will be 
> tool-supported (as captioning anything other than a minute or two of 
> video by hand is extremely tedious and error-prone), this may not be an 
> absolute requirement for playback. Should <video> start to support 
> fragment indexing or playlists, however, then this could become more 
> useful.

Interesting. Noted. Thanks.
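
The editing advantage described above can be sketched as follows. The 
RelativeCue type and to_absolute function are hypothetical names, and 
the sketch assumes captions run back to back with no gaps, which the 
original message does not state.

```python
from dataclasses import dataclass


@dataclass
class RelativeCue:
    duration: float  # seconds on screen; the only timing the author edits
    text: str


def to_absolute(cues, start=0.0):
    """Turn duration-only cues into (onset, end, text) tuples."""
    timeline = []
    onset = start
    for cue in cues:
        end = onset + cue.duration
        timeline.append((onset, end, cue.text))
        onset = end  # assumption: the next caption starts where this one ends
    return timeline


cues = [
    RelativeCue(2.0, "Are you going?"),
    RelativeCue(1.5, "Yes."),
    RelativeCue(3.0, "No."),
]
timeline = to_absolute(cues)

# Lengthen only the first caption; every later onset shifts with it.
cues[0].duration = 2.5
shifted = to_absolute(cues)
print(timeline)
print(shifted)
```

With absolute onsets stored in the file, the same edit would mean 
rewriting the start and end time of every following caption.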

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 22 April 2010 22:16:35 UTC