WebVTT, Regions and live streams.

Hello, 

My name is Bill May; I’m an engineer at MLB Advanced Media looking into how to expand Closed-Captions into a world-wide solution.

We’re looking for a user experience similar to closed captions: the ability to display one word or letter at a time, and support for roll-up captions in both VOD and live (event and linear) presentations.  We use HTTP Live Streaming (HLS) as our video protocol.

CEA-708 captions are one possible solution, but we believe that approach will get harder as we add more languages.  We’d like to unbundle the captions from the other media.

So, it will come as little surprise that I’m thinking of WebVTT.

With the later versions of the specification, WebVTT has done an excellent job of mapping 608/708 and the required properties onto WebVTT using the region attributes, but only for a completed, VOD-type presentation.

Solutions like HLS and DASH use duration-based segmentation to provide (near) live streams.  When we need to provide WebVTT cues for a live stream, the direction isn’t very clear.  Specifically:

1) How to handle an EDM (Erase Displayed Memory, i.e. clear screen).
2) What to do at the end of one segment and the beginning of the next when a closed caption line spans the segment boundary.

Case 1 (EDM)
When we use the region syntax, I have been assuming that each cue gets a start time and an end time that covers the 16-second maximum display time stated in the closed caption specifications.

That way, as captions are added, the oldest ones will roll out of the region, even if they have time left.  If no captions are added, the existing ones stay until their end time.  (Please correct me if my assumption about the region is wrong.)
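A minimal Python sketch of that timing assumption: a cue starts at its first word, carries an inline timestamp for each later word, and ends 16 seconds after the last word. The function names and the MAX_DISPLAY_SECONDS constant are hypothetical, chosen for illustration only; they do not come from the WebVTT specification or any library.

```python
MAX_DISPLAY_SECONDS = 16.0  # assumed maximum on-screen time, per the paragraph above


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_cue(words):
    """Build cue text from a list of (time_in_seconds, word) pairs.

    The cue starts with the first word and ends MAX_DISPLAY_SECONDS
    after the last word; each later word gets an inline timestamp.
    """
    start = words[0][0]
    end = words[-1][0] + MAX_DISPLAY_SECONDS
    parts = [words[0][1]]
    for t, text in words[1:]:
        parts.append(f"<{format_timestamp(t)}>{text}")
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return timing + "\n" + " ".join(parts)


print(build_cue([(0.0, "Caption"), (0.75, "Line"), (1.5, "1")]))
```

Note that the end time can only be computed once the last word of the line is known.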

However, that doesn’t let me enter a clear screen command.  There’s no way to change the end time of those earlier cues.

One possible solution is to add a bunch of short-lived cues containing a non-breaking space, but I do not believe that this is acceptable because of the background artifacts.

Case 2 (cues that overlap segmentation)
As for segmentation, assume a region with 2 lines, and let’s say we want to push out one word every 3/4 second (by using the inline timing mechanism).  (I’ve left out any cue settings to make it a bit clearer.)

00:00:00.000 --> 00:00:17.500
Caption <00:00:00.750>Line <00:00:01.500>1

00:00:03.000 --> 00:00:22.000
Caption <00:00:03.750>Line <00:00:04.500>2 <00:00:05.250>Longer <00:00:06.000>extended

If we need to segment at 5-second intervals, the segmentation process does not yet have the words “Longer” and “extended” when it writes the first segment; the captioner hasn’t entered them.  It also doesn’t know what the end time should be, since that is 16 seconds after the last word.

Creating a new cue for “Longer” and “extended” will cause the “Caption Line 2” part to scroll; we want the new words to continue on the same line.
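The problem can be illustrated with a small Python sketch. Given the words of the second line and the times at which they appear, it separates what a segmenter cutting at the 5-second boundary already has from what arrives later. The helper name and the SEGMENT_SECONDS constant are hypothetical and only serve the illustration.

```python
SEGMENT_SECONDS = 5.0  # segment duration assumed in the example above


def split_at_boundary(words, boundary):
    """Split (time_in_seconds, word) pairs around a segment boundary.

    Returns (known, pending): words timed before the boundary, and
    words that are not available when the segment has to be written.
    """
    known = [(t, text) for t, text in words if t < boundary]
    pending = [(t, text) for t, text in words if t >= boundary]
    return known, pending


line_2 = [
    (3.0, "Caption"),
    (3.75, "Line"),
    (4.5, "2"),
    (5.25, "Longer"),
    (6.0, "extended"),
]

known, pending = split_at_boundary(line_2, SEGMENT_SECONDS)
print("first segment has:", " ".join(text for _, text in known))
print("arrives later:", " ".join(text for _, text in pending))
```

The pending words have to go into a later segment, and the end time of the part already written cannot be computed when the first segment is produced.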

I’d like to know whether there is a clear solution to these problems and, if not, whether additions can be made to the specification to handle these cases.

Thanks,
Bill May 
MLB Advanced Media

Received on Monday, 21 September 2015 07:49:48 UTC