Re: Roll-up captions in WebVTT from Ian Hickson on 2011-12-05 (public-texttracks@w3.org from December 2011)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 5 Dec 2011 23:26:44 +0000 (UTC)
To: public-texttracks@w3.org
Message-ID: <Pine.LNX.4.64.1112052222110.9078@ps20323.dreamhostps.com>

On Mon, 28 Nov 2011, Christian Vogler wrote:
> 
> Ian: I am talking as a person who needs captions to get access to video.

Me too.


> For me these are not an option that can be turned on or off according to 
> aesthetic preferences.

Me too. For much of the day, if I watch video, the only way I can find out 
what is being said is through captions.

Also, when watching movies or TV shows, I often need subtitles to work out 
what is being said.


> What I am saying is that certain types of captions are not readable with 
> the normal pop-up procedures.

Can you elaborate on that? What types of captions are not readable with 
what kinds of pop-up procedures?


> I realize that not everyone has the same preferences, but there are good 
> reasons for displaying the captions this way for 30+ years.

Captioning predates audio tracks. As far as I'm aware, roll-up captions, 
which are quite rare overall (are they even used at all outside live TV in 
the US?), are in the minority when it comes to captioning types; even just 
within live captioning, I'm not sure they're in the majority.

I don't think argumentum ad antiquitatem is justified here.


On Tue, 29 Nov 2011, Silvia Pfeiffer wrote:
> 
> We can't ignore existing video and their captions.

Nobody is suggesting ignoring them; merely learning from them.


> Also, the roll-up format is not a mistake. While it was developed for 
> live captioning, these captions are recorded and re-broadcast as canned 
> captions, too, so you can't really state that this format is only used 
> for live captioning.

Maybe a better way to put it is that it is only used for content captioned 
in realtime, even if those captions are not then re-edited to the more 
usable pop-up style for later broadcasts.


> >> Captions are a legal part of a video - if you want to present a video 
> >> on the Web identically to how it has been authored, we need this 
> >> support, otherwise it's not the same artistic object and sites run 
> >> into copyright issues.
> >
> > I don't buy that for a minute. If you can get the permission to 
> > publish the content in the first place, you can get the permission to 
> > publish it using the Web's technologies.
> 
> Only if you replicated it identically.

We will _never_ be able to identically replicate TV broadcast (e.g. the 
font will be different, or the resolution will be different, or the 
scaling algorithm will be different, or we'll drop different frames, or 
the monitor will be calibrated differently).

So some differences are clearly acceptable.

So it's merely a question of negotiating.


On Tue, 29 Nov 2011, David Singer wrote:
>
> I think a fundamental question that needs addressing is whether we 
> expect roll-up to be (a) 'part of' the core VTT vocabulary or (b) a 
> presentational issue that is 'optional'?

When the captions are provided line-by-line, whether live-captioned or 
not, I can see an argument being made that they should be displayable as 
roll-up captions.

However, for live-captioned content that was written with roll-up 
presentation in mind, I don't really see how you can display it as pop-up 
in real time (in a rebroadcast where you have all the captions ahead of 
time, sure, just collapse all the cues into the start of each cue).

In conclusion, I think it has to be at least (a). It would be cool to 
allow (b) as well, but I don't really see how to do it.


> I tend to think the latter.  Yes, maybe smooth roll-up is easier on the 
> eye than jump-scroll, but the same information is presented.

In my experience, no scrolling at all is far better than either.


On Tue, 29 Nov 2011, Gal Klein wrote:
>
> We at PLYmedia are doing Live Captions for a long while.
> 
> We NEVER use roll-up captions as they are really unreadable if you want 
> to follow the video and the captions.

Hear hear.


> We do use stenographers but we collect their inputs and by using simple 
> but smart algorithms we break it down to readable captions lines.
> 
> All research studies made about captions clarify the roll-up captions 
> interfere with the viewers:
> 
> "While beyond the scope of this document, semantic compression and 
> omission techniques are documented in professional literature.  A fine 
> example is the analysis of respeaking at the BBC's news broadcasts, as 
> outlined by Carlo Eugeni, "Respeaking the BBC news", The Sign Language 
> Translator and Interpreter 3(1), 2009.
> 
> Uniformity in style and visual consistency is a crucial consideration 
> for viewer understanding.  Captions present additional visual 
> information to the broadcast displayed onscreen.  It is therefore 
> imperative to consider natural reading strategies, and overloading of 
> visual elements which captions may present.
> 
> An example of this is caption scrolling.  While a common practice in 
> many real-time broadcasts, caption line scrolling, or even single word 
> scrolling interfere with the visual consistency and impair reading 
> comprehension."
> 
> Keeping roll-up for LIVE is actually continuing with a very old 
> technology providing a bad accessibility service.

That's very interesting research. Can you point me to where I could read 
more about it?


On Tue, 29 Nov 2011, Christian Vogler wrote:
> 
> However, we don't live in a perfect world where during live broadcasts 
> we have well-formatted, well-synced captions without errors. The 
> solution that you are talking about has other drawbacks, including 
> increased lag between the spoken words and the time the captions appear, 
> which was my number one complaint for live captioning while living in 
> Europe.

This doesn't have to be such an issue on the Web, where the video can 
trivially be delayed a few milliseconds (client-side if not server-side) 
to move the captions more in line with the video.


> Aside from that, we still have to recognize that even if these issues 
> are solved, and even if we effect a shift away from roll-up to pop-on 
> captions over time, the fact is that these types of captions are still 
> in widespread use. If WebVTT does not support them, there will be a gap 
> between what is required by the broadcasters and by FCC rules to be 
> supported on the web and what the standard actually supports.

I feel this is a highly US-centric attitude. The volume of content over 
which the FCC has jurisdiction is minuscule compared to the volume of 
captioned content on the Web as a whole.


> In this case, one of two things would happen: there would be calls for 
> yet another standard that would take who-knows-how-long to figure out, 
> or broadcasters would make the argument that showing captions on the web 
> is not technically and economically feasible. In either case, 
> accessibility would be set back for a long time.

Or maybe the people involved might realise that they would make more 
money, and content would be more accessible, if they instead used the 
higher-quality captioning techniques. I don't see why we have to assume 
that the FCC and the traditional broadcasters are unable to see this.


On Wed, 30 Nov 2011, Shane Feldman wrote:
> 
> While it is "readable" consumers generally prefer pop-up captions. If 
> Ian refers to pop-on captions as "normal" captions, we agree with him. 
> Consumers would prefer that once live captions are completed and time 
> tracks recorded, then they can be reconfigured and rebroadcast with 
> pop-on captions. We brought this up during the VPAAC WG1 discussion on 
> live captioning.

If you have the timing data, it seems to me that converting roll-up 
captions to pop-up captions after the fact (for rebroadcast) is relatively 
straightforward. Just treat each line as a separate cue, collapsing all 
the timing on each line so that the whole thing appears at the time of the 
start of the line. When multiple lines appear in quick succession (e.g. 
back-and-forth monosyllabic chatter between news anchors), group multiple 
lines into one cue, such that each cue is at least N milliseconds long, 
where N is whatever the ideal length of time for a cue is.
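
As a rough illustration only, the conversion described above could be 
sketched in Python along these lines. The Word and Cue types, the 
rollup_to_popup name, and the 2000 ms default for N are all made up for 
this example; they come from neither WebVTT nor any captioning tool, and 
the sketch makes no attempt to cap how many lines end up in one cue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    time_ms: int  # when this word appeared in the roll-up stream
    text: str


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def rollup_to_popup(lines: List[List[Word]], end_ms: int,
                    min_cue_ms: int = 2000) -> List[Cue]:
    """Turn timed roll-up lines into pop-up cues.

    lines      -- one inner list of timed words per roll-up line
    end_ms     -- when the last caption should disappear
    min_cue_ms -- the "N" above; 2000 is an arbitrary placeholder
    """
    # Collapse the per-word timing: the whole line appears at the
    # time its first word appeared.
    collapsed = [(line[0].time_ms, " ".join(w.text for w in line))
                 for line in lines if line]

    # Group lines that arrive in quick succession: a line starts a
    # new cue only if the current cue has already lasted N ms.
    groups = []  # list of (start_ms, [line texts])
    for start, text in collapsed:
        if groups and start - groups[-1][0] < min_cue_ms:
            groups[-1][1].append(text)
        else:
            groups.append((start, [text]))

    # Each cue stays up until the next one starts; the last cue runs
    # to end_ms, or for N ms if that would be longer.
    cues = []
    for i, (start, texts) in enumerate(groups):
        if i + 1 < len(groups):
            end = groups[i + 1][0]
        else:
            end = max(end_ms, start + min_cue_ms)
        cues.append(Cue(start, end, "\n".join(texts)))
    return cues
```

Because a new cue only starts once the previous one has been up for at 
least N milliseconds, every cue in the output lasts at least that long, 
and short back-and-forth lines are folded into the cue before them.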

This could even be applied to live broadcast if the video is delayed by 
the length of time of one cue (a trivial matter on the Web).


> There was a study on eye gaze and captions. You can find a summary at:
> http://www.dcmp.org/caai/nadh133.pdf

Interesting study. If anyone else knows of any relevant studies, I'd love 
to read them also.


> FCC will likely issue a Report and Order in January mandating that 
> captions be of equal or greater quality than what was shown on TV.

Well, changing them from roll-up to pop-up would achieve that easily 
enough...


On Tue, 29 Nov 2011, Jeroen Wijering wrote:
> 
> With JW Player, we've supported captioning for some 5 years now. We get 
> a few feature requests a week, which includes weird stuff from time to 
> time. We've never received a request for roll-up captioning so far. 
> Perhaps that may change in the future, as live streaming with captioning 
> becomes possible. For now though, I think it's safe to say that there's 
> no interest in it.

That's useful information also.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 5 December 2011 23:27:09 UTC