Re: Rollup captions: an analysis and suggestion

Hi Shane,

On Tue, Apr 10, 2012 at 7:56 PM, Shane Feldman <shane.feldman@nad.org> wrote:
> Silvia et al,
>
> Thanks for putting together the webpage on roll-up captions and being
> sensitive to user concerns about the poor quality of roll-up captions
> compared to pop-on captions.

Thanks for going through the document and for your feedback.


> There are two concerns with live captioning, accuracy is abysmal and the
> timing is always behind by a few seconds.

Neither of these is a fault of the scrolling display; both are a
fault of inaccurate caption creation. Even if those same
captions were displayed in pop-on mode, their accuracy would still be
abysmal. Also, the timing issue can be fixed in both pop-on and
roll-up mode by delaying the video until the captions are available
in sync. Thus, these two points are not arguments against roll-up
captions; they just document the fact that live captioning is of
worse quality than post-produced captions, which is, frankly, not
surprising.
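
To make the timing point concrete, here is a minimal Python sketch. It
is not taken from any real captioning tool: the `Cue` type, the
`resync` function and the constant `lag_seconds` are all assumptions
made up for illustration (a real live-captioning lag varies from cue
to cue). Shifting every cue earlier by the lag has the same effect as
delaying the video by that amount, and nothing in it depends on
whether the cues are later rendered as pop-on or roll-up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def resync(cues, lag_seconds):
    """Shift every cue earlier by a constant lag, clamping at time zero."""
    shifted = []
    for cue in cues:
        new_start = max(0.0, cue.start - lag_seconds)
        new_end = max(new_start, cue.end - lag_seconds)
        shifted.append(replace(cue, start=new_start, end=new_end))
    return shifted


# Hypothetical cues that arrived three seconds behind the audio.
live = [
    Cue(4.0, 6.0, "The prime minister today announced"),
    Cue(6.0, 9.0, "that there will be a tax exemption."),
]
for cue in resync(live, lag_seconds=3.0):
    print(f"{cue.start:.1f} --> {cue.end:.1f}  {cue.text}")
```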

In contrast, the focus of the rollup wiki page is on supporting the
roll-up display feature.

Are you suggesting that we should completely kill rollup as a feature
on the Internet? Further, are you suggesting that the FCC got it wrong
in requiring captions on the Web to be displayed exactly as
they have been displayed on TV, and that we should introduce a
conversion from roll-up to pop-on for all captions to be published on
the Web? If that is the position of the National Association of the
Deaf, then I will have to re-assess my proposal. I was under the
impression (also from my research as quoted below) that there is a
substantial number of deaf users who actually prefer the roll-up
feature over the pop-on feature and that we therefore have to provide
a sensible means for authoring roll-up captions in WebVTT.


For now, back to addressing your feedback.

> These statements on the W3C rollup webpage are not accurate:
>
>> In addition, since lines are kept on screen longer than for typical pop-on
>> captions, the reader has more time to capture the conversation, in
>> particular if a real-time captioner has made a mistake and provides a
>> correction in the next line.

I did not make this up: this was feedback that I collected from
different sources, including the report that is attached to the wiki
page.

The comparison being made here is between examples as follows:

In pop-on mode (where successive captions replace captions displayed before):
---
caption1
The prime ministral today announced

caption2
The prime minister today announced

caption3
that there will be a tax exemption.
---

Compare this to roll-up mode rendered at the same times that caption1
- caption3 would be shown:
---
caption1
The prime ministral today announced

caption2
The prime ministral today announced
The prime minister today announced

caption3
The prime ministral today announced
The prime minister today announced
that there will be a tax exemption.
---

In this example, over the duration of the three captions the user
continues to see the mistyped text, then the corrected text, then
the remainder of the sentence, which gives him/her the full context
and much more time to grasp it.

In contrast, the same cues in pop-on mode are only available for a
very short time and the connection between them is lost since they
appear as a replacement of each other.

Would you agree that the above statement is correct for this (and
similar) examples?
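
The comparison above can also be written as a small Python sketch.
This is only an illustration under my own assumptions: the function
names and the `max_lines=3` window are hypothetical and do not come
from WebVTT or any caption standard. It produces, for each of the
three captions, the list of lines that would be on screen in each
mode.

```python
def pop_on_frames(lines):
    """Each new caption replaces whatever was on screen before."""
    return [[line] for line in lines]


def roll_up_frames(lines, max_lines=3):
    """Each new caption is appended at the bottom; older lines scroll up
    and the top line is dropped once more than max_lines are showing."""
    frames, window = [], []
    for line in lines:
        window = (window + [line])[-max_lines:]
        frames.append(list(window))
    return frames


captions = [
    "The prime ministral today announced",  # mistyped on purpose
    "The prime minister today announced",   # the correction
    "that there will be a tax exemption.",
]

for name, frames in (("pop-on", pop_on_frames(captions)),
                     ("roll-up", roll_up_frames(captions))):
    print(name)
    for i, frame in enumerate(frames, start=1):
        print(f"  caption{i}")
        for text in frame:
            print(f"    {text}")
```

In the roll-up frames the mistyped line and its correction stay
visible together, whereas in the pop-on frames each one is gone as
soon as the next arrives.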



> and
>
>> No matter their poor quality, studies surprisingly also found that users
>> are actually split on their preference as to how they want live subtitles to
>> be displayed: half of them actually prefer the roll-up display and half
>> pop-on. Therefore, there is a user requirement to continue supporting
>> roll-up caption modes.

Again, may I point to the referenced report in section "User needs":
it explicitly states:

"The results are very divided and [..] more favourable to block
subtitles (45.6% vs. 44.8%). A more thorough analysis reveals that
word-for-word display is mostly preferred by deaf viewers,
particularly those who use BSL or who have lost their hearing at birth
or in the first years of their lives. Many of them cannot hear the
original soundtrack but they can see how people speak and they know
language is not spoken in blocks, but word for word. Some of these
viewers specified in the survey that subtitles displayed in blocks
look manipulated, edited or tampered with, whereas scrolling subtitles
look like the real thing, giving them the impression that they are
listening with their eyes in real time."

Thus, those who need subtitles the most, namely those who have been
deaf for the longest, are those who prefer roll-up captions over
pop-ons. For the general population we get on average roughly the
same percentage that prefer pop-on subtitles as roll-up.

Do you have studies to the contrary of the study I have quoted?


> It is more difficult to follow live captions when the words are constantly
> moving up at varying speeds, and behind the action (due to the delay) as
> opposed to pop-on captions where we can anticipate that the words will not
> move and we have a specific time period to read the captions.

I believe you are comparing well-authored pop-on captions with badly
authored roll-up captions. I don't believe that is a fair comparison.
As I stated above, varying speeds and delays are not the fault of the
roll-up display, but of the way in which the captions are
authored. This can be fixed on the Internet without needing to move to
a pop-on display.


> Further, with
> live captions, we miss much of the on-screen action/images because we are
> constantly trying to keep up with the live captions by watching the most
> recent line at the bottom of the captioning box.

I agree with this, and the quoted study arrives at the same finding:
more time is spent looking at the captions than would be for
pop-on. However, it seems that this fact does not change the
preference of the 44.8% of viewers who favour a rollup display.


> The "Quality in Live Subtitling" report, attached to this email and
> referenced on the rollup captioning webpage discusses the situation
> described above which are identified as the quicksand effect and
> astray fixations for fast readers and regression for slow readers. In
> describing rollup captions the author notes:
>
>> all viewers waste time chasing subtitles which seem to be playing
>> hide-and-seek with them, preventing them from watching the images.
>
>
> and
>
>> this chaotic reading pattern and the almost non-existent time left to
>> ‘read’ the images may go some way towards explaining the poor comprehension
>> results obtained by deaf, hard of hearing and hearing participants in the
>> comprehension test...


May I explain that we are not just introducing the roll-up feature as
an alternative to pop-on captions. We are also introducing it as a
different way to display time-overlapping captions. Right now, if
captions are rendered in WebVTT that overlap in time, no text moves,
but the new text is displayed where it can find space. This means that
if the first line at the bottom is occupied, a second,
time-overlapping text line is rendered on top of the first line. Now,
if the bottom text line disappears and we have to render another line,
that new line will be rendered below the second line. It is this
currently proposed default rendering of time-overlapping text that we
would like to get more flexibility with.
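
As a rough illustration of the behaviour just described, here is a
Python sketch. It is a deliberately simplified model and not the
actual WebVTT rendering algorithm: the function name, the line
numbering (0 = bottom of the screen) and the example cue times are all
assumptions of mine. Each cue takes the lowest line that is free when
it starts and then never moves.

```python
def place_where_space(cues):
    """cues: list of (start, end, text) tuples, sorted by start time.
    Returns {text: line}, where line 0 is the bottom of the screen.
    A cue takes the lowest line that is free at its start time and then
    stays there; nothing ever moves."""
    active = []     # (end, line) pairs for cues still on screen
    placement = {}
    for start, end, text in cues:
        # Forget cues that have already ended when this one starts.
        active = [(e, ln) for e, ln in active if e > start]
        taken = {ln for _, ln in active}
        line = 0
        while line in taken:
            line += 1
        active.append((end, line))
        placement[text] = line
    return placement


# Hypothetical overlapping cues: "second" overlaps both of the others.
cues = [(0, 4, "first"), (2, 6, "second"), (5, 8, "third")]
for text, line in place_where_space(cues).items():
    print(f"{text!r} on line {line} (0 = bottom)")
```

With these example times, "second" lands above "first", and once
"first" has disappeared "third" is rendered below "second", so the
newest text is sometimes above and sometimes below the older text.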

I believe that if you compare roll-up to this
"render-text-where-there-is-space" approach, you may find that even
though the text does not move, people's eyes will still have to dwell
much longer on the captions than with pop-on captions, since they
constantly have to figure out which is the text that was already there
beforehand and which is the new text. Some people may find it logical,
but others may get completely confused. To my shame, I have to admit
that I belong in the second class and that the current rendering
pattern of time-overlapping WebVTT cues is so confusing for my eyes
that I ultimately have to stop looking at the captions so as not to
miss anything in the movie.

I would much prefer a pattern where I know where to expect the new
text: either it's always on top of the old text (such that when I
discover movement out of the corner of my eyes, I can just focus my
eyes on the top line and read that before returning to the video), or
it's always below the old text (which gives me similarly a consistent
location to return focus of my eyes to).


> and finally in referring to how much time we spend on reading the captions
> as opposed to watching the images, the study found:
>
>> in scrolling mode viewers spend most of their time bogged down in the
>> subtitles (an average of 87.5% vs 12.5% spent on the images), whereas in
>> block subtitles they have more time to focus on the images (an average of
>> 67.3% on the subtitles and 32.7% on the images).
>
>
> I would not take the survey results as concrete evidence of users preferring
> live captions over pop-on captions. There are several problems with the
> survey. First, the survey asks if users prefer "word-for-word" captioning or
> "block" captioning. Could the consumer have confused a preference for
> verbatim/easy-reader captions as opposed to popon/rollup captions (consumers
> will pick verbatim captioning over easy-reader captions a majority of the
> time)?

Actually, that is additional feedback for me to say that roll-up is an
important feature, since it indicates to users that they can follow
what is being said verbatim.

> Further, do consumers understand the difference between popon and
> rollup captions? It would be better to have an actual study that shows
> consumers popon and rollup captions for the same program and then asks them
> to rate their preference.

It seemed to me that the study made sure that users understood the difference.

> In addition, this study focuses on TV captioning
> only, and not the Internet. Viewing habits and preferences on the Internet
> may be different than on TV.

We can expect that in future all "TV programming" will happen over the
Internet. If we now have users who prefer roll-up captions on TV,
then this is again support for the need for a roll-up
feature on the Internet as well.

> And the study notes that most consumers think
> that live captioning is automatic speech-recognition which may influence
> their perception of rollup/popon captions.

I don't believe that perception has any influence on whether somebody
prefers roll-up over pop-on. In fact, such a perception should skew
users towards the better quality pop-on captions more than anything.
It is even more surprising then that there is so much user preference
for roll-up.

> Further, there is a bias in this survey when the RNID states, "Considering
> that it is currently impossible to match live subtitles with images
> perfectly..." which is no longer true on the Internet. Last month at the
> South by Southwest (SXSW) Conference, I had the opportunity to serve on a
> panel with Adobe, HBO, and Viacom
> (http://schedule.sxsw.com/2012/events/event_IAP13011) where Glenn Goldstein
> of Viacom revealed that his company has implemented an automatic solution
> for the timing problem where roll-up captions are converted to pop-on
> captions and moved up three seconds to synchronize the captions with the
> audio for their web videos including Jon Stewart's "Daily Show" one of the
> more popular programs in the United States. Glenn provided a side-by-side
> demonstration of live captioning with roll-ups and pop-on captioning that
> had been synchronized for the same program. The ease of watching and
> following pop-on captions compared to live captioning was immediately
> noticed by the hearing audience. Also, the audience noticed that the lag
> between the spoken audio and captions was significant. This solution applies
> to the Internet only though, and I understand from Viacom that it cannot be
> implemented for their TV programs; however, as the Romero study notes, this
> can be addressed on TV by delaying the TV signal which is currently done in
> Holland with "good results".

Did they also show better-timed roll-up captions? The timing
improvements do not necessarily require switching to pop-on
mode. Comparing poorly authored roll-up captions to optimized pop-on
captions does not give me any data about whether users prefer one
display style over the other.


> Finally, can you elaborate on the following statement on the rollup webpage?
>
>> Users should at least have the opportunity to provide a preference as to
>> how they want their captions displayed. Such a preference setting is
>> currently not possible with WebVTT, which will never move cue text, but
>> instead place new cue text lines either on top of already rendered text
>> lines or fill a line below if it has become empty.

I have tried to explain it above. Do let me know if I was not able to
make myself understood.


I am curious about your opinion on whether we need a roll-up display
feature for WebVTT. As you compare these display features, consider
captions that are perfectly authored: cues are accurately timed, the
words are being displayed as they are being said, and no typos are
being made. Now tell me if these should be displayed as pop-on (with
every line being a cue of its own), as roll-up (with, say, always 3
lines of cues on screen), in "where there is space" mode (with, say,
three lines of cues on screen), or in a way that an author or a user
prefers to have it displayed, including a roll-up choice. What would
you say?

Best Regards,
Silvia.

Received on Tuesday, 10 April 2012 12:29:19 UTC