Re: Displaying multiple lines in WebVTT

On Thu, Apr 5, 2012 at 3:44 AM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

>  I understand where you're coming from and some part of me agrees that
> line breaks should be left to the browser.
>
> However, there are rules on what quality captions should look like and
> how lines should be broken, see
> http://www.dcmp.org/captioningkey/text.html the line division section.
> That would not be possible unless you allow explicit line breaks and
> make it easy to author them. I believe that is the reason why most
> captioning formats work with explicit line breaks.
>

I'm not suggesting prohibiting manual line breaks, of course; only making
them explicit, and as a side-effect of that, encouraging people to leave
wrapping up to the browser.

> We have automated line wrapping for long lines. However, "long" is
> only defined as hitting the edge of the video element. If you have a
> better suggestion for when line breaking should kick in, I think that
> might be a good idea.
>

I'm not sure, exactly.  Users probably have different preferences, so I'd
suggest leaving this up to browsers.  (Since you can't precisely control
font rendering, sites can't depend on captions coming out a precise size on
all browsers anyway, so I don't think this reduces interop.)

> > WebVTT text should mimic HTML (in its default whitespace mode): collapse
> > newlines to a space, and use a <br> marker to indicate explicit line
> > breaks when they're really wanted.
>
> There would be too many <br>s since all captions are usually
> hand-crafted. When the video is increased in size, the captions are
> scaled up in font-size, so that works out.
>

That's a tendency that we need to discourage.  "Hand-crafting"
word-wrapping is a fundamentally, inherently broken way to author content
on the Web, since (among other reasons) we don't prescribe font rendering.

Note that SSA/ASS captions (the most common formats for fansubbing) usually
do use automatic word-wrapping.

When the font being used to render captions is larger than the font the
author used, it can easily result in lines no longer fitting, which results
in captions meant to render like this:

> word word word word word word word word word word word<br>
> word word word word word word

(<br> being the author's manual break) ending up looking like this:

> word word word word word word word word word word
> word<br>
> word word word word word word

This isn't theoretical.  I've seen this artifact in the real world many
times (probably with SRT).
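A minimal Python sketch of the artifact above. It rests on two assumptions that are mine, not anything WebVTT specifies: rendered width is approximated by character count (real renderers measure glyphs), and cue text gets HTML-style whitespace handling, so newlines collapse to spaces and <br> is the only forced break. The function names are hypothetical. Each <br>-delimited segment is wrapped independently at the available width, which is how automatic wrapping ends up stacking on top of manual breaks:

```python
import re


def to_segments(cue_text: str) -> list[list[str]]:
    """Assumed HTML-like whitespace handling: split on <br> (the only
    forced break), then collapse any newlines or runs of whitespace."""
    return [part.split() for part in re.split(r"<br\s*/?>", cue_text)]


def greedy_wrap(words: list[str], width: int) -> list[str]:
    """Paragraph-style wrapping: break at the latest opportunity per line.
    A word longer than `width` is placed on its own (overflowing) line."""
    lines: list[str] = []
    current = ""
    for word in words:
        candidate = word if not current else current + " " + word
        if not current or len(candidate) <= width:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def render(cue_text: str, width: int) -> list[str]:
    """Wrap each manually broken segment independently at `width`."""
    lines: list[str] = []
    for words in to_segments(cue_text):
        lines.extend(greedy_wrap(words, width))
    return lines


if __name__ == "__main__":
    cue = ("word word word word word word word word word word word<br>\n"
           "word word word word word word")
    for width in (54, 49):  # the author's assumed width, then a narrower one
        print(f"width={width}:")
        for line in render(cue, width):
            print("  " + line)
```

At width 54 (what the author had in mind) the cue comes out as the intended two lines. At width 49 the first segment no longer fits, and the lone "word" line appears exactly as in the example.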

(Please don't say that users can't be allowed to choose their own minimum
font sizes.  That's a fundamental accessibility feature.  I always set a
minimum font size in my browser, because web pages often use font sizes too
small for my comfort.  That needs to apply to captions, just as with other
web content.)

This can happen if the line is longer than expected for any other reason,
too.  Different font engines will result in different renderings; different
fonts will be used due to font replacement when the font selected isn't
available; even the same font can render differently in different versions
of the font, and so on.  Content that expects a particular font rendering is
broken, whether it's an HTML document or a caption, and we should do what
we can to minimize that sort of content.  Currently, the format
*encourages* it, which is very bad.

(As a final note, even when people really want to manually wrap captions, I
disagree that it results in too many <br>s.  There's no significant harm in
that--certainly none that outweighs the benefits--and it only affects
badly-authored captions anyway.  Anyhow, the only case I can see where
people might legitimately--for some value of "legitimate"--be manually
word-wrapping is when converting from other formats, in which case it
doesn't matter if there are lots of <br>s.)

> > A "balanced" word-wrapping mode should also be
> > added, to wrap lines with balanced line lengths, which is the typical
> > wrapping method for captions.
>
> How do you suggest that should look?
>

Basically, instead of using paragraph-style wrapping, which wraps (roughly
speaking) at the latest opportunity per line:

word word word word word word word word word word
word word word word word word word word word word
word word

it adjusts the breaks to attempt to make each line a similar length:

word word word word word word word word
word word word word word word word word
word word word word word word

It would never use a greater number of line breaks than in the regular
wrapping mode.  Above, two line breaks are used, and balanced wrapping
would never increase that to three in an attempt to balance more evenly.
It would only move the breaks around.
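One way to sketch this, again assuming character count stands in for rendered width: run ordinary greedy wrapping to get the line count, then binary-search for the narrowest width that still yields that many lines, and wrap at that width. This is only one possible technique, not a specified algorithm, and the function names are hypothetical. Since greedy wrapping never needs more lines at a larger width, shrinking the width this way can't add a line break; it only moves the existing ones:

```python
def greedy_wrap(words: list[str], width: int) -> list[str]:
    """Paragraph-style wrapping: break at the latest opportunity per line.
    A word longer than `width` is placed on its own (overflowing) line."""
    lines: list[str] = []
    current = ""
    for word in words:
        candidate = word if not current else current + " " + word
        if not current or len(candidate) <= width:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def balanced_wrap(words: list[str], width: int) -> list[str]:
    """Keep the greedy line count, but shrink the wrap width as far as
    possible so the lines come out roughly equal in length."""
    target = len(greedy_wrap(words, width))
    lo = max((len(w) for w in words), default=0)  # no line can be narrower
    hi = max(width, lo)  # greedy_wrap at hi uses at most `target` lines
    while lo < hi:
        mid = (lo + hi) // 2
        if len(greedy_wrap(words, mid)) <= target:
            hi = mid
        else:
            lo = mid + 1
    return greedy_wrap(words, lo)


if __name__ == "__main__":
    words = ["word"] * 22
    for line in greedy_wrap(words, 49):
        print(line)
    print()
    for line in balanced_wrap(words, 49):
        print(line)
```

With 22 four-letter words at width 49, greedy wrapping gives lines of 10, 10 and 2 words; the balanced version narrows the effective width to 39 and gives 8, 8 and 6, matching the example above.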

(This would be a CSS feature that WebVTT would use, not a WebVTT-specific
feature.  I think Ian at least sounded open to the idea when I talked to
him about it last.)

-- 
Glenn Maynard

Received on Thursday, 5 April 2012 14:22:41 UTC