Re: The need of newlines in WebVTT (was Re: Displaying multiple lines in WebVTT) from Glenn Maynard on 2012-04-10 (public-texttracks@w3.org from April 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Tue, 10 Apr 2012 17:11:18 -0500
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Frank Olivier <Frank.Olivier@microsoft.com>, "public-texttracks@w3.org" <public-texttracks@w3.org>
Message-ID: <CABirCh9F2PxE=GG_8-gWPG4NE3wfYt-K=h5dW4hXu-nhNo-eXw@mail.gmail.com>
On Mon, Apr 9, 2012 at 11:03 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> > I'm not sure, exactly.  Users probably have different preferences, so I'd
> > suggest leaving this up to browsers.  (Since you can't precisely control
> > font rendering, sites can't depend on captions coming out a precise size
> > on all browsers anyway, so I don't think this reduces interop.)
>
> They're roughly the same, which is, I believe, sufficient for interop.
>

They're not.  If I configure my browser to use a minimum font size of 14pt,
and your captions are authored for 10pt, then it's going to be
significantly different.  Letting people pick their own font size is
critical (anyone with less than 20/20 vision who has tried to read a
webpage authored by someone with 20/10 understands this all too well).

Also, font sizes for videos displayed on a phone may be in a different
proportion to the video than font sizes on a TV, depending on the size and
resolution of the display.

> OK, but there is a large amount of existing content that uses those
> hand-crafted newlines. I think they should continue to be supported.
>

As I said, I'm not suggesting that support for it be dropped, of course.

> If a user instead prefers to have the browser do the line breaks, they
> can always remove the newlines when they are converting from existing
> content to WebVTT and specify a "size" on the cues to determine at
> which width the line break should occur.
>

This isn't something that users should have to enable.  That's just
creating extra modes, which means more testing for every author and/or a
mode that will never work.  Users should never have to care about this.  If
it takes a non-default mode to get sane wrapping, that just means 99% of
users will never have sane wrapping.

> The thing is: right now we are supporting both (automated line breaks
> and hard line breaks) in a simple manner. If we required <br>s for
> line breaks, that would bring extra overhead for no apparent advantage
> (at least none that I could directly point out).
>

Most critically, it encourages authors, especially those coming from SRT
and HTML, not to hand-wrap their text.  If you have to say <br> to get a
line break, that's going to go a long way towards getting people to realize
that they're not supposed to be wrapping every line by hand.

With the current model, I think it's all but guaranteed that a
majority of authors will always hand-wrap content.  Not because they're
following any particular style guide--just because they think it's the only
way to do it.

(It also allows authors to wrap long lines in their editor without causing
line breaks in captions, just as in HTML.)
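
As a rough illustration of that model, here is a minimal Python sketch. It
assumes the <br> tag proposed in this thread (it is a proposal, not an
established part of the format), and the function name cue_lines is
hypothetical. Newlines in the source are treated as ordinary spaces, and
only <br> produces a hard break:

```python
import re


def cue_lines(source: str) -> list[str]:
    """Split cue text into hard lines under the proposed <br> model."""
    # Only the (proposed, hypothetical) <br> tag forces a line break.
    hard_lines = source.split("<br>")
    # Source newlines, tabs and runs of spaces collapse to one space, as in
    # HTML. U+00A0 is deliberately left alone so it stays non-breaking.
    return [re.sub(r"[ \t\r\n]+", " ", part).strip(" ") for part in hard_lines]
```

A cue written as "Mark pushed\nhis black truck.<br>It did not move." would
come out as two lines, with the editor-only newline folded into a space.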

> > Note that SSA/ASS captions (the most common formats for fansubbing)
> > usually does use automatic word-wrapping.
>
> That's likely because their cues are specified on one line [1]. In
> order to force a new line, you have to insert {\N}, making the cue
> even less readable. I assume people would rather author another cue
> instead of doing this.
>

The point is that the SSA formats handle wrapping correctly.

> I think there are good arguments for both positions: explicitly
> calling out newlines makes it clear to people where their cue text may
> be broken, but makes it harder to read.


When you're using manual line breaks "correctly"--that is, for the
occasional times when you really do legitimately need a line break--it
doesn't make it hard to read.

If you're pessimistically converting from SRT, you'd need to insert <br> on
each line, since you can't tell for sure in SRT whether a newline was
actually an important line break or not, but I don't think that's a
problem.  (It's not that hard to read, and who's sitting around, reading
the source code of automatically converted caption files?)
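
A pessimistic conversion of that kind could look roughly like the sketch
below. It only handles the text of a single cue (no timing lines), the
function name srt_text_to_cue is hypothetical, and <br> is again the tag
proposed in this thread rather than something the format is known to
accept:

```python
def srt_text_to_cue(srt_text: str) -> str:
    """Join the text lines of one SRT cue with the proposed <br> tag."""
    # We cannot tell which SRT newlines were intentional, so keep them all.
    lines = [line.strip() for line in srt_text.splitlines() if line.strip()]
    return "<br>".join(lines)
```

Every newline is preserved as an explicit break, so nothing the original
author intended is lost, at the cost of keeping the hand-wrapping too.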


> I guess it depends on whether
> we can find a good enough "line balancing" algorithm that will provide
> for the quality of captions that people have come to expect [2].
>
> For example, the caption key clearly states that this is an
> inappropriate caption rendering:
> Mark pushed his black
> truck.
>
> While in contrast this is appropriate:
> Mark pushed
> his black truck.
>

This part is easy; the algorithm I suggested before handles it.  Basically,
take the regular word-wrapping algorithm, which results in the first
version.  Note the number of line breaks it results in: 1.  Then distribute
that number of line breaks along the text so that the line lengths deviate
from each other as little as possible.  (The last part would need to be more
explicit, of course, but I don't think it's difficult.)
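
One way to make the last step explicit is sketched below in Python. It is
only an illustration under stated assumptions: character counts stand in
for rendered widths, "least deviation" is taken to mean the smallest gap
between the longest and shortest line, and the break positions are found
by brute force, which is fine for caption-sized text. The names
greedy_wrap and balanced_wrap are hypothetical.

```python
from itertools import combinations


def greedy_wrap(words, max_width):
    """Ordinary first-fit wrapping: fill each line as far as it will go."""
    lines, current = [], ""
    for word in words:
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def balanced_wrap(text, max_width):
    """Keep the greedy line count, but move the breaks to even out lengths."""
    # Split on plain spaces only, so U+00A0 never becomes a break point.
    words = [w for w in text.split(" ") if w]
    greedy = greedy_wrap(words, max_width)
    n_breaks = len(greedy) - 1
    if n_breaks < 1:
        return greedy
    best, best_spread = greedy, None
    # Try every way of placing n_breaks breaks between words.
    for cuts in combinations(range(1, len(words)), n_breaks):
        bounds = (0, *cuts, len(words))
        lines = [" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:])]
        lengths = [len(line) for line in lines]
        if max(lengths) > max_width:
            continue  # this placement overflows the line width
        spread = max(lengths) - min(lengths)
        if best_spread is None or spread < best_spread:
            best, best_spread = lines, spread
    return best
```

On the quoted example with a 21-character limit, greedy wrapping gives
"Mark pushed his black" / "truck.", while this sketch gives
"Mark pushed his" / "black truck.".  The orphaned last word is gone, but
length alone knows nothing about phrase boundaries.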

> Here are some of the rules it states:
> * Do not break a modifier from the word it modifies.
> * Do not break a prepositional phrase.
> * Do not break a person’s name nor a title from the name with which it
> is associated.
> * Do not break a line after a conjunction.
> * Do not break an auxiliary verb from the word it modifies.
>

These are exactly the sorts of things &nbsp; is for.  If you want to
carefully edit subtitles to follow these rules, then that just means using
it appropriately.  That's much saner than baking word wrapping into the
file.

(I've suggested supporting &nbsp; before: documents containing literal
U+00A0 NO-BREAK SPACE characters are essentially impossible to edit without
a specialized editor.  I'll just reiterate that suggestion, now with the
above use cases added.)
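
To show the effect, here is a small Python sketch using the standard
textwrap module, which breaks only at ordinary whitespace and so leaves
U+00A0 intact.  The helper name wrap_cue is hypothetical, and expanding
&nbsp; by simple string replacement is an assumption made for brevity,
not a full entity parser:

```python
import textwrap


def wrap_cue(text: str, width: int) -> list[str]:
    """Word-wrap cue text, treating &nbsp; as a non-breaking space."""
    # Expand the entity to U+00A0, which textwrap will not break at.
    text = text.replace("&nbsp;", "\u00a0")
    return textwrap.wrap(
        text, width=width, break_long_words=False, break_on_hyphens=False
    )
```

With a width of 21, "Mark pushed his black truck." wraps after "black",
while "Mark pushed his&nbsp;black&nbsp;truck." wraps after "pushed",
keeping the modifier and the noun phrase together.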

> * Never end a sentence and begin a new sentence on the same line
> unless they are short, related sentences containing one or two words.
>

<br> is fine here.

> That - in my mind - is,
> however, a different issue to whether we introduce explicit markup for
> line breaks or not. I don't think we need the extra markup. I do think
> though that we need the extra line balancing algorithm.
>

A good balancing word-wrapping mode is a prerequisite for telling people
that they shouldn't break lines by hand.  I do agree that it seems
important, even on its own.

-- 
Glenn Maynard
Received on Tuesday, 10 April 2012 22:11:47 UTC