- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Tue, 10 Apr 2012 14:03:04 +1000
- To: Glenn Maynard <glenn@zewt.org>
- Cc: Frank Olivier <Frank.Olivier@microsoft.com>, "public-texttracks@w3.org" <public-texttracks@w3.org>
[Changing topic to better reflect the discussion]

On Fri, Apr 6, 2012 at 12:22 AM, Glenn Maynard <glenn@zewt.org> wrote:
> On Thu, Apr 5, 2012 at 3:44 AM, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
> wrote:
>>
>> I understand where you're coming from and some part of me agrees that
>> line breaks should be left to the browser.
>>
>> However, there are rules on what quality captions should look like and
>> how lines should be broken, see
>> http://www.dcmp.org/captioningkey/text.html the line division section.
>> That would not be possible unless you allow explicit line breaks and
>> make it easy to author them. I believe that is the reason why most
>> captioning formats work with explicit line breaks.
>
> I'm not suggesting prohibiting manual line breaks, of course; only making
> them explicit, and as a side-effect of that, encouraging people to leave
> wrapping up to the browser.
>
>> We have automated line wrapping for long lines. However, "long" is
>> only defined as hitting the edge of the video element. If you have a
>> better suggestion for when line breaking should kick in, I think that
>> might be a good idea.
>
> I'm not sure, exactly. Users probably have different preferences, so I'd
> suggest leaving this up to browsers. (Since you can't precisely control
> font rendering, sites can't depend on captions coming out a precise size on
> all browsers anyway, so I don't think this reduces interop.)

They're roughly the same, which is, I believe, sufficient for interop.

>> > WebVTT text should mimic HTML (in its default whitespace mode): collapse
>> > newlines to a space, and use a <br> marker to indicate explicit line
>> > breaks when they're really wanted.
>>
>> There would be too many <br>s since all captions are usually
>> hand-crafted. When the video is increased in size, the captions are
>> scaled up in font-size, so that works out.
>
> That's a tendency that we need to discourage.
"Hand-crafting" word-wrapping > is a fundamentally, inherently broken way to author content on the Web, > since (among other reasons) we don't prescribe font rendering. OK, but there is a large number of existing content that uses those hand-crafted newlines. I think they should continue to be supported. If a user instead prefers to have the browser do the line breaks, they can always remove the newlines when they are converting from existing content to WebVTT and specify a "size" one the cues to determine at which width the line break should occur. The thing is: right now we are supporting both (automated line breaks and hard line breaks) in a simple manner. If we required <br>s for line breaks, that would bring extra overhead for no apparent advantage (at least none that I could directly point out). > Note that SSA/ASS captions (the most common formats for fansubbing) usually > does use automatic word-wrapping. That's likely because their cues are specified on one line [1]. In order to force a new line, you have to insert {\N}, making the cue even less readable. I assume people would rather author another cue instead of doing this. [1] http://docs.aegisub.org/manual/ASS_Tags > When the font being used to render captions is larger than the font the > author used, it can easily result in lines no longer fitting, which results > in captions meant to render like this: > >> word word word word word word word word word word word<br> >> word word word word word word > > (<br> being the author's manual break) ending up looking like this: > >> word word word word word word word word word word >> word<br> >> word word word word word word > > This isn't theoretical. I've seen this artifact in the real world many > times (probably with SRT). I agree, I've seen this before, too. It is particularly awful when all of this is actually one sentence. It's actually fine when they are two sentences, e.g. Mary: This is the very long sentence that I wanted to say. 
Paul: Ah, ok, this is my reply.

It would be bad if this were indeed mashed together.

> (Please don't say that users can't be allowed to choose their own minimum
> font sizes. That's a fundamental accessibility feature. I always set a
> minimum font size in my browser, because web pages often use font sizes too
> small for my comfort. That needs to apply to captions, just as with other
> web content.)
>
> This can happen if the line is longer than expected for any other reason,
> too. Different font engines will result in different renderings; different
> fonts will be used due to font replacement when the font selected isn't
> available; even the same font can render differently in different versions
> of a font, and so on. Content that expects a particular font rendering is
> broken, whether it's an HTML document or a caption, and we should do what we
> can to minimize that sort of content. Currently, the format *encourages*
> it, which is very bad.
>
> (As a final note, even when people really want to manually wrap captions, I
> disagree that it results in too many <br>s. There's no significant harm in
> that--certainly none that outweighs the benefits--and it only affects
> badly-authored captions anyway. Anyhow, the only case I can see where
> people might legitimately--for some value of "legitimate"--be manually
> word-wrapping is when converting from other formats, in which case it
> doesn't matter if there are lots of <br>s.)

I think there are good arguments for both positions: explicitly calling
out newlines makes it clear to people where their cue text may be
broken, but makes it harder to read. I guess it depends on whether we
can find a good enough "line balancing" algorithm that will provide for
the quality of captions that people have come to expect [2]. For
example, the caption key clearly states that this is an inappropriate
caption rendering:

Mark pushed his black
truck.

While in contrast this is appropriate:

Mark pushed
his black truck.
Here are some of the rules it states:

* Do not break a modifier from the word it modifies.
* Do not break a prepositional phrase.
* Do not break a person’s name nor a title from the name with which it
  is associated.
* Do not break a line after a conjunction.
* Do not break an auxiliary verb from the word it modifies.
* Never end a sentence and begin a new sentence on the same line unless
  they are short, related sentences containing one or two words.

I do not believe we currently have a CSS line-break algorithm that
supports these. Until that happens, caption providers will continue to
use hard newlines to make sure that they meet these requirements for the
90% rendering case.

[2] http://www.dcmp.org/captioningkey/text.html

>> > A "balanced" word-wrapping mode should also be
>> > added, to wrap lines with balanced line-lengths, which is the typical
>> > wrapping method for captions.
>>
>> How do you suggest that should look?
>
> Basically, instead of using paragraph-style wrapping, which wraps (roughly
> speaking) at the latest opportunity per line:
>
> word word word word word word word word word word
> word word word word word word word word word word
> word word
>
> it adjusts the breaks to attempt to make each line a similar length:
>
> word word word word word word word word
> word word word word word word word word
> word word word word word word
>
> It would never use a greater number of line breaks than in the regular
> wrapping mode. Above, two line breaks are used, and balanced wrapping would
> never increase that to three in an attempt to balance more evenly. It
> would only move the breaks around.
>
> (This would be a CSS feature that WebVTT would use, not a WebVTT-specific
> feature. I think Ian at least sounded open to the idea when I talked to him
> about it last.)
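
For illustration only, the balanced wrapping described above can be
sketched in Python. This is a minimal sketch under stated assumptions,
not a proposed CSS algorithm: width is counted in characters rather than
rendered glyphs, and the name balanced_wrap is hypothetical. It wraps
greedily to find the line count, then looks for the narrowest width that
still fits in that many lines, so the breaks move but their number never
increases.

```python
import textwrap


def balanced_wrap(text: str, max_width: int) -> list[str]:
    """Wrap text into as many lines as greedy wrapping needs, but balanced.

    Hypothetical helper: widths are character counts, not rendered widths.
    """
    # Greedy ("latest opportunity") wrapping fixes the number of lines.
    greedy = textwrap.wrap(
        text, width=max_width, break_long_words=False, break_on_hyphens=False
    )
    if len(greedy) <= 1:
        return greedy  # nothing to balance

    target = len(greedy)
    longest_word = max(len(word) for word in text.split())

    # Try widths from narrow to wide; the first one that needs no more
    # lines than the greedy result gives the most even line lengths.
    for width in range(longest_word, max_width + 1):
        lines = textwrap.wrap(
            text, width=width, break_long_words=False, break_on_hyphens=False
        )
        if len(lines) <= target:
            return lines

    # Reached only if a single word is wider than max_width.
    return greedy


if __name__ == "__main__":
    caption = " ".join(["word"] * 22)
    for line in balanced_wrap(caption, 49):
        print(line)
```

With 22 four-letter words and a 49-character limit, greedy wrapping gives
lines of 10, 10 and 2 words, while this sketch gives 8, 8 and 6, as in
the example above. It only evens out line lengths; it knows nothing
about the linguistic rules from the captioning key.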

Considering all the requirements that I listed above for quality
captions, I doubt we will be able to introduce a CSS line break
algorithm that will allow us to meet all of the requirements with a
fully automated algorithm, as much as I would love to. It would,
however, be good if we could at least tell CSS to use a better balancing
algorithm than the existing one.

That - in my mind - is, however, a different issue to whether we
introduce explicit markup for line breaks or not. I don't think we need
the extra markup. I do think though that we need the extra line
balancing algorithm.

Cheers,
Silvia.
Received on Tuesday, 10 April 2012 04:03:54 UTC