- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Wed, 11 Apr 2012 14:32:45 +1000
- To: Glenn Maynard <glenn@zewt.org>
- Cc: David Singer <singer@apple.com>, Gal Klein <gal@plymedia.com>, public-texttracks@w3.org
On Wed, Apr 11, 2012 at 1:45 PM, Glenn Maynard <glenn@zewt.org> wrote:
> On Tue, Apr 10, 2012 at 8:22 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
> wrote:
>>
>> I am here and being paid by Google's accessibility team to make sure
>> this use case is supported for our YouTube captions, because we need
>> it. I've listed our use cases. What more do you want me to say?
>
>
> The use cases are disputed, but I'm trying to avoid replying to them just
> because it's mostly all been discussed already.

I would like to hear back on the other thread on this - in particular
since Shane seems to support your position. I know that we had several
customers on YouTube requiring the feature.

> The only argument I think
> might have merit is the claim that lots of people want it, but the data for
> that--the claim that around 50% of people want roll-ups--needs examination.
> (Subtitles and captions on every DVD and Blu-ray are pop-on; SRT, SSA and
> ASS subtitles are all pop-on; and I've spent a good deal of time talking
> about subtitled media--and yet I've never once heard anybody going "I wish
> these subtitles were roll-up, it's so much easier to read". That's why I,
> at least, will take some convincing to believe there's really significant
> demand for it from users.)

We need to be able to have rollup functionality just for
time-overlapping cues. Surely it's obvious that the current display
mechanism of time-overlapping cues in WebVTT is very unusual (i.e.
using whichever space is available next to currently rendered cue
text, whether above or below). Surely we can also agree that many
people will prefer a rollup display over the current rendering
mechanism for time-overlapping cues. I am willing to concede that it
may be 50% or less, but substantially more than 20%, so it meets the
80% use case. I am also willing to agree that sometimes we may prefer
to display roll-down rather than roll-up.

>> What about explicitly positioned cues, e.g.
>> underneath a certain person and the desire to have captions
>> scrolling there?
>
>
> This sounds like a third mode, separate from both roll-up and pop-on
> captions.

I don't see it that way. Rollup and rolldown are a means of modifying
already rendered cues no matter where they are rendered. The initial
rendering position and the transition mode are orthogonal concepts. We
don't need to mingle them into a new mode.

> I'm not sure how (or if) they'd fit together. It raises a huge
> new set of questions. (What happens if the subject is moving around the
> frame?

The caption rendering box moves with them.

> What if two subjects with active captions cross each other in the
> frame?

The caption rendering boxes move with them.

> Does this mean three user options--pop-on, roll-up and
> follows-the-speaker?

No. We already have "follow-the-speaker" on TV - the captions are just
repeated in a different location to do that. This is not practical or
desirable on the Web and we can find a better way of rendering this.

The existing types such as "pop-on", "roll-up" and "paint-on" aren't
really well separated types of captions and I wouldn't want to
continue using them as a technical specification of rendering means.

"Pop-on" is more than just rendering on screen: it is usually implied
that there is no time-overlap with other cues. So, what are pop-on
captions with time-overlap (and more so: region-overlap)? That's a
type of captions that does not fit in any of these classes.

"Roll-up" similarly usually implies live captions with their delays
and errors, which are not a type of rendering at all, but authoring
deficiencies. Also, rollup usually implies successive revealing of two
characters each (or at least SCC is defined that way). They also
typically have the ">>" characters at the start. So, what are roll-up
captions that are accurately timed, shown word by word in sync with
the audio when these words are spoken, but do not use the ">>"
symbols?
That's again a type of captions that doesn't fit in any of the three
classes.

"Paint-on" similarly usually implies revealing a word at a time for a
single piece of text about the length of a pop-on caption, but then
the text disappears and a new block is rendered by "revealing", i.e.
there are no time-overlapping cues and therefore no text needs moving
since the position of each word is clear for the cue. So, what are
captions that are successively revealed but time-overlap? Again, this
is a type of captions that doesn't fit in any of the three classes.

Making up new classes doesn't help, in particular when the classes
overlap in functionality. We need to identify the different individual
features that the classes are made up of and turn those into WebVTT
features.

The features of pop-on are the following:
* render a piece of text on screen
* in a given location
* within a specified rendering area

All of these features are available with current WebVTT.

The features of roll-up are the following:
* render a limited number of lines of text on screen (usually 2-4)
* in a given location
* within a specified rendering area
* in a time-overlapping manner
* where within the rendering area lines of text always make space for
  a new text line underneath existing text
* and text that scrolls out of the top of the rendering area
  disappears.

The last two features are not currently supported in WebVTT.

> Can this be done in a way that will consistently work
> for all user preferences, eg. if a user wants roll-up and it's authored for
> follow?) I'm afraid of the discussion spiraling out of control if we start
> considering this...

I'm sure we can and that's what all this discussion is working
towards. We need to be open to the opportunities of the Web and allow
all feature combinations, not just a limited set like the one that was
defined for TV.
We don't have to mash them all up, however - instead we need to regard
them as separate dimensions that all have an impact on how things are
rendered on screen.

Cheers,
Silvia.
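
As a rough illustration of the last two roll-up features listed above
(new lines are added underneath existing text, and text that scrolls
out of the top of the rendering area disappears), here is a minimal
Python sketch. The RollUpRegion class, its max_lines parameter and its
method names are hypothetical and are not part of WebVTT or any
existing API. It models only the line bookkeeping, not positioning,
timing or animation.

```python
from collections import deque


class RollUpRegion:
    """Hypothetical rendering area that shows at most max_lines lines."""

    def __init__(self, max_lines: int = 3) -> None:
        if max_lines < 1:
            raise ValueError("max_lines must be at least 1")
        # A bounded deque drops its oldest entry when a new one is
        # appended while it is full: the top line "scrolls out".
        self._lines = deque(maxlen=max_lines)

    def add_line(self, text: str) -> None:
        # New text always goes underneath the existing lines.
        self._lines.append(text)

    def visible_lines(self) -> list[str]:
        # Lines in top-to-bottom order.
        return list(self._lines)


region = RollUpRegion(max_lines=2)
for line in ["first line", "second line", "third line"]:
    region.add_line(line)
print(region.visible_lines())  # ['second line', 'third line']
```

A roll-down variant could be sketched the same way by adding new lines
at the top and dropping them from the bottom.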
Received on Wednesday, 11 April 2012 04:33:34 UTC