- From: Philip Jägenstedt <philipj@opera.com>
- Date: Wed, 21 Sep 2011 11:15:25 +0200
Implementors of <track> / WebVTT from several browser vendors (Opera, Mozilla, Google, Apple) met at the Open Video Conference recently. There was a session on video accessibility, [1] a bunch of new bugs were filed [2] and there was much rejoicing. There were a few issues that weren't concrete enough to file bugs on, but which I think are still worthwhile discussing further:

== Comments ==

If you look at the source of the spec, you'll find comments as a v2 feature request:

  COMMENT --> this is a comment, bla bla

I do not think this would be very useful. As a one-line comment at the top of the file (for authorship, etc) it is rather verbose and ugly, while for commenting out cues you would have to comment out each cue individually. It also doesn't work inside cues, where something like <! comment > is what would be backwards compatible with the current parser.

If comments are left for v2, the above is what it'll be, because of compatibility constraints. If anyone is less than impressed with that, now would be the time to suggest an alternative and have it spec'd.

== Scrolling captions ==

The WebVTT layout algorithm tries to not move cues around once they've been displayed and to never obscure other cues. This means that for cues that overlap in time, the rendering will often be out of order, with the earliest cue at the bottom. This is quite contrary to the (mainly US?) style of (live) scrolling captions, where cues are always in order and scroll to bring new captions into view. (I am not suggesting any specific change.)

== Scaling up and down ==

Scaling the font size with the video will not be optimal for either small screens (text will be too small) or very large screens (text will be too big). Do we change the default rendering in some way, or do we let users override the font size? If users can override it, do we care that this may break the intended layout of the author?
== Strict vs forgiving parsing ==

The parser is fairly strict in some regards:

* double id line discards entire cue (http://www.w3.org/Bugs/Public/show_bug.cgi?id=13943)
* must use exactly 2 digits for minutes and seconds
* minutes and seconds must be <60
* must use "." as the decimal separator
* must use exactly 3 decimal digits
* stray "<" consumes the rest of the cue text

A small percentage of cues (or cue text) will be dropped because of these constraints, and this is not very likely to be noticed unless the entire video+captions are watched. Possible remedies:

* make the parser more forgiving where it does not conflict with extensibility
* make browsers complain a lot in the error console
* point and laugh at those who failed to use a (non-existent) validator

== Chapter end time ==

In most systems chapters are really chapter markers, a point in time. A chapter implicitly ends when the next begins. For nested chapters this isn't so, as the end time is used to determine nesting.

Do we expect that UIs for chapter navigation make the end time visible in some fashion (e.g. highlighting the chapter on the timeline) or that when a chapter is chosen, it will pause at the end time?

== --> next ==

A suggestion that was brought up when discussing chapters. When one simply wants the chapter to end when the next starts, it's a bit of a hassle to always include the end time. Some additional complexity in the parser could allow for this:

  00:00.000 --> next
  Chapter 1

  01:00.000 --> next
  Intermezzo

  02:00.000 --> next
  Last Chapter

Cues would be created with endTime = Infinity, and be modified to the startTime of the following cue (in source order) if there is a following cue. This would IMO be quite neat, but is the use case strong enough?

[1] http://openvideoconference.org/standards-for-video-accessibility/
[2] http://wiki.whatwg.org/wiki/WebVTT

-- 
Philip Jägenstedt
Core Developer
Opera Software
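To make the "--> next" suggestion above concrete, here is a minimal sketch of the two steps it describes: create each such cue with an infinite end time, then set that end time to the start time of the following cue in source order. This is only an illustration under simplifying assumptions, not the WebVTT parsing algorithm and not any browser's code. The names `Cue`, `parse_timestamp`, `parse_cues`, `resolve_next` and `EXAMPLE` are all hypothetical, and the toy parser accepts only mm:ss.ttt timestamps and cues separated by blank lines.

```python
import math
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds; math.inf until (and unless) resolved
    text: str


def parse_timestamp(ts: str) -> float:
    """Convert an 'mm:ss.ttt' timestamp to seconds (toy parser, no validation)."""
    minutes, seconds = ts.strip().split(":")
    return int(minutes) * 60 + float(seconds)


def parse_cues(source: str) -> list[Cue]:
    """Parse blank-line-separated cues; 'next' as the end time means Infinity."""
    cues = []
    for block in source.strip().split("\n\n"):
        timing, _, text = block.partition("\n")
        start_str, _, end_str = timing.partition(" --> ")
        if end_str.strip() == "next":
            end = math.inf  # hypothetical "--> next" syntax from the proposal
        else:
            end = parse_timestamp(end_str)
        cues.append(Cue(parse_timestamp(start_str), end, text))
    return cues


def resolve_next(cues: list[Cue]) -> list[Cue]:
    """Set each open-ended cue's end to the start of the next cue in source order."""
    for current, following in zip(cues, cues[1:]):
        if current.end == math.inf:
            current.end = following.start
    return cues  # the last cue, having no successor, keeps end = Infinity


EXAMPLE = """\
00:00.000 --> next
Chapter 1

01:00.000 --> next
Intermezzo

02:00.000 --> next
Last Chapter
"""

for cue in resolve_next(parse_cues(EXAMPLE)):
    print(cue.start, cue.end, cue.text)
```

In this sketch, cues with an explicit end time are left untouched, and only the final open-ended cue stays at Infinity. A real implementation would also have to decide what happens while the file is still loading, before the following cue has been parsed.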
Received on Wednesday, 21 September 2011 02:15:25 UTC