- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Tue, 25 Oct 2011 07:50:43 +1100
So, in your opinion, should there be a change to the WebVTT spec that
separates cues differently? Is there a recommendation you have from
your analysis?

Cheers,
Silvia.

On Mon, Oct 24, 2011 at 6:26 PM, Simon Pieters <simonp at opera.com> wrote:
> I wanted to research how common it is to fail to separate cues in SRT, and
> for what reason.
>
> SRT parsers usually interpret a timings line as a new cue, while WebVTT
> wants two blank lines for a new cue.
>
> I took the 65k SRT files we've got, replaced comma with dot and prepended
> "WEBVTT\n\n", then ran them in Opera's <track> impl, looking for '-->' in
> cue data.
>
> There were 840 files with --> in cue data. This is 1.3% of the files.
>
> Looking at the cue data, there were 11,118 lines that contained -->. There
> were 8830 lines of only whitespace.
>
> In the cue data, if I look at valid-looking timing lines
> (/^\d\d:\d\d:\d\d\.\d\d\d\s*-->\s*\d\d:\d\d:\d\d\.\d\d\d(\s|$)/) and check
> the line before that, or the line before *that* if it looks like an SRT id
> (/^\d+\s*$/), then I see 7030 lines of only whitespace and 3761 lines of
> something else.
>
> Failing to separate cues results in an unpleasant experience for the user,
> since basically the screen is filled with several "cues" including their IDs
> and timing lines.
>
> Some files had most or all of their cues parsed as a single cue with the
> WebVTT parser, e.g. because all lines ended with one or more spaces. Looking
> at such a file in a text editor, it's not immediately obvious that there's
> an error, because the spaces are not visible. Moreover, the file is not
> non-conforming, so a validator wouldn't help either.
>
> So what about the cases that aren't whitespace? It seems to be mostly just
> missing the newline completely. Some omitted the ID also. One file had a "|"
> between all cues.
>
> --
> Simon Pieters
> Opera Software
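
For anyone wanting to repeat the measurement on their own SRT files, the procedure quoted above could be approximated roughly as follows. This is only a sketch under assumptions: it is not Opera's <track> implementation and not the WebVTT parser algorithm. It models "cue data" simply as the lines that follow the first '-->' line in a block delimited by truly empty lines. The two regular expressions are the ones from the quoted message; the function names are made up for illustration.

```python
import re

# Patterns taken from the quoted message.
TIMING_RE = re.compile(
    r"^\d\d:\d\d:\d\d\.\d\d\d\s*-->\s*\d\d:\d\d:\d\d\.\d\d\d(\s|$)"
)
SRT_ID_RE = re.compile(r"^\d+\s*$")


def srt_to_vtt(srt_text):
    """Replace commas with dots and prepend the WEBVTT header."""
    return "WEBVTT\n\n" + srt_text.replace(",", ".")


def split_blocks(vtt_text):
    """Split on truly empty lines only; whitespace-only lines do not split."""
    text = vtt_text.replace("\r\n", "\n").replace("\r", "\n")
    blocks, current = [], []
    for line in text.split("\n"):
        if line == "":
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def cue_payload(block):
    """Lines after the block's first '-->' line (assumed model of cue data)."""
    for i, line in enumerate(block):
        if "-->" in line:
            return block[i + 1:]
    return []


def classify_unseparated(vtt_text):
    """For each timing line inside cue data, classify the line before it,
    skipping one SRT-style id line if present."""
    counts = {"whitespace": 0, "other": 0}
    for block in split_blocks(vtt_text):
        payload = cue_payload(block)
        for i, line in enumerate(payload):
            if not TIMING_RE.match(line):
                continue
            j = i - 1
            if j >= 0 and SRT_ID_RE.match(payload[j]):
                j -= 1
            if j >= 0 and payload[j].strip() == "":
                counts["whitespace"] += 1
            else:
                counts["other"] += 1
    return counts


# Second cue is preceded by a line holding one space, third by no gap at all.
sample = (
    "1\n00:00:01,000 --> 00:00:02,000\nHello\n \n"
    "2\n00:00:03,000 --> 00:00:04,000\nWorld\n"
    "3\n00:00:05,000 --> 00:00:06,000\nAgain\n"
)
print(classify_unseparated(srt_to_vtt(sample)))
```

In the sample, all three SRT cues collapse into one block because the only separator is a line containing a single space, which matches the failure mode described in the quoted message.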
Received on Monday, 24 October 2011 13:50:43 UTC