- From: Simon Pieters <simonp@opera.com>
- Date: Tue, 25 Oct 2011 09:18:32 +0200
On Mon, 24 Oct 2011 22:50:43 +0200, Silvia Pfeiffer <silviapfeiffer1 at gmail.com> wrote:

> So, in your opinion, should there be a change to the WebVTT spec that
> separates cues differently?
> Is there a recommendation you have from your analysis?

My recommendation is http://www.w3.org/Bugs/Public/show_bug.cgi?id=14550

> Cheers,
> Silvia.
>
> On Mon, Oct 24, 2011 at 6:26 PM, Simon Pieters <simonp at opera.com> wrote:
>> I wanted to research how common it is to fail to separate cues in SRT,
>> and for what reason.
>>
>> SRT parsers usually interpret a timings line as a new cue, while WebVTT
>> wants two blank lines for a new cue.
>>
>> I took the 65k SRT files we've got, replaced comma with dot and
>> prepended "WEBVTT\n\n", then ran them in Opera's <track> impl, looking
>> for '-->' in cue data.
>>
>> There were 840 files with --> in cue data. This is 1.3% of the files.
>>
>> Looking at the cue data, there were 11,118 lines that contained -->.
>> There were 8830 lines of only whitespace.
>>
>> In the cue data, if I look at valid-looking timing lines
>> (/^\d\d:\d\d:\d\d\.\d\d\d\s*-->\s*\d\d:\d\d:\d\d\.\d\d\d(\s|$)/) and
>> check the line before that, or the line before *that* if it looks like
>> an SRT id (/^\d+\s*$/), then I see 7030 lines of only whitespace and
>> 3761 lines of something else.
>>
>> Failing to separate cues results in an unpleasant experience for the
>> user, since basically the screen is filled with several "cues"
>> including their IDs and timing lines.
>>
>> Some files had most or all of their cues parsed as a single cue with the
>> WebVTT parser, e.g. because all lines ended with one or more spaces.
>> Looking at such a file in a text editor, it's not immediately obvious
>> that there's an error, because the spaces are not visible. Moreover,
>> the file is not non-conforming, so a validator wouldn't help either.
>>
>> So what about the cases that aren't whitespace? It seems to be mostly
>> just missing the newline completely. Some omitted the ID also. One file
>> had a "|" between all cues.
>>
>> --
>> Simon Pieters
>> Opera Software
>>

-- 
Simon Pieters
Opera Software
Received on Tuesday, 25 October 2011 00:18:32 UTC
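
The check described in the quoted message can be roughly sketched as follows. This is only an illustration under stated assumptions, not the code that was run against Opera's <track> implementation: the function names (srt_to_vtt, cue_blocks, classify_unseparated) are hypothetical, and cue splitting is approximated by breaking at completely empty lines rather than by a real WebVTT parser. The two regular expressions are the ones given in the message.

```python
import re

# Regular expressions taken from the message above.
TIMING = re.compile(r"^\d\d:\d\d:\d\d\.\d\d\d\s*-->\s*\d\d:\d\d:\d\d\.\d\d\d(\s|$)")
SRT_ID = re.compile(r"^\d+\s*$")


def srt_to_vtt(srt_text):
    """Convert as described: replace comma with dot, prepend the header."""
    return "WEBVTT\n\n" + srt_text.replace(",", ".")


def cue_blocks(vtt_text):
    """Approximate cue splitting: a new block starts after an empty line.

    Assumption: a line holding only spaces is NOT empty, so it does not
    separate cues. This is the failure mode the message describes.
    """
    lines = vtt_text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    blocks, current = [], []
    for line in lines[1:]:  # skip the "WEBVTT" header line
        if line == "":
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def classify_unseparated(vtt_text):
    """Count what precedes each timing line that ended up inside cue data."""
    counts = {"whitespace": 0, "other": 0}
    for block in cue_blocks(vtt_text):
        # The first line containing "-->" is the cue's own timing line.
        start = next((i for i, l in enumerate(block) if "-->" in l), None)
        if start is None:
            continue
        payload = block[start + 1:]
        for i, line in enumerate(payload):
            if not TIMING.match(line):
                continue
            j = i - 1
            if j >= 0 and SRT_ID.match(payload[j]):
                j -= 1  # skip an SRT id line and look one line further back
            if j < 0:
                continue
            kind = "whitespace" if payload[j].strip() == "" else "other"
            counts[kind] += 1
    return counts
```

For example, an SRT file whose second cue is preceded by a line holding a single space, and whose third cue follows the second with no separator at all, yields one count in each category, while a properly separated fourth cue is not counted.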