- From: Philip Jägenstedt <philipj@opera.com>
- Date: Sat, 25 Jan 2014 23:47:19 +0700
- To: Brendan Long <B.Long@cablelabs.com>
- Cc: "public-texttracks@w3.org" <public-texttracks@w3.org>
On Sat, Jan 25, 2014 at 12:15 AM, Brendan Long <B.Long@cablelabs.com> wrote:
> On Fri, 2014-01-24 at 23:33 +0700, Philip Jägenstedt wrote:
>> Having looked at the original thread, I can only guess that you don't
>> want to involve scripts, since if you can rely on scripts it seems
>> like you could easily do what you're asking for. What is the reason
>> that you do not want to use scripts here?
>
> First, a philosophical reason: Requiring JavaScript to play a live video
> with captions seems like a huge hack. Technically, we could decode videos
> in JavaScript too, but that doesn't mean it's a good solution.
>
> Second, a practical reason: If we can produce a valid WebVTT document,
> then any web page can display it with a normal video tag. If we have to
> use JavaScript, then inevitably there will be several different ways of
> doing it, and any page that wants to use live captions from outside
> sources will need a list of JavaScript hacks to make them all work. Then,
> anytime a site that produces live captions changes its method, anything
> that depends on it will break until they update their JavaScript. It just
> seems like "live video" is a normal enough case that we shouldn't push
> all of this complexity on every site that plays them.

It's always difficult to decide which features warrant a declarative
solution and which should be left to scripts. I'm trying to understand
what the costs of either approach are.

It seems to me that when you do live streaming, you're going to be using
Media Source Extensions, which require rather a lot of JavaScript. In that
context, the script required to update the end times of cues when the next
cue comes in doesn't seem like much of a burden. In other words, the cost
of *not* solving this declaratively doesn't seem very high. (I've probably
misunderstood some part of the use case; in particular, "live captions
from outside sources" seems mysterious to me.)

So, what about the cost of solving this declaratively?

1. Is the special keyword NEXT for the end time the only new syntax that's
required?

2. When should the end time of a NEXTy cue be updated? Is it when a new
cue with a higher start time is parsed, or should e.g. a script modifying
the start time of an existing cue also do something?

3. Should the endTime IDL attribute actually be modified, or should it
simply be that a cue with end time NEXT is not considered active if there
are any cues with a later end time?

4. What happens when you have two cues with the same start time that both
have end time NEXT?

Depending on how this is supposed to work, it will add more or less
complexity to the spec and implementations.

Philip
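[Editor's note: the scripted alternative discussed above (closing the most recent open-ended cue when the next cue arrives) could look roughly like the sketch below. Plain objects stand in for VTTCue and an array stands in for a TextTrack; both are assumptions for illustration. On a real page you would construct VTTCue instances and call track.addCue().]

```javascript
// Sketch only: each incoming live cue starts open-ended, and the previous
// open cue is closed by setting its end time to the new cue's start time.
const OPEN_ENDED = Number.MAX_VALUE; // placeholder "until further notice" end time

function makeLiveCueAppender(track) {
  let openCue = null; // the most recent cue still awaiting its real end time
  return function appendCue(startTime, text) {
    if (openCue !== null && startTime > openCue.startTime) {
      openCue.endTime = startTime; // close the previous cue
    }
    const cue = { startTime, endTime: OPEN_ENDED, text };
    track.push(cue); // stand-in for track.addCue(cue) on a real TextTrack
    openCue = cue;
    return cue;
  };
}

// Usage: feed cues in as they arrive from the live stream.
const track = [];
const append = makeLiveCueAppender(track);
append(0, "First caption");
append(3, "Second caption");  // closes the first cue at t=3
append(7.5, "Third caption"); // closes the second cue at t=7.5
```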
Received on Saturday, 25 January 2014 16:47:48 UTC