- From: Brendan Long <self@brendanlong.com>
- Date: Mon, 03 Nov 2014 17:50:28 -0600
- To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Cc: WHAT Working Group <whatwg@lists.whatwg.org>
On 11/03/2014 05:41 PM, Silvia Pfeiffer wrote:
> On Tue, Nov 4, 2014 at 10:24 AM, Brendan Long <self@brendanlong.com> wrote:
>> On 11/03/2014 04:20 PM, Silvia Pfeiffer wrote:
>>> On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long <self@brendanlong.com> wrote:
>>> Right, that was the original concern. But how realistic is the
>>> situation of n video tracks and m caption tracks with n being larger
>>> than 2 or 3 without a change of the audio track anyway?
>> I think the situation gets confusing at N=2. See below.
>>
>>>> We would also need to consider:
>>>>
>>>> * How do you label this combined video and text track?
>>> That's not specific to the approach that we pick and will always need
>>> to be decided. Note that label isn't something that needs to be unique
>>> to a track, so you could just use the same label for all burnt-in
>>> video tracks and identify them to be different only in the language.
>> But the video and the text track might both have their own label in the
>> underlying media file. Presumably we'd want to preserve both.
>>
>>>> * What is the track's "id"?
>>> This would need to be unique, but I think it will be easy to come up
>>> with a scheme that works. Something like "video_[n]_[captiontrackid]"
>>> could work.
>> This sounds much more complicated and likely to cause problems for
>> JavaScript developers than just indicating that a text track has cues
>> that can't be represented in JavaScript.
>>
>>>> * How do you present this to users in a way that isn't confusing?
>>> No different to presenting caption tracks.
>> I think VideoTracks with kind=caption are confusing too, and we should
>> avoid creating more situations where we need to do that.
>>
>> Even when we only have one video, it's confusing that captions could
>> exist in multiple places.
>>
>>>> * What if the video track's kind isn't "main"? For example, what if we
>>>> have a sign language track and we also want to display captions?
>>>> What is the generated track's kind?
>>> How would that work? Are you saying we're not displaying the main
>>> video, but only displaying the sign language track? Is that realistic
>>> and something anybody would actually do?
>> It's possible, so the spec should handle it. Maybe it doesn't matter though?
>>
>>>> * The "language" attribute could also have conflicts.
>>> How so?
>> The underlying streams could have their own metadata, and it could
>> conflict. I'm not sure if it would ever be reasonable to author a file
>> like that, but it would be trivial to create. At the very least, we'd
>> need language to say which takes precedence if the two streams have
>> conflicting metadata.
>>
>>>> * I think it might also be possible to create files where the video
>>>> track and text track are different lengths, so we'd need to figure
>>>> out what to do when one of them ends.
>>> The timeline of a video is well defined in the spec - I don't think we
>>> need to do more than what is already defined.
>> What I mean is that this could be confusing for users. Say I'm watching
>> a video with two video streams (main camera angle, secondary camera
>> angle) and two caption tracks (for sports for example). If I'm watching
>> the secondary camera angle and looking at one of the caption tracks,
>> but then the secondary camera angle goes away, my player is now forced
>> to randomly select one of the caption tracks combined with the primary
>> video, because it's not obvious which one corresponds with the captions
>> I was reading before.
>>
>> In fact, if I was making a video player for my website where multiple
>> people give commentary on baseball games with multiple camera angles, I
>> would probably create my own controls that parse the video track ids and
>> separate them back into video and text tracks so that I could offer
>> separate video and text controls, since combining them just makes the
>> UI more complicated.
> That's what I meant with multiple video tracks: if you have several
> that require different captions, then you're in a world of hurt in any
> case and this has nothing to do with whether you're representing the
> non-cue-exposed caption tracks as UARendered or as a video track.

I mean multiple video tracks that are valid for multiple caption tracks.
The example I had in my head was sports commentary, with multiple people
commenting on the same game, which is available from multiple camera
angles.

We probably do need a way to indicate that tracks go together when they
don't all go together though. I think it's come up before. Maybe the
obvious answer is, "don't have tracks that don't go together in the same
file".

>> So, what's the advantage of combining video and captions, rather than
>> just indicating that a text track can't be represented as TextTrackCues?
> One important advantage: there's no need to change the spec.
>
> If we change the spec, we still have to work through all the issues
> that you listed above and find a solution.
>
> Silvia.

I suppose not changing the spec is nice, but I think the changes are
simpler if we have no-cue text tracks, since the answer to all of my
questions becomes "we don't do that, we just keep the two tracks separate".
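To make the id-parsing concern concrete, here is a minimal sketch of what a custom player would have to do if burnt-in captions were exposed as combined video tracks. It assumes the hypothetical "video_[n]_[captiontrackid]" id scheme floated in the thread; no spec defines that format, and the helper names (parseCombinedTrackId, splitIntoMenus, findCombinedId) are illustrative only. It works on plain id strings rather than real VideoTrackList objects so it runs outside a browser.

```typescript
// Hypothetical id scheme from the thread: "video_[n]_[captiontrackid]".
// This format is an assumption for illustration, not part of any spec.
interface CombinedTrackId {
  videoIndex: number;     // n: which underlying video stream
  captionTrackId: string; // id of the burnt-in caption track (may contain "_")
}

const COMBINED_ID = /^video_(\d+)_(.+)$/;

// Returns null for ids that do not follow the combined scheme.
function parseCombinedTrackId(id: string): CombinedTrackId | null {
  const match = COMBINED_ID.exec(id);
  if (match === null) return null;
  return { videoIndex: Number(match[1]), captionTrackId: match[2] };
}

// Un-combine the tracks: build separate "camera angle" and "caption" menus.
function splitIntoMenus(ids: readonly string[]): { videos: number[]; captions: string[] } {
  const videos = new Set<number>();
  const captions = new Set<string>();
  for (const id of ids) {
    const parsed = parseCombinedTrackId(id);
    if (parsed === null) continue; // ordinary video track, not combined
    videos.add(parsed.videoIndex);
    captions.add(parsed.captionTrackId);
  }
  return {
    videos: Array.from(videos).sort((a, b) => a - b),
    captions: Array.from(captions).sort(),
  };
}

// Map a separate (video, caption) choice back to the one combined track to
// select; undefined if the file has no such combination.
function findCombinedId(
  ids: readonly string[],
  videoIndex: number,
  captionTrackId: string,
): string | undefined {
  return ids.find((id) => {
    const parsed = parseCombinedTrackId(id);
    return (
      parsed !== null &&
      parsed.videoIndex === videoIndex &&
      parsed.captionTrackId === captionTrackId
    );
  });
}
```

For ids such as "video_0_en-commentary", "video_0_es-commentary", "video_1_en-commentary" and "video_1_es-commentary", splitIntoMenus yields two camera angles and two commentary tracks, and findCombinedId maps a user's separate choices back to one combined track. With separate, no-cue text tracks, none of this string parsing would be needed: the player could list video tracks and text tracks directly.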
Received on Monday, 3 November 2014 23:50:55 UTC