- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Tue, 4 Nov 2014 10:41:01 +1100
- To: Brendan Long <self@brendanlong.com>
- Cc: WHAT Working Group <whatwg@lists.whatwg.org>
On Tue, Nov 4, 2014 at 10:24 AM, Brendan Long <self@brendanlong.com> wrote:
>
> On 11/03/2014 04:20 PM, Silvia Pfeiffer wrote:
>> On Tue, Nov 4, 2014 at 3:56 AM, Brendan Long <self@brendanlong.com> wrote:
>> Right, that was the original concern. But how realistic is the
>> situation of n video tracks and m caption tracks with n being larger
>> than 2 or 3 without a change of the audio track anyway?
> I think the situation gets confusing at N=2. See below.
>
>>> We would also need to consider:
>>>
>>> * How do you label this combined video and text track?
>> That's not specific to the approach that we pick and will always need
>> to be decided. Note that label isn't something that needs to be unique
>> to a track, so you could just use the same label for all burnt-in
>> video tracks and identify them to be different only in the language.
> But the video and the text track might both have their own label in the
> underlying media file. Presumably we'd want to preserve both.
>
>>> * What is the track's "id"?
>> This would need to be unique, but I think it will be easy to come up
>> with a scheme that works. Something like "video_[n]_[captiontrackid]"
>> could work.
> This sounds much more complicated and likely to cause problems for
> JavaScript developers than just indicating that a text track has cues
> that can't be represented in JavaScript.
>
>>> * How do you present this to users in a way that isn't confusing?
>> No different to presenting caption tracks.
> I think VideoTracks with kind=caption are confusing too, and we should
> avoid creating more situations where we need to do that.
>
> Even when we only have one video, it's confusing that captions could
> exist in multiple places.
>
>>> * What if the video track's kind isn't "main"? For example, what if we
>>> have a sign language track and we also want to display captions?
>>> What is the generated track's kind?
>> How would that work?
>> Are you saying we're not displaying the main video, but only
>> displaying the sign language track? Is that realistic and something
>> anybody would actually do?
> It's possible, so the spec should handle it. Maybe it doesn't matter though?
>
>>> * The "language" attribute could also have conflicts.
>> How so?
> The underlying streams could have their own metadata, and it could
> conflict. I'm not sure if it would ever be reasonable to author a file
> like that, but it would be trivial to create. At the very least, we'd
> need language to say which takes precedence if the two streams have
> conflicting metadata.
>
>>> * I think it might also be possible to create files where the video
>>> track and text track are different lengths, so we'd need to figure
>>> out what to do when one of them ends.
>> The timeline of a video is well defined in the spec - I don't think we
>> need to do more than what is already defined.
> What I mean is that this could be confusing for users. Say I'm watching
> a video with two video streams (main camera angle, secondary camera
> angle) and two captions tracks (for sports for example). If I'm watching
> the secondary camera angle and looking at one of the captions tracks,
> but then the secondary camera angle goes away, my player is now forced
> to randomly select one of the caption tracks combined with the primary
> video, because it's not obvious which one corresponds with the captions
> I was reading before.
>
> In fact, if I was making a video player for my website where multiple
> people give commentary on baseball games with multiple camera angles, I
> would probably create my own controls that parse the video track ids and
> separate them back into video and text tracks so that I could offer
> separate video and text controls, since combining them just makes
> the UI more complicated.
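The id-parsing workaround Brendan describes can be made concrete. What follows is a minimal sketch in TypeScript, assuming the hypothetical "video_[n]_[captiontrackid]" scheme floated earlier in the thread (no spec defines it). The function names `parseCombinedTrackId` and `groupByCaptionTrack` are illustrative only. In a browser, the ids would come from each `VideoTrack`'s `id` attribute; here they are passed in as plain strings.

```typescript
// Hypothetical id scheme from the thread: "video_[n]_[captiontrackid]".
// Nothing in the HTML spec defines this format; it is only an illustration.
interface CombinedTrackId {
  videoIndex: number;     // the [n] part: which underlying video stream
  captionTrackId: string; // the [captiontrackid] part: which caption stream
}

// Split a combined id back into its video and caption parts.
// The caption part is everything after the second underscore,
// so a caption track id may itself contain underscores.
function parseCombinedTrackId(id: string): CombinedTrackId | null {
  const match = /^video_(\d+)_(.+)$/.exec(id);
  if (match === null) {
    return null; // not a combined video+caption track id
  }
  return { videoIndex: Number(match[1]), captionTrackId: match[2] };
}

// Group combined ids by caption track, so a player could offer separate
// video and caption menus instead of one menu of combined tracks.
function groupByCaptionTrack(ids: string[]): Map<string, number[]> {
  const groups = new Map<string, number[]>();
  for (const id of ids) {
    const parsed = parseCombinedTrackId(id);
    if (parsed === null) {
      continue; // skip ordinary (non-combined) video tracks
    }
    const videos = groups.get(parsed.captionTrackId) ?? [];
    videos.push(parsed.videoIndex);
    groups.set(parsed.captionTrackId, videos);
  }
  return groups;
}
```

For example, `groupByCaptionTrack(["video_0_commentaryA", "video_1_commentaryA", "video_0_commentaryB", "main"])` yields a map from `"commentaryA"` to `[0, 1]` and from `"commentaryB"` to `[0]`. Every player that wants separate controls would have to agree on and reimplement this kind of parsing, which is the extra burden on JavaScript developers the thread is weighing.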
That's what I meant with multiple video tracks: if you have several that require different captions, then you're in a world of hurt in any case and this has nothing to do with whether you're representing the non-cue-exposed caption tracks as UARendered or as a video track.

> So, what's the advantage of combining video and captions, rather than
> just indicating that a text track can't be represented as TextTrackCues?

One important advantage: there's no need to change the spec. If we change the spec, we still have to work through all the issues that you listed above and find a solution.

Silvia.
Received on Monday, 3 November 2014 23:41:45 UTC