Re: metadata in the VTT file header, re-starting the conversation from Silvia Pfeiffer on 2012-06-08 (public-texttracks@w3.org from June 2012)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, 8 Jun 2012 12:44:44 +1000
To: Glenn Maynard <glenn@zewt.org>
Cc: Philip Jägenstedt <philipj@opera.com>, public-texttracks@w3.org
Message-ID: <CAHp8n2mZdmWisJCA46LMZFWAedwmHqA1tcb0vfsbf54zR+_J5Q@mail.gmail.com>
On Fri, Jun 8, 2012 at 12:10 PM, Glenn Maynard <glenn@zewt.org> wrote:
> On Thu, Jun 7, 2012 at 7:54 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
>>
>> >> There is a fixed location in a WebVTT file where the data is and there
>> >> is a fixed location in WebM files where the data is. I don't follow
>> >> your argument. For 10 tracks, of course you have to seek to 10
>> >> different locations - each one has different metadata. They can't be
>> >> mingled together.
>> >
>> >
>> > Sure they can.  Extract the headers and group them in the same region at
>> > the beginning of the WebM file, so you can read all of them in a single
>> > burst.
>>
>> Physically they would indeed all be in the same region. But logically
>> they still have to be separate. That's all I was referring to.
>
>
> (You said that 10 tracks would require 10 seeks.  I'm just saying that it
> doesn't: group the metadata together on the media, and you can read all 10
> tracks' metadata with 1 seek.)

(Even if they are placed together so that in theory you can reach them
with 1 seek, it depends on how you parse your file. If you parse it
one track at a time, you will still get 10 seeks, because they are
logically different entities. Just saying... not that it makes a
difference to the argument below.)


>> I think we can also agree that there is a use
>> case where video files and VTT files are handled as separate resources
>> - in particular by a video player that is not a Web browser and does
>> not combine the presentation through an HTML page with its track
>> elements. My argument is simply that the most appropriate place to
>> keep metadata about VTT files is at the beginning of a VTT file in
>> this latter use case.
>
>
> Okay, what I disagree with is the idea of distributing files to
> end users as loose files.

I guess we're going to have to agree to disagree on this use case.


>  While it's probably harmless to allow players to
> use the VTT headers for that, they couldn't depend on it, since lots of
> files won't have them.
>
> I agree that putting <track> metadata inside the VTT is useful for the
> editing/authoring phase.  That is, to allow implementing WebM/MKV muxers and
> autogenerating <track> text, without the user having to supply this metadata
> (as you have to today, for example, when muxing an MKV file from a video
> file and an .SRT, or hand-typing <video>).  I do think that these use cases
> are enough to justify the feature.  (That means the rest of this is mostly
> tangential.)

At least we agree that we need that feature. It's ok then that we
disagree on how it will be used. :-)
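For illustration only, here is a minimal Python sketch of the authoring use case both sides agree on: autogenerating `<track>` markup (and a whole `<video>` snippet) from metadata read out of VTT headers. It assumes the header values have already been parsed into a dict with hypothetical keys `kind`, `language` and `label`; those key names and the functions `track_element` and `video_snippet` are assumptions, not anything the list has agreed on.

```python
from html import escape


def track_element(src: str, meta: dict[str, str]) -> str:
    """Build a <track> tag from (hypothetical) VTT header metadata."""
    attrs = [f'src="{escape(src)}"']
    if meta.get("kind"):
        attrs.append(f'kind="{escape(meta["kind"])}"')
    # "unknown" is treated as "no language given", so srclang is omitted
    lang = meta.get("language")
    if lang and lang != "unknown":
        attrs.append(f'srclang="{escape(lang)}"')
    if meta.get("label"):
        attrs.append(f'label="{escape(meta["label"])}"')
    return "<track " + " ".join(attrs) + ">"


def video_snippet(video_src: str, tracks: list[tuple[str, dict[str, str]]]) -> str:
    """Wrap a video URL and its text tracks in a complete <video> element."""
    lines = [f'<video src="{escape(video_src)}" controls>']
    lines += ["  " + track_element(src, meta) for src, meta in tracks]
    lines.append("</video>")
    return "\n".join(lines)


print(video_snippet("talk.webm", [
    ("talk.en.vtt", {"kind": "captions", "language": "en", "label": "English"}),
    ("talk.de.vtt", {"kind": "subtitles", "language": "de"}),
]))
```

Attributes whose metadata is missing are simply left out, so the browser's own defaults apply, which is the same fallback a page author gets when omitting @kind or @srclang by hand.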


>> Indeed. VTT authors will want to check their VTT files offline, too,
>> not just online. So they will use a player such as VLC. If they get
>> the language and/or the kind wrong, VLC will show them that, just as a
>> Web page shows a Web author what they got wrong in their HTML markup.
>> I don't really see the difference.
>
>
> Some will, of course, but you have a much higher opinion of how much people
> test content than I do if you think this will be common.  Even if they
> notice it, many will probably ignore it, knowing that it'll be fixed later
> when they set up the <track> (which is what happens with the SRT/MKV
> process).

If they distribute the files as separate files, they will make sure
the markup is right. Since we've agreed to disagree on whether this is an
appropriate use case, we can stop arguing about what mistakes authors
make, too, I guess. ;-)


>> > If a captions VTT file is being treated as captions, they won't notice
>> > (or care) if there's a "Kind: subtitles" line in the VTT file (being
>> > overridden by the HTML).
>> >
>> > Anyway, we're getting a bit afield.  Let's back up a bit and reexamine
>> > the use cases.
>> >
>> > 1: Allowing WebM muxing software to automatically detect the metadata,
>> > so users don't have to do it manually (like they have to today for
>> > MKV/SRT muxing).
>> > 2: Allowing software to automatically generate complete HTML <video>
>> > snippets.  (This is basically the same as #1.)
>> > 3: Allowing video players to display captions without needing to know
>> > how to parse HTML.
>> >
>> > This works fine for #1 and #2.  It really doesn't help #3 at all--even
>> > if you assume VTT metadata will never be out of date (a tough
>> > assumption), a huge number of VTT files will simply not have it.  That
>> > means no player can ever really depend on the metadata being in each
>> > VTT file.
>>
>> So what? A Web page cannot rely on @kind and @srclang being available
>> for a text track either. There are default settings that players &
>> browsers will use. I don't see how this is different.
>
>
> It's completely different.  People will be naturally encouraged to supply
> @srclang, because the problems that happen if they don't ("Language:
> unknown") are obvious and will show up immediately.  People won't supply VTT
> headers most of the time, because it won't cause any evident problems.
> They'll have no idea that they've omitted anything.

Their desktop player will show them the same error: "Language:
unknown". I don't see a difference.
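As an illustration of the fallback behavior both sides describe, here is a minimal Python sketch of how a standalone player might read optional VTT header metadata and fall back to defaults when it is absent. The header syntax ("Key: value" lines between the `WEBVTT` line and the first blank line) is an assumption for illustration, since the list had not settled on one; `read_vtt_header` and `DEFAULTS` are hypothetical names. The default kind of "subtitles" mirrors HTML's default when @kind is missing, and "unknown" mirrors the "Language: unknown" display mentioned above.

```python
# Hypothetical header syntax: "Key: value" lines between the WEBVTT
# signature line and the first blank line. This is an assumption for
# illustration; the list had not settled on a syntax.
DEFAULTS = {"kind": "subtitles", "language": "unknown"}


def read_vtt_header(text: str) -> dict[str, str]:
    """Return header metadata, falling back to player defaults."""
    lines = text.lstrip("\ufeff").splitlines()  # tolerate a leading BOM
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("not a WebVTT file")
    meta = dict(DEFAULTS)  # a file without headers simply keeps the defaults
    for line in lines[1:]:
        if not line.strip():
            break  # the header ends at the first blank line
        key, sep, value = line.partition(":")
        if sep:  # ignore header lines that are not "Key: value"
            meta[key.strip().lower()] = value.strip()
    return meta


with_header = "WEBVTT\nKind: captions\nLanguage: de\n\n00:00.000 --> 00:02.000\nHallo\n"
without_header = "WEBVTT\n\n00:00.000 --> 00:02.000\nHello\n"
print(read_vtt_header(with_header))
print(read_vtt_header(without_header))
```

A file without headers is not an error here; the player just ends up showing "unknown", which is exactly the visible symptom the two positions above disagree about.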


>> > But I don't think #3 is a real use case anyway.  If a site like YouTube
>> > (a very hypothetical example :) wants to allow saving videos to disk
>> > with captions, it should mux them into a WebM or MKV file and present
>> > *that* for download, not dump a dozen separate VTT files on the user and
>> > expect him to keep them together.  That is, standalone video players
>> > probably don't need to support reading loose VTT files *or* HTML
>> > parsing--their MKV/WebM support is enough.
>>
>> This is wishful thinking. I've tried for over 10 years to get
>> stand-alone caption files included into binary media resources. While
>> it sometimes happens, it doesn't in the majority of cases.
>
>
> It's not wishful thinking.  It's experience from watching subtitled video
> online for as long as video has been online.  Standalone caption files are
> rare, while I've seen thousands of video files with embedded captions.
> (Before MKV became popular, standalone SRT files were a little more common,
> but baking subtitles into the video was the common practice.)

Apparently you have your experiences and I have mine. I simply accept
both use cases as valid.


>> The reasons are simple: text files are editable - video files not so much.
>> Keeping them separate gives you more control. The problem of shipping
>> around multiple related files has been solved by zip - it is a smaller
>> problem than encapsulating/extracting text files into/from binary
>> files.
>
>
> I've never once seen anyone distributing video for end-user consumption in a
> ZIP.  (I'd fight tooth and nail against anybody encouraging that, but I've
> elided it for the sake of not starting another tangent...)

Actually, I also hope we won't see zip used for video distribution. I
would, however, expect people to distribute a video file and a zip
file with all the relevant text tracks zipped together. That's two
files rather than n files. I would also expect muxed files, but far
fewer than you seem to expect.

Anyway, there are other use cases for separate files: for example you
may want to buy/download your video off one site and the captions off
another site.


>> The only place where it will happen is where publishers want to
>> take away the control over the caption files from their users and thus
>> enforce the distribution of the video files with encapsulated text
>> tracks (mainly as an obfuscation mechanism to make it harder for the
>> ordinary user to change them).
>
>
> It happens all the time, and not to try to take away control.  Muxing isn't
> DRM.

No it's not, but it's easier to put DRM on muxed files than on plain text files.


>> As for YouTube: right now you can download the text track files and
>> the video files separately (some hacking involved), but there is no
>> muxed download. That seems to contradict your example.
>
>
> I was talking about presenting videos to the user for download, not manually
> downloading the underlying resources that the Flash player accesses.  That's
> no different than downloading the individual resources pointed to by <track>
> elements.  YouTube doesn't present downloads at all, unless this has
> changed recently or is very well hidden.

It was at one stage. I doubt they will do it again, with or without
muxed files. If they did, I would indeed hope for it to be muxed. I
would, however, not expect browsers to implement automated muxing from
video+track markup (it would be nice, but I'm not hopeful, because it
creates different content from the one that was published). So, when
downloading (right click "save video as") you will continue to get
individual files for the foreseeable future.


TL;DR: we should probably move on and say: no matter our motivation, we
agree that we need metadata in VTT files. Let's design the solution.
Agreed?

Cheers,
Silvia.
Received on Friday, 8 June 2012 02:45:33 UTC