Re: metadata in the VTT file header, re-starting the conversation from Glenn Maynard on 2012-05-28 (public-texttracks@w3.org from May 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Mon, 28 May 2012 12:32:50 -0500
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Philip Jägenstedt <philipj@opera.com>, public-texttracks@w3.org
Message-ID: <CABirCh82MheNW5daYwGih8UYNme8HNBZyJf6LtR58RfOD7KQbQ@mail.gmail.com>
On Sun, May 13, 2012 at 5:36 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> There is a fixed location in a WebVTT file where the data is and there
> is a fixed location in WebM files where the data is. I don't follow
> your argument. For 10 tracks, of course you have to seek to 10
> different locations - each one has different metadata. They can't be
> mingled together.
>

Sure they can.  Extract the headers and group them in the same region at
the beginning of the WebM file, so you can read all of them in a single
burst.  This is standard practice for any file meant to be read from media
with costly seeks (e.g. optical media and networks).  I'm pretty sure WebM
already does this, at least for certain bits of WebVTT metadata.

This is exactly the same as putting VTT metadata in <video>.  They're all
"mingled together", grouped together into a single DOM element, so you
don't have to load every VTT file to find out what each one's for.
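
As a rough illustration of the grouping idea above, here is a minimal sketch of a container that stores every track's header in one region at the front, so a reader can collect all of them in one sequential read. The layout (a little-endian count followed by length-prefixed headers) and the names `pack` and `read_headers` are made up for this example; this is not how WebM or MKV actually lay out their data.

```python
import io
import struct

def pack(tracks):
    """Build one blob from a list of (header_bytes, body_bytes) pairs.

    All headers are written first, grouped together, and the bodies follow.
    The layout is hypothetical: a 4-byte count, then length-prefixed headers.
    """
    out = io.BytesIO()
    out.write(struct.pack("<I", len(tracks)))
    for header, _ in tracks:
        out.write(struct.pack("<I", len(header)))
        out.write(header)
    for _, body in tracks:
        out.write(body)
    return out.getvalue()

def read_headers(stream):
    """Read every track header front to back, never seeking into the bodies."""
    (count,) = struct.unpack("<I", stream.read(4))
    headers = []
    for _ in range(count):
        (size,) = struct.unpack("<I", stream.read(4))
        headers.append(stream.read(size))
    return headers
```

With ten tracks the reader still touches only the front of the file; the cue data for each track is left alone until that track is actually selected.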

> >> I don't know where you get your statistics, but almost all usage of
> >> SRT files on a desktop work the #3 way and they all fall short of the
> >> metadata problem, which is something we don't want to repeat with
> >> WebVTT. The rest of the desktop use cases (in particular MPEG-4 and
> >> QuickTime files) have it muxed in-band, i.e. the #2 case. We're
> >> introducing #1 because it's the Web way, but it's a new way and by far
> >> not the most common way yet.
> >
> > Almost all uses of SRT and SSA I've seen in many years are embedded in
> > MKV files (#2); media players get subtitle metadata from the MKV structure.
> > Loose SRT files are rare these days.
>
> In the world of MKV users you might be right. I doubt that's the
> majority use case for captions though.
>

I'm not talking about *all* uses of captions, just delivering savable
static video files with subtitles (that's what "usage of SRT files on a
desktop" sounded like--not sure what you meant if not that).

> No it won't. The VTT file is being used in the way it is supposed to
> be used as prepared by the VTT author. The Web publisher has overruled
> those hints for *their* Website. That doesn't mean that now the hints
> in the VTT file are incorrect.
>

If the VTT says it's in French but the captions are actually in German, and
it's been overridden by HTML, then of course the VTT file is incorrect.

> I really don't see why a VTT author would get the metadata value of
> @kind wrong - they are the ones who create the files and know exactly
> what they create them for: captions, subtitles, descriptions, chapters
> or even metadata.
>

All most people care about is whether the file does what they expect it to do.
If a captions VTT file is being treated as captions, they won't notice (or
care) if there's a "Kind: subtitles" line in the VTT file (being overridden
by the HTML).

Anyway, we're getting a bit far afield.  Let's back up a bit and reexamine the
use cases.

1: Allowing WebM muxing software to automatically detect the metadata, so
users don't have to do it manually (like they have to today for MKV/SRT
muxing).
2: Allowing software to automatically generate complete HTML <video>
snippets.  (This is basically the same as #1.)
3: Allowing video players to display captions without needing to know how
to parse HTML.

Putting metadata in the VTT header works fine for #1 and #2.  It really
doesn't help #3 at all--even if
you assume VTT metadata will never be out of date (a tough assumption), a
huge number of VTT files will simply not have it.  That means no player can
ever really depend on the metadata being in each VTT file.
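
To make #2 concrete, here is a minimal sketch of a tool that reads header metadata and emits a <track> element. It assumes a hypothetical "Name: value" header syntax (the "Kind:" line mentioned earlier, plus made-up "Language:" and "Label:" lines); none of this is settled WebVTT syntax, and the function names are invented for the example. Fields missing from the file are simply left out, and values supplied by the publisher override whatever the file says.

```python
from html import escape

def parse_header_metadata(vtt_text):
    """Collect 'Name: value' lines between the WEBVTT line and the first blank line."""
    lines = vtt_text.splitlines()
    if not lines or not lines[0].lstrip("\ufeff").startswith("WEBVTT"):
        raise ValueError("not a WebVTT file")
    metadata = {}
    for line in lines[1:]:
        # A blank line or a cue timing line ends the header.
        if not line.strip() or "-->" in line:
            break
        name, sep, value = line.partition(":")
        if sep:
            metadata[name.strip().lower()] = value.strip()
    return metadata

def track_element(src, vtt_text, overrides=None):
    """Build a <track> tag; publisher overrides win over in-file metadata."""
    meta = parse_header_metadata(vtt_text)
    meta.update(overrides or {})
    attrs = [("src", src)]
    # Map the hypothetical header names onto real <track> attributes.
    for html_name, meta_name in (("kind", "kind"), ("srclang", "language"), ("label", "label")):
        if meta_name in meta:
            attrs.append((html_name, meta[meta_name]))
    return "<track " + " ".join(f'{k}="{escape(v, quote=True)}"' for k, v in attrs) + ">"
```

A file with no metadata at all produces a bare `<track src="...">`, which is the situation a standalone player in #3 would have to cope with.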

But I don't think #3 is a real use case anyway.  If a site like YouTube (a
very hypothetical example :) wants to allow saving videos to disk with
captions, it should mux them into a WebM or MKV file and present *that* for
download, not dump a dozen separate VTT files on the user and expect him to
keep them together.  That is, standalone video players probably don't need
to support reading loose VTT files *or* HTML parsing--their MKV/WebM
support is enough.

> > WebM already lets you store files with associated filenames (used for
> > fonts), so it seems natural to just say eg:
> >
> > Stylesheet: file.css
> >
> > and depending on how the VTT file was loaded, it'll either treat it as a
> > relative URL and fetch the file (if it was loaded via HTTP), look in the
> > same directory (if it was loaded as a loose file), or look for a WebM
> > attachment with that name (the common case for standalone files).
>
> Ah nice. I wasn't aware of that. Is that a WebM feature or a MKV feature?
>

I don't know--if it's in MKV but not WebM, I'd argue that it should be
added to WebM.  Putting aside CSS, this is how embedded fonts are handled
in MKV, and WebM should be consistent with that.
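
For what it's worth, the three-way lookup for a "Stylesheet: file.css" line described in the quoted text could be sketched roughly like this. The function name, the `origin` labels and the `attachments` mapping are all made up for illustration; a real player would get the attachment list from its MKV/WebM demuxer, which is not modelled here.

```python
import posixpath
from urllib.parse import urljoin

def resolve_stylesheet(name, origin, location=None, attachments=None):
    """Decide where a stylesheet named in a VTT header should be loaded from.

    origin is one of 'http', 'file' or 'webm' (hypothetical labels):
      http -> treat name as a URL relative to the VTT file's URL
      file -> look in the same directory as the loose VTT file
      webm -> look for a container attachment with that filename
    """
    if origin == "http":
        return ("url", urljoin(location, name))
    if origin == "file":
        return ("path", posixpath.join(posixpath.dirname(location), name))
    if origin == "webm":
        if attachments and name in attachments:
            return ("attachment", name)
        return None  # no attachment with that name
    raise ValueError(f"unknown origin: {origin!r}")
```

The point of the sketch is only that the same header line works unchanged in all three settings; just the lookup differs.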

-- 
Glenn Maynard
Received on Monday, 28 May 2012 17:33:39 UTC