Re: metadata in the VTT file header, re-starting the conversation from Glenn Maynard on 2012-05-13 (public-texttracks@w3.org from May 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Sun, 13 May 2012 11:44:14 -0500
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Philip Jägenstedt <philipj@opera.com>, public-texttracks@w3.org
Message-ID: <CABirCh-6o=WhMxffM6jPi=MwrehAhmCJdkzo3jqv8jxk6roXKw@mail.gmail.com>
On Wed, May 9, 2012 at 7:27 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> > If the information has to be loaded out of each .VTT file, it could
> > require a lot of seeking around the file to load it; slow on optical
> > media, even if it happens to be stored in the same file.
>
> We're only talking about header-style metadata. There is no seeking
> around required: it comes straight after the WEBVTT magic string.
>

Reading data at the beginning of the file is exactly what requires seeking
around.  If you're loading metadata for ten VTT files embedded in a WebM
file, you have to seek to the location of each embedded file to read it.
That's why formats like WebM store metadata like that in a single index
that can be loaded all at once.

> > Mirroring that information only seems important for #3.  That case is
> > uncommon, but it does happen.  I can't decide if the problem I mention
> > below is worth the relative infrequency of this use case...
>
> I don't know where you get your statistics, but almost all usage of
> SRT files on a desktop work the #3 way and they all fall short of the
> metadata problem, which is something we don't want to repeat with
> WebVTT. The rest of the desktop use cases (in particular MPEG-4 and
> QuickTime files) have it muxed in-band, i.e. the #2 case. We're
> introducing #1 because it's the Web way, but it's a new way and by far
> not the most common way yet.
>

Almost all uses of SRT and SSA I've seen in many years are embedded in MKV
files (#2); media players get subtitle metadata from the MKV structure.
Loose SRT files are rare these days.

> I don't regard that as a problem, but as an opportunity. The file
> itself has one set of metadata. That's data that the Web Dev can
> decide to use. Or instead they can decide to overrule it with specific
> directions in the <track> element.
>

That'll break as soon as anyone uses the file in something that doesn't
parse HTML and instead gets its metadata from the VTT file.

My prediction is that putting metadata like "kind" and "language" in the
VTT file will never be done consistently anyway; most people will put it in
the HTML, see that it works in browsers, and not bother to put the data in
the VTT file too.  That's fine for the Web, but it means standalone players
won't be able to rely on it.  Handing the information to muxers seems like
the main use case this would actually work reliably for.

> I would most likely create the attributes of a <track> element by
> analysing the content of the WebVTT files that I am serving and just
> hand that data through. In this way the browser gets all the
> information that it needs out of the WebVTT file without actually
> having to download and parse anything from the WebVTT file. It's
> proxied information, not redundant information.
>

It's easiest to think of it as a caching mechanism.  The only reason to put
the data in both the HTML file *and* the VTT file is because it's faster to
read it all at once out of the HTML file; the HTML data effectively becomes
a cache of the metadata stored in the VTT files.

So long as the cache is consistent, everything's fine.  It's just
unfortunate when the cache gets out of sync (eg. people update one and not
the other).
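
As a rough sketch of what checking that cache could look like: the
"Key: value" header syntax, the key names and both function names below are
assumptions made up for illustration, not anything the WebVTT draft defines.

```python
def read_header_metadata(vtt_text):
    """Collect hypothetical 'Key: value' lines that follow the WEBVTT line.

    Assumes header metadata ends at the first blank line; that layout is an
    assumption for this sketch, not part of any spec.
    """
    lines = vtt_text.splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("not a WebVTT file")
    meta = {}
    for line in lines[1:]:
        if not line.strip():
            break  # blank line: header is over, cues follow
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip().lower()] = value.strip()
    return meta


def stale_attributes(track_attrs, vtt_text):
    """Return {key: (html_value, vtt_value)} wherever the two copies disagree."""
    header = read_header_metadata(vtt_text)
    return {
        key: (value, header[key])
        for key, value in track_attrs.items()
        if key in header and header[key] != value
    }
```

An empty result means the HTML copy still matches the file; anything else is
a cache that has gone out of sync.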


>  That's like saying you can't trust any information given to you in files.
>
> In the end, you have to be able to rely on some data: either you rely
> on the Web dev doing the correct thing or you rely on the WebVTT
> author doing the right thing. Who can you rely on more? If done
> properly, the Web dev will just use what's in the file, and the WebVTT
> author will be the one making sure the file is correct.
>

When the duplication doesn't exist to begin with, you don't have to worry
about either author doing the right thing; there's only one thing they
*can* do, since the data is in only one place.

I'm not calling this a fatal problem, but if we're consciously introducing
a new category of problem to solve other problems, we should be aware of
the tradeoff.

> We haven't figured out how to deal with external CSS and WebVTT for
> non-browser apps either. The WebM mechanism will simply rely on
> whatever we come up with. If it's independent files that have to be
> delivered with the media and the WebVTT file (maybe in a zip file),
> then that works for WebM. I'm wary of putting a file name into WebVTT
> - I'd much rather leave it informally to be delivered in zip files
> with same names. In-line css in WebVTT headers would also work for
> WebM.
>

WebM already lets you store files with associated filenames (used for
fonts), so it seems natural to just say eg:

Stylesheet: file.css

and depending on how the VTT file was loaded, it'll either treat it as a
relative URL and fetch the file (if it was loaded via HTTP), look in the
same directory (if it was loaded as a loose file), or look for a WebM
attachment with that name (the common case for standalone files).
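
A minimal sketch of that lookup, to make the three cases concrete. The
function name, the "http"/"file"/"webm" labels and the attachment set are all
hypothetical; nothing here comes from the WebVTT or WebM specs.

```python
import posixpath
from urllib.parse import urljoin


def resolve_stylesheet(name, loaded_via, base=None, attachments=None):
    """Return a (kind, location) pair saying where to look for `name`.

    `loaded_via` is a made-up label for how the VTT file itself was obtained;
    `base` is the VTT file's URL or path; `attachments` is the set of
    filenames attached to the containing WebM file.
    """
    if loaded_via == "http":
        # Loaded over HTTP: treat the name as a URL relative to the VTT URL.
        return ("url", urljoin(base, name))
    if loaded_via == "file":
        # Loose file on disk: look in the same directory as the VTT file.
        return ("path", posixpath.join(posixpath.dirname(base), name))
    if loaded_via == "webm":
        # Embedded in WebM: look for an attachment with that filename.
        if attachments and name in attachments:
            return ("attachment", name)
        return ("missing", name)
    raise ValueError("unknown load mode: %r" % (loaded_via,))
```

The same "Stylesheet: file.css" line then works unchanged in all three
settings; only the caller's knowledge of how the file was loaded differs.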

> > It also says: "This is how roll-up captions work: multiple cues are
> > rendered simultaneously, and when the top cue expires, the other cues
> > move up and a new cue appears at the bottom."  I don't know why it says
> > that, since WebVTT doesn't do roll-up captions.
>
> Such a shame, isn't it! Just look at:
> http://www.youtube.com/watch?v=oxkZTF-7Lgw - how will we do that with
> WebVTT?
>

I hope we won't.  Word-at-a-time is the worst possible presentation mode
for captions.  Watching that, I never get to look at the video; I have to
stare at the captions the whole time--I may as well be reading a
transcript.  After the video was no longer realtime, the incremental
captions should have been flattened into individual captions that can be
shown all at once.  Ian suggested the same thing:
http://lists.w3.org/Archives/Public/public-texttracks/2011Dec/0033.html
("for rebroadcast...").

> >> metadata is stored in CodecPrivate etc.
> >
> > (It doesn't look like that's what it's currently suggesting, FYI: "no
> > WebVTT data is stored in the CodecPrivate element of the WebM Track
> > header".  It's a wiki, so maybe it changed since you read it last.)
>
> You're mis-reading. This refers to storing no payload data (i.e. no
> CUES) into the CodecPrivate header.
>

WebVTT headers will also be WebVTT data.  The above text should say "WebVTT
cues".

-- 
Glenn Maynard
Received on Sunday, 13 May 2012 16:44:44 UTC