Re: metadata in the VTT file header, re-starting the conversation from Silvia Pfeiffer on 2012-05-10 (public-texttracks@w3.org from May 2012)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Thu, 10 May 2012 10:27:49 +1000
To: Glenn Maynard <glenn@zewt.org>
Cc: Philip Jägenstedt <philipj@opera.com>, public-texttracks@w3.org
Message-ID: <CAHp8n2=PJLMV17_xkCn0MZMu9AGSxOiVm7VTt0hrkJV2eG9+XQ@mail.gmail.com>
(Getting back to this discussion that I had lost track of ..)

On Fri, Feb 24, 2012 at 11:23 AM, Glenn Maynard <glenn@zewt.org> wrote:
> Three different usage scenarios are:
>
> 1: .VTT tracks defined in HTML.
> 2: .VTT tracks embedded in a container like WebM.
> 3: Loose .VTT tracks, in a directory alongside a video.

Agreed.


> I don't think the types of metadata you're describing (mirroring <track>)
> are necessarily important for #2, since WebM, etc. should define a way to
> embed that on its own (as you mention they're working on).

They are, but they are looking for use case #3 to provide what needs
to be embedded into the video. That is indeed the most common way that
data is presented to a muxing program: the video file plus the
individual tracks with all the information that is required for the
encapsulation. Sometimes you can overwrite the information given in
the track with command-line parameters. But never have I heard of a
muxing program that reads an HTML file to get its metadata.


>  If the
> information has to be loaded out of each .VTT file, it could require a lot
> of seeking around the file to load it; slow on optical media, even if it
> happens to be stored in the same file.

We're only talking about header-style metadata. There is no seeking
around required: it comes straight after the WEBVTT magic string.
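
To make that concrete, here is a minimal sketch of reading such a header in
one forward pass. It assumes a hypothetical "Key: value" line syntax
(e.g. "Kind:" and "Language:") between the WEBVTT line and the first blank
line; the key names and the function name read_vtt_header are illustrative
only and not taken from any agreed spec.

```python
def read_vtt_header(text):
    """Collect "Key: value" lines that follow the WEBVTT line.

    The "Key: value" syntax is an assumption made for illustration.
    """
    lines = text.lstrip("\ufeff").splitlines()  # drop an optional BOM
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("not a WebVTT file")
    meta = {}
    for line in lines[1:]:
        if not line.strip():
            break  # first blank line ends the header; cues follow
        key, sep, value = line.partition(":")
        if sep:  # ignore header lines without a colon
            meta[key.strip().lower()] = value.strip()
    return meta


sample = (
    "WEBVTT\n"
    "Kind: captions\n"
    "Language: en\n"
    "\n"
    "00:00.000 --> 00:02.000\n"
    "Hello\n"
)
print(read_vtt_header(sample))
```

The loop stops at the first blank line, so nothing past the header is
examined, which is the point about not needing to seek.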


> Mirroring that information only seems important for #3.  That case is
> uncommon, but it does happen.  I can't decide if the problem I mention below
> is worth the relative infrequency of this use case...

I don't know where you get your statistics, but almost all usage of
SRT files on the desktop works the #3 way, and all of it suffers from
the missing-metadata problem, which is something we don't want to
repeat with WebVTT. The rest of the desktop use cases (in particular
MPEG-4 and QuickTime files) have it muxed in-band, i.e. the #2 case.
We're introducing #1 because it's the Web way, but it's a new way and
far from the most common one yet.


> I suppose it might also be convenient for authoring, eg. so extracting a
> .VTT from a WebM file can include the metadata inline instead of having to
> somehow output an HTML stub, and so WebM muxing tools don't have to be able
> to parse HTML to read the metadata to be stored in the output file.

Agreed.


> On Thu, Feb 23, 2012 at 5:07 AM, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
> wrote:
>>
>> They have to react differently to the data in the cues depending on
>> whether it is a caption/subtitle, a description, a chapter or a
>> metadata file. So this information is vital to have.
>>
>> Also, the information as to what language the track is in would be
>> very important to display in the list of available caption tracks. For
>> example, VLC currently loads all the SRT tracks for a video that are
>> in the same directory, but only displays them as "track1", "track2",
>> "track3", etc. which is pretty useless from a UI POV. Instead, if
>> there was a normative location to describe the language, VLC could
>> display that language.
>
>
> My biggest concern is that metadata in <video> is guaranteed to be out of
> sync with metadata in the .VTT header in many files, and many people won't
> set it at all.  They'll never notice a problem, since it'll work fine for
> them in browsers, which will use the <track> information.
>
> I'm nervous about introducing data redundancy that we know for sure will
> lead to inconsistencies...

I don't regard that as a problem, but as an opportunity. The file
itself has one set of metadata. That's data that the Web Dev can
decide to use. Or instead they can decide to overrule it with specific
directions in the <track> element.

For example, if I had a Web Server that is managing a collection of
WebVTT files, I would most likely have the WebVTT files managed and
created by somebody who has nothing to do with the Website. I'd either
make sure the files I am given have the right metadata inside them, or
if I don't trust the files I'd ignore the metadata.

I would most likely create the attributes of a <track> element by
analysing the content of the WebVTT files that I am serving and just
hand that data through. In this way the browser gets all the
information that it needs out of the WebVTT file without actually
having to download and parse anything from the WebVTT file. It's
proxied information, not redundant information.
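
As a rough sketch of that proxying step, the server could turn metadata it
has already read from the file into <track> attributes. The header key
names ("kind", "language", "label"), the mapping table, and the function
name track_element are assumptions for illustration; only the <track>
attribute names kind, srclang and label come from HTML.

```python
from html import escape

# Hypothetical mapping from header keys to <track> attribute names.
HEADER_TO_ATTR = {"kind": "kind", "language": "srclang", "label": "label"}


def track_element(src, meta):
    """Render a <track> tag from metadata already extracted from the file."""
    attrs = [("src", src)]
    for key, attr in HEADER_TO_ATTR.items():
        if key in meta:  # only emit attributes the file actually provides
            attrs.append((attr, meta[key]))
    rendered = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs
    )
    return f"<track {rendered}>"


print(track_element("subs/en.vtt", {"kind": "captions", "language": "en"}))
```

A Web dev who does not trust the file can simply pass an empty dict, or
their own values, instead of the extracted metadata.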


> Maybe if WebM muxers/demuxers and other tools depend on these headers
> (instead of reading HTML <video> snippets or something similarly annoying),
> it'll help encourage people to use it properly, but it still seems like a
> losing battle.

That's like saying you can't trust any information given to you in files.
In the end, you have to be able to rely on some data: either you rely
on the Web dev doing the correct thing or you rely on the WebVTT
author doing the right thing. Who can you rely on more? If done
properly, the Web dev will just use what's in the file, and the WebVTT
author will be the one making sure the file is correct.



>> > I'll have to read up on the WebM metadata thread soon, because I don't
>> > see
>> > why it would be dependent on the format WebVTT uses.
>>
>> It's here:
>> http://wiki.webmproject.org/webm-metadata/temporal-metadata/webvtt-in-webm
>
>
> This doesn't mention how to deal with external CSS files and fonts.  I don't
> know if that's implicitly defined by existing WebM mechanisms or just
> something they haven't figured out yet.

We haven't figured out how to deal with external CSS and WebVTT for
non-browser apps either. The WebM mechanism will simply rely on
whatever we come up with. If it's independent files that have to be
delivered with the media and the WebVTT file (maybe in a zip file),
then that works for WebM. I'm wary of putting a file name into WebVTT
- I'd much rather leave it informal, to be delivered in zip files
with the same names. In-line CSS in WebVTT headers would also work for
WebM.


> It also says: "This is how roll-up captions work: multiple cues are rendered
> simultaneously, and when the top cue expires, the other cues move up and a
> new cue appears at the bottom."  I don't know why it says that, since WebVTT
> doesn't do roll-up captions.

Such a shame, isn't it! Just look at:
http://www.youtube.com/watch?v=oxkZTF-7Lgw - how will we do that with
WebVTT?


> (I don't have the bandwidth to join WebM lists to ask about these things, so
> I'd just ask anyone involved in those discussions who thinks any of this is
> worth mentioning to do so.)

No worries. I can be the proxy.

>> metadata is stored in CodecPrivate etc.
>
> (It doesn't look like that's what it's currently suggesting, FYI: "no WebVTT
> data is stored in the CodecPrivate element of the WebM Track header".  It's
> a wiki, so maybe it changed since you read it last.)

You're mis-reading. This refers to storing no payload data (i.e. no
CUES) into the CodecPrivate header.

Further down it says:

"File-wide metadata does not have a timestamp, so all the text (up to
and excluding the linefeed separator that demarcates the file-wide
metadata and the first cue) could be stored in the CodecPrivate
sub-element of the Track element."
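
A minimal sketch of that split, assuming the separator is the first blank
line in the file. Whether the WEBVTT line itself belongs to the stored
text is not settled by the quote above; this sketch keeps it. The function
name is hypothetical and no actual WebM writing is shown.

```python
def split_file_wide_metadata(text):
    """Split a WebVTT file into (header_text, remainder).

    header_text is everything before the first blank line, excluding
    that separator; remainder is everything after it.
    """
    # Normalise line endings so a blank line is always "\n\n".
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    header, _sep, remainder = normalized.partition("\n\n")
    return header, remainder


vtt = "WEBVTT\nKind: captions\n\n00:00.000 --> 00:02.000\nHello\n"
header, cues = split_file_wide_metadata(vtt)
print(repr(header))
print(repr(cues))
```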


Regards,
Silvia.
Received on Thursday, 10 May 2012 00:28:40 UTC