Metadata in the VTT file header (bug 15851), use cases (and a need to close this) from David Singer on 2012-08-29 (public-texttracks@w3.org from August 2012)

From: David Singer <singer@apple.com>
Date: Wed, 29 Aug 2012 15:21:51 -0700
To: public-texttracks <public-texttracks@w3.org>
Message-id: <9AB32F18-EC49-4026-8FF9-CD7501B9C2BA@apple.com>
see <https://www.w3.org/Bugs/Public/show_bug.cgi?id=15851>


We agreed on a design (see below), but Ian is stalled waiting for use cases.  Here are the ones I have gleaned. Please, if others have use-cases or details to add, pitch in.

I am aware of implementations done or in progress, and I had previously (mistakenly) thought this was settled.  We really need to see it documented (yes, we need at least single-line values in v1).

Use-cases are touched on briefly in the bug, but here are more, in more (gory) detail.  

1) Authoring.  Quite often caption files are authored/written in a different workflow from the media, and must be re-united later. We'd like to keep track of attributes of the files in-band, so that they don't get lost (e.g. the language of the captions), and indeed, of the proposed values for the <track> element attributes when the file is referenced from HTML. It can also be useful to include a link-back to the content that was captioned, using an identifier (e.g. URL).

2) Use in other embeddings.  MPEG has started work on specifying MP4 carriage of WebVTT in a track of the MP4 file. In this context, we need some of the attributes that are carried in the HTML layer.  Some are already covered or partially covered (e.g. all tracks can carry a language in MP4) but not all.  WebM embedding is also under way.

3) Side-band use in other contexts. In some delivery scenarios, it makes sense for WebVTT caption files not to be embedded but to be carried in a 'side-band' (e.g. in HTTP streaming systems), that is, loaded as a side-file. In this case, we need the ability to carry attributes that the referencing file does not carry.

4) Style-sheets.  Maybe it's satisfactory to define that WebVTT inherits styling from its container (e.g. HTML5), but in the case where the container doesn't carry styling (e.g. HTTP streaming, MP4), or in the case where specific styling is needed for the WebVTT, we need to be able to reference or include style sheets in the WebVTT layer itself. As an example, a style-sheet giving 608/708 appearance is being worked on as part of the 608/708 conversion.

5) Time alignment. When WebVTT is used as the caption source for a system where timestamps are from an arbitrary origin (e.g. a continuous MPEG-2 Transport Stream), we need a way to say that 'timestamp X in this VTT file aligns with timestamp Y in the media stream' so as to get synchronization.  This is naturally put into the header.
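
To make the arithmetic in (5) concrete, here is a minimal sketch. It assumes the header somehow supplies one aligned pair (X, Y) and that both timelines are expressed in milliseconds; no key name for this has been proposed, and the function and parameter names are hypothetical.

```python
def vtt_to_media_ms(cue_ms, vtt_anchor_ms, media_anchor_ms):
    """Map a cue time from the VTT timeline onto the media timeline.

    vtt_anchor_ms is 'timestamp X' in the VTT file and media_anchor_ms is
    the 'timestamp Y' it aligns with in the media stream (hypothetical
    names; the header syntax for carrying them is not defined here).
    """
    return cue_ms - vtt_anchor_ms + media_anchor_ms
```

For example, if VTT time 0 aligns with media time 3600000 (one hour into the stream), a cue at 5000 in the file lands at 3605000 in the stream.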

* * * * *

The current state:

As Ian suggests in <http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-December/034026.html> (search for 'file-level would'), we settled in email on a syntax that allows for key-value pairs, colon separated, with an option for multi-line values that mimics SMTP.  Silvia summarizes it in <http://lists.w3.org/Archives/Public/public-texttracks/2012Jun/0011.html>.  Here it is, all pulled together.

Single-line value example:

key: value

(as Ian suggested above).

Multi-line values open with a single vertical bar and close with a line containing a single period. Value lines that (a) would otherwise be empty, (b) consist of a single period, or (c) already start with a backslash are escaped with a preceding backslash. The line-breaks (i) after the vertical bar and (ii) before the closing period are NOT part of the value.

Multi-line value example:

key: |
valueline
valueline
…
.
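
The escaping rules can be sketched from the writer's side as follows. This is an illustration under assumptions, not a reference implementation: the function name encode_pair is hypothetical, line breaks inside a value are taken to be "\n", and the single-line form is used whenever the value has no line break and is not itself a lone vertical bar.

```python
def encode_pair(key, value):
    """Return the header lines for one key-value pair (sketch only)."""
    # A lone "|" cannot be written single-line, because "key: |" opens
    # a multi-line value; everything else without a line break can.
    if "\n" not in value and value != "|":
        return [f"{key}: {value}"]

    lines = [f"{key}: |"]
    for vline in value.split("\n"):
        # Escape lines that are empty, a single period, or that already
        # start with a backslash, by prefixing one backslash.
        if vline == "" or vline == "." or vline.startswith("\\"):
            vline = "\\" + vline
        lines.append(vline)
    lines.append(".")  # closing line; not part of the value
    return lines
```

With this sketch, encode_pair("key", "value") yields the single line "key: value", while a value containing an empty line gets that line written as a lone backslash.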

Note that therefore the following two express exactly the same key-value relationship:

key: value
key: |
value
.

and that if someone wants a value that is a single vertical bar (why? but it is good to make sure we have no pathological cases), they can have it:

key: |
|
.
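
For anyone wanting to check their reading of the syntax, here is a small reader-side sketch. It assumes the caller has already isolated the metadata lines of the header (no line terminators, no blank lines), that whitespace around the key and around a single-line value is insignificant, and that a later duplicate key simply overwrites an earlier one; none of those points is settled above, and the name parse_header_metadata is hypothetical.

```python
def parse_header_metadata(lines):
    """Parse header metadata lines into a dict (sketch, see assumptions)."""
    result = {}
    it = iter(lines)
    for line in it:
        key, sep, value = line.partition(":")  # split at the first colon
        if not sep:
            raise ValueError(f"not a key-value line: {line!r}")
        key, value = key.strip(), value.strip()

        if value == "|":
            # Multi-line value: collect lines up to the closing period.
            parts = []
            for vline in it:
                if vline == ".":
                    break
                if vline.startswith("\\"):
                    vline = vline[1:]  # drop the escaping backslash
                parts.append(vline)
            else:
                raise ValueError(f"unterminated value for {key!r}")
            # Joining with "\n" leaves out the breaks after the bar
            # and before the closing period.
            value = "\n".join(parts)

        result[key] = value
    return result
```

Under these assumptions, ["key: value"] and ["key: |", "value", "."] both parse to the same mapping, and ["key: |", "|", "."] gives a value that is a single vertical bar.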


All key-names NOT starting with X- are reserved to the specification.  The key names that are the same as the <track> element attributes are defined, with the same semantics.  For use-case (4) we probably need key names that identify an out-of-band or in-band stylesheet.  I don't think we have current proposals, but it's probably time to have them.  (Silvia has more example names in <http://dvcs.w3.org/hg/text-tracks/raw-file/fe5cd9afb9c7/608toVTT/608toVTT.html#metadata-xds> but she forgot we agreed to ":" not "=".)


If a value conflicts with the value specified by a container (e.g. <track> or WebM track data), the container's value takes precedence (but please don't do this).
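
As a sketch of that precedence rule, assuming both sources have already been reduced to plain key-to-value mappings and that the container mapping holds only the attributes the container actually specifies (the function name is hypothetical):

```python
def effective_attributes(in_band, container):
    """Merge in-band header values with container values (sketch only).

    On a conflict the container's value wins; keys found only in the
    file header are kept as they are.
    """
    merged = dict(in_band)    # start from the values in the VTT header
    merged.update(container)  # container values override on conflict
    return merged
```

So a header saying the language is "en" combined with a container saying "fr" resolves to "fr", while any header key the container is silent on is preserved.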


We have an implementation of the single-line case in hand and would prefer it not change; we can probably be more flexible on the multi-line case, but the above seems fine to us and covers all the cases while retaining simplicity.  (Previous proposals used [[ and ]] instead of | and ., but, whatever.)

David Singer
Multimedia and Software Standards, Apple Inc.
Received on Wednesday, 29 August 2012 22:22:20 UTC