Re: A new proposal for how to deal with text track cues

On Wed, Jun 12, 2013 at 8:41 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> Might I ask you to take your time to read the proposal properly and
> provide me with technical feedback. I would highly appreciate it.
>

I plan to do so, but first we need to recognize one thing that affects what
functionality should be exposed via a common cue interface: the two formats
we need to consider, TTML and VTT, have disparate goals, at least in
emphasis. They differ in two respects:

(1) separation of function between authoring system and presentation system;

TTML places the primary burden on the authoring system to make presentation
decisions, such as region placement, whether regions can overlap (or not),
whether regions can extend outside the related media region (or not),
content alignment, etc. In contrast, VTT appears to place this burden on
the presentation system (browser), and assumes the author does not or
should not become involved in these presentation choices.

(2) tool authored versus hand authored content;

TTML places an emphasis on the use of authoring systems to produce and
interchange valid content documents that will have a long shelf life and
potentially permanent (embedded) binding with related media. In contrast,
VTT places an emphasis on simple plain-text editors (vi, emacs) that
arbitrary end users can employ to create hand-authored caption content,
often without any validation tools, relying instead on parsing and
presentation semantics that tolerate content that is not well formed or is
invalid.

Both of these differences in goals and emphasis lead to differences between
the two formats: in syntactic formalisms, in semantic processing
requirements on the client device, and so on.

As we move forward to find common ground for improved interoperability, we
need to recognize that both sets of goals and their emphases are legitimate
design and deployment choices for the different stakeholders: authoring
system vendors, timed text authors, browser implementers, and end users,
all of whom need to benefit from these services.

It would be wrong to insist that only one choice is or should be made among
the spectrum of possibilities in this area, just as it would be wrong to
argue that only one video or audio format is or should be chosen.

So, with regard to the TT cue interface(s), we need to decide whether:

   - it is desirable to define a single, common cue interface that can
   serve the semantic needs of both TTML and VTT;
   - or two distinct interfaces should be created;
   - or some combination of the above should be used (i.e., a common base
   and separate sub-interfaces).

From what I can tell, most folks seem to have the third option in mind:
create a common base to the extent that is feasible, then create
format-specific sub-interfaces that address non-common features.
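
To make the third option concrete, here is a rough sketch in TypeScript of
what a common base plus format-specific sub-interfaces might look like. Every
name below (BaseCue, VttCueSketch, TtmlCueSketch, and their fields) is
hypothetical and chosen only for illustration; none of it is taken from the
proposal or from either specification.

```typescript
// Hypothetical sketch only: these names are assumptions, not a spec.

// Common base: the subset both formats could plausibly share.
interface BaseCue {
  id: string;
  startTime: number; // seconds
  endTime: number;   // seconds
  text: string;
}

// VTT-flavoured cue: placement is largely left to the presentation system.
interface VttCueSketch extends BaseCue {
  format: "vtt";
  line: number | "auto";
}

// TTML-flavoured cue: the author has already made the presentation decisions.
interface TtmlCueSketch extends BaseCue {
  format: "ttml";
  regionId: string;
  overflow: "visible" | "hidden";
}

type AnyCue = VttCueSketch | TtmlCueSketch;

// Code written against the common base works for either format.
function isActive(cue: BaseCue, time: number): boolean {
  return cue.startTime <= time && time < cue.endTime;
}

// Format-specific behaviour is reached by narrowing on the discriminant.
function describePlacement(cue: AnyCue): string {
  switch (cue.format) {
    case "vtt":
      return cue.line === "auto"
        ? "placement left to the browser"
        : `line ${cue.line}`;
    case "ttml":
      return `author-defined region ${cue.regionId} (overflow ${cue.overflow})`;
  }
}
```

The point of the split is that timing logic such as isActive only needs the
base, while anything touching placement has to know which format it is
dealing with.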

At this point, I believe it would be wrong to characterize TTML and VTT as
simply two different serialization formats of the same feature set. That
just isn't true, and I don't know if it will ever be true given the
differences in goals and emphasis I cite above. That leaves us with
needing to identify a common semantic subset of features both formats
support and then design a common cue interface on that basis.

As we go through that process, we may find it convenient to make changes to
one or both formats over time in order to better support semantics that are
only supported in the other format.

I believe the TTWG has general consensus on undertaking such a process. And
that brings us back to the process question of where this work will take
place. It is fine to discuss strawmen on this ML and the TTCG or TTWG ML,
but we need a definite home for this work where consensus can be
established and decisions taken.

G.

Received on Wednesday, 12 June 2013 15:02:23 UTC