
Fwd: Response to call for public review of WebVTT FPWD

From: Andreas Tai <tai@irt.de>
Date: Tue, 17 Feb 2015 17:42:11 +0100
Message-ID: <54E36F63.6040704@irt.de>
To: public-texttracks@w3.org
CC: David Singer <singer@apple.com>
Sorry for cross-posting, but I think this is relevant for both groups.

Best regards,


-------- Forwarded Message --------
Subject:      Response to call for public review of WebVTT FPWD
Resent-Date:  Tue, 17 Feb 2015 16:40:38 +0000
Resent-From:  public-timed-text@w3.org
Date:         Tue, 17 Feb 2015 17:39:07 +0100
From:         Andreas Tai <tai@irt.de>
To:           public-timed-text@w3.org
CC:           David Singer <singer@apple.com>

Dear all,

I have been following the WebVTT spec for quite some time and want to
respond to the general call for public review. My comments are
observations, and I hope they can be helpful for the WebVTT editor and
the specification group. They are not intended as change requests. It is
up to the editor and/or the specification group to decide whether any
changes are needed or possible.

I am posting this to both the TTML mailing list and the Text Tracks
Community Group list, as the subscriber lists may not have been merged yet.


One concept of the WebVTT spec is to cleanly separate the following areas:

- data model
- syntax
- parsing
- rendering


It is an interesting approach to provide different sections for different
target groups (e.g. WebVTT authors and WebVTT parser implementers) so
that they do not have to read the complete spec. My experience (after
reading several versions of WebVTT) is that even for a specific task it
is difficult to find the necessary information without reading through
the complete spec.

Suppose you are a WebVTT author looking for the normative (!) text on how
to write a timed subtitle that appears at a specific position at the
bottom of the screen, with a font size defined relative to the video
height, white text on the first line and yellow text on the second line.
It is then not sufficient to read only the data model and syntax
sections. You also have to read the rendering section, which in turn
refers back to concepts from the parsing section.

If, for the above task, you want information about a specific
presentation feature such as positioning or writing direction, you have
to extract the relevant pieces from every section. Often one part of a
section depends on other parts, and you need to know the general
concepts outlined in a section (e.g. the concept of WebVTT nodes in the
parsing section). Presentation features also depend on each other
(e.g. writing direction and positioning).

To be sure that you have authored a WebVTT file that will be processed
exactly as you intend (based on normative text), and likewise to write a
WebVTT-compliant parser, you most probably have to read the complete
spec.

In that case it would be a big help if all the information about a
"feature" such as positioning were in one place (how it is represented in
the data model, what the syntax looks like, how it is parsed and how it
is rendered).

It is clear that, given how the spec is already written, reorganising the
spec text is difficult (e.g. the parsing section is one continuous
algorithm). But an additional normative section that brings everything
together could provide useful guidance.

In general, the informative text that was added at a later stage of the
editing process helps a lot. Sometimes, I think, this text actually needs
to be normative (e.g. the notes on positioning in section 4.5).

Graphical representations would help a lot in understanding the abstract
concepts. One example is positioning: the terms line position and text
position have been difficult for me to relate to the concepts they
represent. Pictures that visualise cue boxes, the writing directions and
the positioning concepts would be great.

Although the syntax seems consistent to me, it would be a great help if,
in addition to the prose, there were a formal representation, e.g. in
Extended Backus–Naur Form (EBNF). I remember that this was already
proposed in the community group.
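To illustrate what such a formal representation could look like, here is
a minimal sketch of one small fragment, the cue timings line, written as
EBNF-style comments next to an equivalent Python regular expression. The
rule names and the function name are hypothetical, and the rules are my
own reading of the prose (optional hours of two or more digits, minutes
and seconds 00 to 59, three-digit milliseconds, spaces or tabs around
"-->"); they are not taken from the spec text.

```python
import re
from typing import Optional, Tuple

# EBNF-style sketch (hypothetical rule names, my reading of the prose):
#   cue timings = timestamp , ws , "-->" , ws , timestamp ;
#   timestamp   = [ hours , ":" ] , minutes , ":" , seconds , "." , millis ;
#   hours       = digit , digit , { digit } ;
#   minutes     = digit0to5 , digit ;          (* 00 to 59 *)
#   seconds     = digit0to5 , digit ;          (* 00 to 59 *)
#   millis      = digit , digit , digit ;
#   digit0to5   = "0" | "1" | "2" | "3" | "4" | "5" ;
#   ws          = ( " " | tab ) , { " " | tab } ;

# The same rules as a Python regular expression (ASCII digits only).
TIMESTAMP = r"(?:([0-9]{2,}):)?([0-5][0-9]):([0-5][0-9])\.([0-9]{3})"
# The end timestamp must be followed by whitespace (e.g. cue settings) or end of line.
CUE_TIMINGS = re.compile(TIMESTAMP + r"[ \t]+-->[ \t]+" + TIMESTAMP + r"(?=[ \t]|$)")


def _to_seconds(hours: Optional[str], minutes: str, seconds: str, millis: str) -> float:
    """Convert the captured timestamp parts into seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cue_timings(line: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) in seconds, or None if the line does not match."""
    m = CUE_TIMINGS.match(line)
    if m is None:
        return None
    g = m.groups()  # four groups per timestamp
    return _to_seconds(*g[0:4]), _to_seconds(*g[4:8])
```

A grammar like this makes edge cases explicit at a glance, for example
that "0:01.000" (single-digit hours) or "00:61.000" (seconds above 59)
are not valid timestamps.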

In the following, I list some observations on specific sections, which
sometimes also highlight more general issues.

Data Model
- Some concepts from the HTML spec are so vital to WebVTT that a short
summary would be of help (most importantly text track and text track cue)
- The title of 3.1 should possibly be "WebVTT cues" instead of "Text
track cues". The first sentence in 3.1 indicates that WebVTT cues are
instances of text track cues and that everything that follows applies to
WebVTT cues.
- Throughout the section the term "text track cue" is used where it
should perhaps be "WebVTT cue"?
- writing direction
     * Maybe it would help to make more concrete how "vertical" and
"horizontal" relate to the rendering area of the video?
     * Statements are made that apply to concepts that have not yet been
explained at this point in the spec. For example, it is explained how the
percentage of the line position depends on the writing direction, but the
concept of line position is only explained further down. This is also
true for other parts of the text.

     On the one hand, linear reading is necessary because all normative
statements that have been made apply to everything that follows. On the
other hand, you have to read non-linearly and jump between parts of the
spec. You may argue that this is the new way of reading, but the mixture
of concepts makes it difficult to get the complete picture. In this case
the statements on line position would fit better in the paragraph about
"line position".
- snap-to-lines flag
     * The snap-to-lines flag is a boolean that can be set to "true" or
"false". Instead of referring to these values, the text sometimes
describes the operation with the verbs "set" (for setting to "true") and
"unset" (for setting to "false"). It would be more consistent, and would
help the reader, if the operation were always described as "set to true"
and "set to false".
- line position
     * The line position actually refers more to the concept of a cue box
than to the concept of a line. The first sentence states "The line
position defines the position of the cue box.". It would be good to have
a term that describes this "feature" as a property of the cue box
instead of lines. As the syntax may be hard to change, it could help to
find a synonymous term. Furthermore, the relationship to text position
could be stated as well: while text position "defines the positioning of
the cue box in the direction defined by the writing direction", the line
position defines the position of the cue box orthogonal to the direction
defined by the writing direction (?).

- text position
   * There should be a link to the section on regions where the region is
mentioned.
   * The text position depends on the text alignment (which is explained
further down). The next model element after text position is text
position alignment. If you read linearly through the document, you can
easily confuse the two and attribute the dependency to text position
alignment instead of to text alignment.
   * It would be clearer if it were stated explicitly that steps 2 to 4
apply when the text position is not set explicitly.
- text alignment
     * Paragraph direction is used as a term but the concept is not
explained. start side of the cue box

   * A visual representation would help to clarify the difference between
the region anchor point and the region viewport anchor point.

   * In 4.1, the cue setting list, cue setting name and cue setting value
do not seem to be constrained, but they are constrained further by
"4.5 WebVTT cue setting". A cross-reference may be helpful.

Best regards,


Andreas Tai
Production Systems Television IRT - Institut fuer Rundfunktechnik GmbH
R&D Institute of ARD, ZDF, DRadio, ORF and SRG/SSR
Floriansmuehlstrasse 60, D-80939 Munich, Germany

Phone: +49 89 32399-389 | Fax: +49 89 32399-200
http: www.irt.de | Email: tai@irt.de

registration court & managing director:
Munich Commercial, RegNo. B 5191
Dr. Klaus Illgner-Fehns

Received on Tuesday, 17 February 2015 17:12:52 UTC
