Response to call for public review of WebVTT FPWD

Dear all,

I have been following the WebVTT spec for quite some time and wanted to 
respond to the general call for public review. My comments are 
observations and I hope they are helpful for the WebVTT editor and the 
specification group. They are not intended as change requests. It is up 
to the editor and/or the specification group to decide whether any 
changes are needed or possible.

I am posting this to the TTML mailing list and to the text track 
community group list, as the subscribers may not have been merged yet.

----------------------------------------------------------------------------

One concept of the WebVTT spec is to cleanly separate the following areas:

- data model
- syntax
- parsing
- rendering
- API

GENERAL OBSERVATIONS

It is an interesting approach to provide different sections to different 
target groups (e.g. WebVTT authors and WebVTT parser implementers) so 
they do not have to read the complete spec. My experience is (after 
reading different versions of WebVTT) that even for a specific task it 
is difficult to get the necessary information without reading through 
the complete spec.

Suppose you are a WebVTT author who wants the normative (!) text on how 
to write a timed subtitle that appears at a specific position at the 
bottom of the screen, with a specific font size in relation to the video 
height, with the first line in white and the second line in yellow. In 
that case it is not sufficient to just read through the data model and 
syntax sections. You also have to read the rendering section, which in 
turn refers back to concepts of the parsing section.

If for the above task you want information about a specific 
presentation feature like positioning or writing direction, you have to 
extract the relevant pieces from every section. Often one part of a 
section depends on other parts. You have to know the general concepts 
that are outlined in a section (e.g. the concept of WebVTT nodes in the 
parsing section). The presentation features also depend on each other 
(e.g. writing direction and positioning).

To be sure that a WebVTT file you have authored will be processed 
exactly as you want (based on normative text), and also if you want to 
write a WebVTT-compliant parser, you most probably have to read the 
complete spec.

In that case it would be a big help if the information about a "feature" 
like positioning were all in one place (how it is represented in the 
data model, what the syntax looks like, how it is parsed and how it is 
rendered).

It is clear that, given how the spec is already written, a 
re-organisation of the spec text is difficult (e.g. the parsing section 
is one continuous algorithm). But an additional normative section that 
brings everything together may be useful guidance.

In general the informative text that was added in the later stages of 
editing helps a lot. Sometimes I think it is actually necessary 
normative text (e.g. the notes on positioning in section 4.5).

Graphical representations would help a lot to understand the abstract 
concepts. One example is positioning. The terms line position and text 
position have been difficult for me to relate to the concepts they 
represent. Pictures that visualize cue boxes, the writing directions and 
the positioning concepts would be great.

Although the syntax seems consistent to me, it would be a great help if, 
in addition to the prose, there were a formal representation, e.g. in 
Extended Backus–Naur Form (EBNF). I remember that this was already 
proposed in the community group.
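
As a rough illustration of what a compact formal representation could 
offer, here is a minimal Python sketch that expresses the cue timing 
line as a regular expression. It reflects my simplified reading of the 
prose syntax and is not taken from the spec; the names TIMESTAMP, 
SETTING, TIMING_LINE and parse_timing_line are my own, and the setting 
values are deliberately left unconstrained.

```python
import re

# Simplified reading of the prose syntax (assumption, not spec text):
# optional hours, then mm:ss.ttt
TIMESTAMP = r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}"
# A cue setting as name:value without whitespace; values are not validated here.
SETTING = r"[^\s:]+:[^\s]+"
TIMING_LINE = re.compile(
    rf"^(?P<start>{TIMESTAMP})[ \t]+-->[ \t]+(?P<end>{TIMESTAMP})"
    rf"(?:[ \t]+(?P<settings>{SETTING}(?:[ \t]+{SETTING})*))?[ \t]*$"
)


def parse_timing_line(line):
    """Return (start, end, settings dict), or None if the line does not match."""
    match = TIMING_LINE.match(line)
    if match is None:
        return None
    settings = {}
    for item in (match.group("settings") or "").split():
        # Split at the first colon only, so a value may itself contain colons.
        name, _, value = item.partition(":")
        settings[name] = value
    return match.group("start"), match.group("end"), settings
```

A grammar in EBNF would serve the same purpose for readers: the whole 
shape of a line is visible at a glance, and the per-setting constraints 
could then be attached to each setting name.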

In the following I list some observations on specific sections, which 
sometimes also highlight more general issues.

-----------------------
Data Model
-----------------------
- Some concepts from the HTML spec are so vital to WebVTT that a short 
summary would be of help (most importantly text track and text track cue)
- The title of 3.1 should possibly be "WebVTT cues" instead of "Text 
track cues". The first sentence in 3.1 indicates that WebVTT cues are 
instances of text track cues and that everything that follows applies to 
WebVTT cues.
- In the section the term "text track cue" is used; should this actually 
be "WebVTT cue"?
- writing direction
     * Maybe it would help if it is made more concrete how "vertical" 
and "horizontal" relate to the rendering pane of the video?
     * Statements are made that apply to concepts that have not yet been 
explained at that point of the spec. For example, it is explained how 
the percentage of line position depends on the writing direction, but 
the concept of line position is only explained further down. This is 
also true for other parts of the text.

     On the one hand linear reading is necessary because all normative 
statements that have been made apply to everything that follows. On the 
other hand you have to read non-linearly and jump between parts of the 
spec. You may argue that this is the new form of reading, but the 
mixture of the concepts makes it difficult to get the complete picture. 
In this case the statements on line position would fit better in the 
paragraph about "line position".
- snap-to-lines flag
     * The snap-to-lines flag is of type Boolean and can be set to 
"true" or "false". Instead of referring to these values, the spec 
sometimes describes the operation with the verbs "set" (for setting to 
"true") and "unset" (for setting to "false"). It would be more 
consistent and would help the reader if the operation were always 
described as "set to true" and "set to false".
- line position
     * The line position actually refers more to the concept of a cue 
box than to the concept of a line. The first sentence states "The line 
position defines the position of the cue box.". It would be good to 
have a term that describes this "feature" as a property of the cue box 
instead of lines. As the syntax may be hard to change, it could help if 
a synonymous word could be found. Furthermore, the relationship to text 
position could be stated in addition. While text position "defines the 
positioning of the cue box in the direction defined by the writing 
direction", the line position defines the position of the cue box 
orthogonal to the direction defined by the writing direction (?).

- text position
   * There should be a link to the region part when the region is mentioned.
   * The text position depends on the text alignment (which is explained 
further down). The next model element after text position is text 
position alignment. If you read linearly through the document you can 
easily confuse the two and attribute the dependency to text position 
alignment instead of text alignment.
   * It would be clearer if it is stated explicitly that steps 2 to 4 
apply when text position is not set explicitly.
- text alignment
     * Paragraph direction is used as a term but the concept is not 
explained. start side of the cue box
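
To make the relationship between line position and text position that I 
suggested above more concrete, here is a minimal Python sketch. It 
assumes that my reading is correct (text position along the writing 
direction, line position orthogonal to it). The writing direction values 
are those of the data model; the function name positioning_axes and the 
axis labels are my own.

```python
def positioning_axes(writing_direction):
    """Map a writing direction to the video axis each position setting moves along.

    Assumption (my reading, not normative text): text position acts along the
    writing direction and line position acts orthogonally to it.
    """
    if writing_direction == "horizontal":
        return {"text position": "horizontal", "line position": "vertical"}
    if writing_direction in ("vertical growing left", "vertical growing right"):
        return {"text position": "vertical", "line position": "horizontal"}
    raise ValueError(f"unknown writing direction: {writing_direction!r}")
```

A small table or picture of this kind in the data model section would 
already make the two terms easier to relate to each other.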

Regions
   * A visual representation would help to understand the difference 
between region anchor point and region viewport anchor point.

Syntax
   * In 4.1 a cue setting list, cue setting name and cue setting value 
do not seem to be constrained. But they are further constrained to 
specific values by "4.5 WebVTT cue setting". A reference may be helpful.


Best regards,

Andreas

-- 
------------------------------------------------
Andreas Tai
Production Systems Television IRT - Institut fuer Rundfunktechnik GmbH
R&D Institute of ARD, ZDF, DRadio, ORF and SRG/SSR
Floriansmuehlstrasse 60, D-80939 Munich, Germany

Phone: +49 89 32399-389 | Fax: +49 89 32399-200
http: www.irt.de | Email: tai@irt.de
------------------------------------------------

registration court & managing director:
Munich Commercial, RegNo. B 5191
Dr. Klaus Illgner-Fehns
------------------------------------------------

Received on Tuesday, 17 February 2015 16:40:37 UTC