Re: Response to call for public review of WebVTT FPWD from Andreas Tai on 2015-03-22 (public-texttracks@w3.org from March 2015)

From: Andreas Tai <tai@irt.de>
Date: Sun, 22 Mar 2015 18:39:26 +0100
To: silviapfeiffer1@gmail.com
CC: public-texttracks@w3.org, David Singer <singer@apple.com>
Message-ID: <550EFE4E.5010004@irt.de>
Hi Silvia,

It took a bit longer than expected to go through the comments. Apologies 
for that. Please find my feedback inline.

On 24.02.2015 at 18:48, Andreas Tai wrote:
> Hi Silvia,
>
> Thanks for all the detailed feedback on my comments. This makes it 
> really encouraging to comment on this spec!!
>
> I need a bit of time to go through your feedback and reply to it, but 
> will certainly do so!
>
> Best regards,
>
> Andreas
>
> On 22.02.2015 at 11:46, Silvia Pfeiffer wrote:
>> Hi Andreas,
>>
>> Thanks for all your feedback and sorry for the late reply - I'll give 
>> you some responses inline.
>>
>>
>>
>>
>>     -------- Forwarded Message --------
>>     Subject:  Response to call for public review of WebVTT FPWD
>>     Resent-Date:  Tue, 17 Feb 2015 16:40:38 +0000
>>     Resent-From:  public-timed-text@w3.org
>>     Date:  Tue, 17 Feb 2015 17:39:07 +0100
>>     From:  Andreas Tai <tai@irt.de>
>>     To:  public-timed-text@w3.org
>>     CC:  David Singer <singer@apple.com>
>>
>>
>>
>>     Dear all,
>>
>>     I have been following the WebVTT spec for quite some time and wanted to
>>     respond to the general call for public review. My comments are
>>     observations and I hope they can be helpful for the WebVTT editor and
>>     the specification group. They are not intended as change requests. It is
>>     in the hands of the editor and/or the specification group to decide if
>>     any changes are needed or possible.
>>

>>     GENERAL OBSERVATIONS
>>
>>     It is an interesting approach to provide different sections to different
>>     target groups (e.g. WebVTT authors and WebVTT parser implementers) so
>>     they do not have to read the complete spec. My experience is (after
>>     reading different versions of WebVTT) that even for a specific task it
>>     is difficult to get the necessary information without reading through
>>     the complete spec.
>>
>>     Suppose you are a WebVTT author who wants the normative (!) text on how
>>     to write a timed subtitle in WebVTT that should appear at a specific
>>     position at the bottom of the screen, where the text should have a
>>     specific font size in relation to the video height, the text color of
>>     the first line should be white and the text color of the second line
>>     should be yellow. Then it is not sufficient to just read through the data
>>     model and syntax sections. You have to read the rendering section, which
>>     also refers back to concepts of the parsing section.
>>
>>
>> You shouldn't need to read the rendering section, but you are right. 
>> You will need to read the CSS extensions section, though only for the 
>> color changes. Would it help to make the syntax of the CSS 
>> extensions a separate section?
>>

Not sure about the concrete solution. An author should get a hint where 
to look if he wants to know how to set display properties like font 
or color properties. It may not be obvious to a non-WebVTT expert 
which display properties are defined by CSS and which are defined by 
WebVTT syntax. It is also important to have easy access to the parts of 
the document where the default properties for color and fonts are 
specified. These are quite important for captions.

>>     If for the above task you want to get the information about a specific
>>     presentation feature like positioning or writing direction, you have to
>>     extract the relevant information from every section. Often one part of
>>     a section depends on other parts. You have to know the general
>>     concepts that are outlined in a section (e.g. the concept of WebVTT
>>     nodes in the parsing section). Also, presentation features depend on
>>     each other (e.g. writing direction and positioning).
>>
>>
>> Positioning and writing direction should be sufficiently specified in 
>> the syntax. Of course, the syntax section is not a complete authoring 
>> guide - we have 
>> https://docs.webplatform.org/wiki/concepts/VTT_Captioning and other 
>> articles or tutorials on the Web for that.
>>
Yes, these kinds of authoring guides are very helpful. Maybe you could 
link directly from the spec to resources like the one above?

>> Also, why would you need to understand the concept of WebVTT nodes in 
>> the syntax section? I don't follow. Can you explain?
>>

If you want to apply CSS, then this is bound to the concept of WebVTT 
nodes (if I am not mistaken).

>>     To reassure yourself that you have authored a WebVTT file that will be
>>     processed exactly as you want (based on normative text), and also if you
>>     want to write a WebVTT-compliant parser, you most probably have to read
>>     the complete spec.
>>
>> There's a validator at https://quuz.org/webvtt/ that will help write 
>> valid WebVTT files.
>> If you want to write a parser, yes, you will need to read more than 
>> the syntax - also the parser section.

Yes, Anne van Kesteren's tool is very helpful. But as an author you also 
want to know how your content gets rendered.

>>     In that case it would be a big help if the information about a "feature"
>>     like positioning were all in one place (how it is represented in the
>>     data model, what the syntax looks like, how it is parsed and how it is
>>     rendered).
>>
>> That doesn't work, because features are interdependent. What you are 
>> after is an authoring guide like 
>> https://docs.webplatform.org/wiki/concepts/VTT_Captioning .

Yes, I think a lot of things can be delegated to one or more author 
guides and possibly this is a good way out. Spec text and guides have to 
be "synced" then.

>> This spec follows the modern approach of writing W3C specs so that 
>> UI implementers are able to implement them interoperably.
>>

True, and this is very interesting. In general, specs are often not easy 
to read, and it would be interesting to discuss which approach is 
best. This certainly goes beyond WebVTT, but the question of 
"spec usability" is essential for market adoption.

>>     It is clear that, given how the spec is already written, reorganising
>>     the spec text is difficult (e.g. the parsing section is one continuous
>>     algorithm). But an additional normative section that brings everything
>>     together may be useful guidance.
>>
>>
>> We cannot specify something twice normatively - that causes 
>> contradictions.
>>

Understood!

>>     In general, the informative text that was added at a later stage of
>>     the editing process helps a lot. Sometimes I think this is
>>     actually necessary normative text (e.g. the notes on positioning in
>>     section 4.5).
>>
>>
>> We can help by adding more such descriptive text. It can, however, 
>> only be informative. If you have any concrete suggestions on what is 
>> under-described, please add a bug at 
>> https://www.w3.org/Bugs/Public/enter_bug.cgi?product=TextTracks%20CG .
>>

Yes, I think more informative sections are helpful.

>>     Graphical representations would help a lot to understand the abstract
>>     concepts. One example is positioning. The terms line position and
>>     text position have been difficult for me to relate to the concepts they
>>     represent. Pictures that visualize cue boxes, writing directions and
>>     positioning concepts would be great.
>>
>>
>> I agree - we should add some more visual examples. Do register some 
>> bugs so we won't forget about it.
>>

Will do so!

>>     Although the syntax seems consistent to me, it would be a great help if,
>>     in addition to the prose, there were a formal representation, e.g. in
>>     Extended Backus–Naur Form (EBNF). I remember that this was already
>>     proposed in the community group.
>>
>>
>> Can you propose an EBNF that covers the features set? I doubt it's 
>> possible without making too many simplifications.
>>

This is an interesting task, but lack of time may stop me from giving it 
a try. This may be a good topic for a student's work. (The same issue is 
now registered in https://www.w3.org/Bugs/Public/show_bug.cgi?id=28258 by 
Addison Phillips from the W3C I18N group.)



>>     In the following I list some observations on specific sections, which
>>     sometimes also highlight more general issues.
>>
>>     -----------------------
>>     Data Model
>>     -----------------------
>>     - Some concepts from the HTML spec are so vital to WebVTT that a short
>>     summary would be of help (most importantly text track and text track cue)
>>
>>
>> This is a problem that several W3C specs share. I'm waiting for the 
>> ReSpec authoring tools to make it possible to pull in text from 
>> another spec without having to re-type the text (because that would 
>> cause inconsistencies).
>>
  Makes sense.


>>     - The title of 3.1 should possibly be "WebVTT cues" instead of "Text
>>     track cues". The first sentence in 3.1 indicates that WebVTT cues are
>>     instances of text track cues and all the following applies to WebVTT cues.
>>
>>
>> Bug registered: https://www.w3.org/Bugs/Public/show_bug.cgi?id=28070
>>

Thanks for opening the bug, discussing the issue thoroughly and 
resolving it!

>>     - In the section the term "text track cue" is used which actually should
>>     be WebVTT cue?
>>
>>
>> Also: https://www.w3.org/Bugs/Public/show_bug.cgi?id=28070
>>

See above.

>>     - writing direction
>>          * Maybe it would help if it is made more concrete how "vertical"
>>     and "horizontal" relate to the rendering pane of the video?
>>
>>
>> How can you misunderstand horizontal and vertical? It's pretty well 
>> defined within a Web page and more generally, too.

Perhaps this is more theoretical. I brought this up because WebVTT may 
not always be integrated in an HTML page. For some use cases it just 
gets delivered together with the video. But if everybody gets it right, 
no further action may be needed.

>>          * Statements are made that apply to concepts that have not been
>>     explained at this point in the spec. For example, it is explained how the
>>     percentage of line position depends on the writing direction, but the
>>     concept of line position is explained further down. This is also true
>>     for other parts of the text.
>>
>>
>> OK, the statements about line position could be moved down to line 
>> position.
>> Bug registered: https://www.w3.org/Bugs/Public/show_bug.cgi?id=28071

Thanks for opening and discussing this. Added a comment in the tracker.

>>          On the one hand, linear reading is necessary because all normative
>>     statements that have been made apply to everything that follows. On the
>>     other hand, you have to read non-linearly and jump between parts of the
>>     spec. You may argue that this is the new form of reading, but the
>>     mixture of concepts makes it difficult to get the complete picture.
>>
>>
>> Yes, we're trying to avoid using concepts that have been defined 
>> later where possible.

Good : )

>>
>>     - snap-to-lines flag
>>          * The snap-to-lines flag is of type Boolean that can be "set" to
>>     "true" or "false". Instead of referring to these values sometimes the
>>     setting of the values is described with the verbs "set" (for setting to
>>     "true") and "unset" (for setting to "false"). It would be more
>>     consistent and help the reader if the operation is always described as
>>     "set to true" and "set to false".
>>
>>
>> OK. Bug registered: https://www.w3.org/Bugs/Public/show_bug.cgi?id=28072
Thanks for opening the bug. After the discussion on the tracker there 
seems to be no problem in leaving it as it is.

>>
>>     - line position
>>          * The line position actually refers more to the concept of a
>>     cue box than to the concept of a line. The first sentence states "The
>>     line position defines the position of the cue box.". It would be good
>>     to have a term that describes this "feature" as a property of the cue
>>     box instead of lines. As the syntax may be hard to change, it could help
>>     if a synonym could be found.
>>
>> I've actually thought about this a lot and haven't come up with a 
>> better word. Maybe "cue position"?
>> It's "line position" for now for historic reasons, because it was 
>> started that way.
>> Bug registered: https://www.w3.org/Bugs/Public/show_bug.cgi?id=28073
>>
Thanks for opening an issue. Commented on the bug in the bug tracker.

>>     - text position
>>        * There should be a link to the region part when the region is mentioned.
>>
>>
>> Bug registered: https://www.w3.org/Bugs/Public/show_bug.cgi?id=28074
>>

Thanks for resolving this.

>>        * The text position depends on the text alignment (which is explained
>>     further down). The next model element after text position is text
>>     position alignment. If you read linearly through the document, you can
>>     easily confuse the two and attribute the dependency to text position
>>     alignment instead of text alignment.
>>
>> Yeah, text position should also relate to cue box. So, should we talk 
>> about horizontal and vertical cue box position rather than line and 
>> text position? Problem with that is that we would need to rename the 
>> cue settings, which people have now become used to. Really not sure 
>> how to resolve this. If you have a good suggestion, please register a 
>> bug.
>>

Yes, this would not be backward compatible. Will open a bug when I have 
a concrete change proposal.

>>        * It would be clearer if it were stated explicitly that steps 2 to 4
>>     apply when the text position is not set explicitly.
>>
>>
>> That's what the text in 1. already says: "Otherwise, the text track 
>> cue text position 
>> <http://dev.w3.org/html5/webvtt/#dfn-text-track-cue-text-position> is 
>> the special value auto 
>> <http://dev.w3.org/html5/webvtt/#dfn-text-track-cue-automatic-text-position>." 
>> .
>>
This was just a minor observation so no problem for me to leave it like 
it is.

>>     - text alignment
>>          * Paragraph direction is used as a term but the concept is not
>>     explained. start side of the cue box
>>
>>
>> There is a reference to [BIDI 
>> <http://dev.w3.org/html5/webvtt/#bib-BIDI>] . That's where it is defined.
>>
Thanks for the hint.

>>     Regions
>>        * A visual representation would help to understand the difference
>>     between region anchor point and region viewport anchor point.
>>
>>
>> OK.  Bug registered: https://www.w3.org/Bugs/Public/show_bug.cgi?id=28075
>>
Great! Thanks for adding the diagram! This helps a lot!

>>     Syntax
>>        * In 4.1, the cue setting list, cue setting name and cue setting value
>>     do not seem to be constrained. But they are further constrained to
>>     specific values by "4.5 WebVTT cue settings". A reference may be helpful.
>>
>>
>> They are deliberately not constrained in 4.1, because the text there 
>> just introduces the concepts. This allows us to introduce more cue 
>> settings at a later stage.
>>

Understood!


>> Thanks for all the feedback. I'd like to encourage you to register 
>> more bugs where you would like to see improvements to the 
>> specification text. It's the best way to keep track.
>>

Will do so!

Thanks to you and Philip for changes and discussion,

Andreas
Received on Sunday, 22 March 2015 17:41:43 UTC