- From: Andreas Tai <tai@irt.de>
- Date: Tue, 17 Feb 2015 17:39:07 +0100
- To: public-timed-text@w3.org
- CC: David Singer <singer@apple.com>
Dear all,

I have been following the WebVTT spec for quite some time and wanted to respond to the general call for public review. My comments are observations, and I hope they are helpful for the WebVTT editor and the specification group. They are not meant as change requests. It is in the hands of the editor and/or the specification group to decide whether any changes are needed or possible.

I am posting this on the TTML mailing list and on the text track community group list, as subscribers may not have been merged yet.

----------------------------------------------------------------------------

One concept of the WebVTT spec is to cleanly separate the following areas:

- data model
- syntax
- parsing
- rendering
- API

GENERAL OBSERVATIONS

It is an interesting approach to provide different sections for different target groups (e.g. WebVTT authors and WebVTT parser implementers) so that they do not have to read the complete spec.

My experience, after reading different versions of WebVTT, is that even for a specific task it is difficult to get the necessary information without reading through the complete spec.

Suppose you are a WebVTT author who wants the normative (!) text on how to write a timed subtitle that should appear at a specific position at the bottom of the screen, where the text should have a specific font size in relation to the video height, the text colour of the first line should be white and the text colour of the second line should be yellow. Then it is not sufficient to just read through the data model and syntax sections. You have to read the rendering section, which also refers back to concepts of the parsing section.

If, for the above task, you want information about a specific presentation feature such as positioning or writing direction, you have to extract the relevant pieces from every section. Often a part of a section depends on other parts. You have to know the general concepts that are outlined in a section (e.g. the concept of WebVTT nodes in the parsing section). Presentation features also depend on each other (e.g. writing direction and positioning).

To reassure yourself that you have authored a WebVTT file that will be processed exactly as you want (based on normative text), and also if you want to write a WebVTT-compliant parser, you most probably have to read the complete spec. In that case it would be a big help if the information about a "feature" like positioning were all in one place (how it is represented in the data model, what the syntax looks like, how it is parsed and how it is rendered).

It is clear that, given how the spec is already written, a reorganisation of the spec text is difficult (e.g. the parsing section is one continuous algorithm). But an additional normative section that brings everything together may be useful guidance.

In general, the informative text that was added in the later stages of editing helps a lot. Sometimes I think this is actually necessary normative text (e.g. the notes on positioning in section 4.5).

Graphical representations would help a lot in understanding the abstract concepts. One example is positioning. The terms line position and text position have been difficult for me to relate to the concepts they represent. Pictures that visualize cue boxes, the writing directions and the positioning concepts would be great.

Although the syntax seems consistent to me, it would be a great help if, in addition to the prose, there were a formal representation, e.g. in Extended Backus–Naur Form (EBNF). I remember that this was already proposed in the community group.

In the following I list some observations on specific sections, which sometimes also highlight more general issues.

-----------------------
Data Model
-----------------------

- Some concepts from the HTML spec are so vital to WebVTT that a short summary would be of help (most importantly text track and text track cue).

- The title of 3.1 should possibly be "WebVTT cues" instead of "Text track cues". The first sentence in 3.1 indicates that WebVTT cues are instances of text track cues and that all of the following applies to WebVTT cues.

- In the section the term "text track cue" is used; should this actually be "WebVTT cue"?

- writing direction
  * Maybe it would help if it were made more concrete how "vertical" and "horizontal" relate to the rendering pane of the video?
  * Statements are made that apply to concepts that have not yet been explained at this point in the spec. For example, it is explained how the percentage of line position depends on the writing direction, but the concept of line position is only explained further down. This is also true for other parts of the text. On the one hand, linear reading is necessary because all normative statements that have been made apply to everything that follows. On the other hand, you have to read non-linearly and jump between parts of the spec. You may argue that this is the new form of reading, but the mixture of concepts makes it difficult to get the complete picture. In this case the statements on line position would fit better in the paragraph about "line position".

- snap-to-lines flag
  * The snap-to-lines flag is of type Boolean and can be "set" to "true" or "false". Instead of referring to these values, the setting of the values is sometimes described with the verbs "set" (for setting to "true") and "unset" (for setting to "false"). It would be more consistent and help the reader if the operation were always described as "set to true" and "set to false".

- line position
  * The line position actually refers more to the concept of a cue box than to the concept of a line. The first sentence states "The line position defines the position of the cue box.". It would be good to have a term that describes this "feature" as a property of the cue box instead of lines. As the syntax may be hard to change, it could help if a synonymous word could be found. Furthermore, the relationship to text position could be stated in addition. While text position "defines the positioning of the cue box in the direction defined by the writing direction", the line position defines the position of the cue box orthogonal to the direction defined by the writing direction (?).

- text position
  * There should be a link to the region part when the region is mentioned.
  * The text position depends on the text alignment (which is explained further down). The next model element after text position is text position alignment. If you read linearly through the document, you easily confuse the two and attribute the dependency to text position alignment instead of text alignment.
  * It would be clearer if it were stated explicitly that steps 2 to 4 apply when text position is not set explicitly.

- text alignment
  * Paragraph direction is used as a term but the concept is not explained.

start side of the cue box

Regions
  * A visual representation would help to grasp the difference between region anchor point and region viewport anchor point.

Syntax
  * In 4.1 a cue setting list, cue setting name and cue setting value seem not to be constrained. But they are further constrained to specific values by "4.5 WebVTT cue setting". A reference may be helpful.

Best regards,

Andreas

--
------------------------------------------------
Andreas Tai
Production Systems Television
IRT - Institut fuer Rundfunktechnik GmbH
R&D Institute of ARD, ZDF, DRadio, ORF and SRG/SSR
Floriansmuehlstrasse 60, D-80939 Munich, Germany
Phone: +49 89 32399-389 | Fax: +49 89 32399-200
http: www.irt.de | Email: tai@irt.de
------------------------------------------------
registration court & managing director:
Munich Commercial, RegNo. B 5191
Dr. Klaus Illgner-Fehns
------------------------------------------------
Received on Tuesday, 17 February 2015 16:40:37 UTC