[webvtt] Wide Review Comment 2017: serialisation and parsing from Nigel Megitt via GitHub on 2017-09-27 (public-texttracks@w3.org from September 2017)

From: Nigel Megitt via GitHub <sysbot+gh@w3.org>
Date: Wed, 27 Sep 2017 11:00:37 +0000
To: public-texttracks@w3.org
Message-ID: <issues.opened-260932771-1506510024-sysbot+gh@w3.org>

nigelmegitt has just created a new issue for https://github.com/w3c/webvtt:

== Wide Review Comment 2017: serialisation and parsing ==
Copy/paste from https://lists.w3.org/Archives/Public/public-tt/2017Sep/0080.html - raising as an issue for tracking/disposition purposes.

The WebVTT syntax is similar to (but incompatible with) SRT but otherwise distinct from all other syntaxes, and includes a subsection that is effectively CSS syntax. I consider the serialisation and parsing of a document format to be an architectural layer in its own right, ideally with tests, tools and support for the format. Because WebVTT has a unique format, the benefits of referencing an independent serialisation and parsing layer are absent. For internal business-to-business transactions this creates some hurdles: it is costlier to develop a syntax checker, for example to validate that received files are well formed or to quality check the content; and writing custom parser code becomes a security risk, since issues like buffer overflow are more commonly, though not uniquely, found in less mature code. The tool support for e.g. JSON, HTML or XML serialisation is much more mature and less likely to suffer from these problems.
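To illustrate the kind of custom checking code this implies, here is a minimal sketch in Python. It is not a conforming WebVTT parser: it only checks the `WEBVTT` signature on the first line and the shape of any line containing `-->`. The function name `check_webvtt` and the wording of its messages are assumptions made for the example.

```python
import re

# Timestamp: optional hours (two or more digits), then mm:ss.ttt.
_TS = r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}"
# Cue timing line: start, whitespace, "-->", whitespace, end, optional settings.
_TIMING = re.compile(rf"^({_TS})[ \t]+-->[ \t]+({_TS})(?:[ \t]+.*)?$")


def check_webvtt(text: str) -> list[str]:
    """Return a list of problems; an empty list means this sketch found none."""
    problems = []
    # Normalise line endings before splitting into lines.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    first = lines[0].lstrip("\ufeff")  # tolerate a leading byte order mark
    if not (first == "WEBVTT" or first.startswith(("WEBVTT ", "WEBVTT\t"))):
        problems.append("line 1: missing WEBVTT signature")
    for number, line in enumerate(lines[1:], start=2):
        # Any line containing "-->" is treated as a cue timing line.
        if "-->" in line and not _TIMING.match(line):
            problems.append(f"line {number}: malformed cue timing line")
    return problems
```

An SRT-style file, which uses a comma before the milliseconds and has no signature line, is rejected on both counts, while a simple WebVTT file passes. Everything a full checker would also need (cue identifiers, cue settings, STYLE and REGION blocks, cue text markup) is exactly the hand-written code that the paragraph above describes as costly.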

It is unclear what action could resolve this with WebVTT in its current form, without taking seemingly extreme steps. For example if WebVTT were a semantic model plus an API, and alternative representations were defined, and at least one of those alternative representations were a more commonly used one, that would help, though at the expense of adding an initial step for every WebVTT import or export, which is to work out which representation to use.

From this perspective, the syntax of WebVTT seems better suited to direct writing and editing in text editors by humans than by software, though obviously it is ultimately feasible to use either. For an organisation like the BBC authoring and distributing subtitle documents at scale it would be better to optimise for machine reading and writing instead of human reading and writing, since we expect subtitle authors and editors to use specialist software rather than tweaking files directly.


Please view or discuss this issue at https://github.com/w3c/webvtt/issues/367 using your GitHub account

Received on Wednesday, 27 September 2017 11:00:29 UTC