From: Ian Hickson <ian@hixie.ch>
Date: Sat, 27 Dec 2008 09:16:02 +0000 (UTC)
I have carefully read all the feedback in this thread concerning associating text with video, for various purposes such as captions, annotations, etc.

Taking a step back, as far as I can tell there are two axes: where the timed text comes from, and how it is rendered.

Where it comes from, it seems, boils down to three options:

 - embedded in or referenced from the media resource itself
 - as a separate file parsed by the user agent
 - as a separate file parsed by the web page

Where the timed text is rendered boils down to two options:

 - rendered automatically by the user agent
 - rendered by the web page overlaying content on the video

For the purposes of this discussion I am ignoring burned-in captions, since they're basically equivalent to a different video, much like videos with overlaid sign language interpreters (or VH1 pop-up's annotations!).

These five options combine to give us six cases:

1. Timed text in the resource itself (or linked from the resource itself), rendered as part of the video automatically by the user agent.

This is the optimal situation from an accessibility and usability point of view: it works when the video is shown full-screen, it works when the video is saved separately from the Web page, it works easily when other pages link to the same video file, it requires minimal work from the page author, and so forth. This is what I think we should be encouraging.

It would probably make sense to expose the timed text track selection to the Web page through the API, maybe even to expose the text itself somehow, but these are features that can and probably should wait until <video> has been more reliably implemented.

2. Timed text in the resource itself (or linked from the resource itself), exposed to the Web page with no native rendering.

This allows pages to implement experimental subtitling mechanisms while still allowing the timed text tracks to survive re-use of the video file, but it introduces a high cost (every page has to implement subtitling itself) for very little gain, and with several disadvantages: different sites will have inconsistent subtitling, bugs will be prevalent in the subtitling and accessibility will thus suffer, and in all likelihood even videos that have subtitles will end up not having them shown, as small sites won't bother to implement anything beyond the most basic controls.

3. Timed text stored in a separate file, which is then parsed by the user agent and rendered as part of the video automatically by the browser.

This would make authoring subtitles somewhat easier, but would typically lose the benefit of subtitles surviving when the video file is extracted. It would also involve a distinct increase in implementation and language complexity. We would also have to pick a timed text format, or add yet another format war to the <video>/<audio> codec debacle, which I think would be a really big mistake right now. Given the immature state of timed text formats (it seems there are new formats announced every month), it's probably premature to pick one -- we should let the market pick one first.

4. Timed text stored in a separate file, which is then parsed by the user agent and exposed to the Web page with no native rendering.

This combines the disadvantages of the previous two options without really introducing any groundbreaking advantages. (A rough sketch of what this kind of exposure could look like follows below.)
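To make the "exposed to the Web page" cases (2 and 4) a bit more concrete, here is a minimal sketch, in TypeScript, of the general shape such an API could take. Every name in it -- timedTextTracks, TimedTextTrack, TimedCue, the "captions" element -- is invented purely for illustration; nothing like this is specified anywhere.

  // Hypothetical interfaces for timed text exposed to script with no
  // native rendering (cases 2 and 4). All names here are invented.
  interface TimedCue {
    startTime: number;  // seconds from the start of the media resource
    endTime: number;
    text: string;
  }
  interface TimedTextTrack {
    language: string;   // e.g. "en"
    cues: TimedCue[];
  }

  // Pretend the user agent exposed the embedded tracks on the element:
  const video = document.querySelector('video') as HTMLVideoElement &
    { timedTextTracks?: TimedTextTrack[] };
  const track = video.timedTextTracks?.find(t => t.language === 'en');

  // The page would then have to do all the rendering work itself:
  if (track) {
    const captionBox = document.getElementById('captions')!;
    video.addEventListener('timeupdate', () => {
      const now = video.currentTime;
      const cue = track.cues.find(c => now >= c.startTime && now < c.endTime);
      captionBox.textContent = cue ? cue.text : '';
    });
  }

Note how even this trivial version would have to be repeated on every page that wants captions shown, which is exactly the cost described above.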
5. Timed text stored in a separate file, which is then fetched and parsed by the Web page, which then passes it to the browser for rendering.

This is just an excessive level of complexity for a feature that could be supported entirely by the user agent. In particular, it doesn't actually leave much room for experimentation -- whatever API we provide for passing the subtitles in would constrain what the rendering could be like, regardless of what pages want to try. This option does side-step the issue of picking a format, though.

6. Timed text stored in a separate file, which is then fetched and parsed by the Web page, and which is then rendered by the Web page.

We can't stop this from being possible, and there's not much we can do to help with this case beyond what we do now. The disadvantages are that it doesn't work when the video is shown full-screen, when the video is saved separately from the Web page, or when other pages link to the same video file without writing their own implementation of the feature, and it requires substantial implementation work from the page. The _advantages_, and they are significant, are that pages can easily create subtitles separately from the video, they can easily provide features such as automated translations, and they can easily implement features that would otherwise seem overly ambitious, e.g. hyperlinked annotations with ad tracking.
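For concreteness, here is a minimal sketch of case 6, again in TypeScript. The subtitle format (one "start end text" line per cue, with times in seconds) and the helper names are made up for the example, and it assumes the video's container element has position:relative.

  // Case 6 sketch: the page fetches, parses, and renders its own
  // subtitles. The cue file format here is invented for the example.
  interface Cue { start: number; end: number; text: string; }

  async function loadCues(url: string): Promise<Cue[]> {
    const source = await (await fetch(url)).text();
    return source.split('\n').filter(line => line.trim()).map(line => {
      const [start, end, ...words] = line.split(' ');
      return { start: Number(start), end: Number(end), text: words.join(' ') };
    });
  }

  async function attachSubtitles(video: HTMLVideoElement, url: string) {
    const cues = await loadCues(url);
    const overlay = document.createElement('div');
    // Absolutely positioned over the video; assumes the parent element
    // has position:relative.
    overlay.style.cssText = 'position:absolute; bottom:10%; width:100%;' +
      ' text-align:center; color:white; text-shadow:0 0 3px black;';
    video.parentElement!.appendChild(overlay);
    video.addEventListener('timeupdate', () => {
      const t = video.currentTime;
      const cue = cues.find(c => t >= c.start && t < c.end);
      overlay.textContent = cue ? cue.text : '';
    });
  }

  // Usage: void attachSubtitles(document.querySelector('video')!, 'en.cues');

Note that this overlay has exactly the limitations described above: it vanishes in full-screen playback and doesn't survive saving the video file separately from the page.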
Based on this analysis, it seems to me that cases 1 and 6 are important to support, but cases 2 through 5 aren't as compelling -- they either have disadvantages that aren't outweighed by their advantages, or they are simply less powerful than other options. Cases 1 and 6 require no changes to the spec right now. I think we should eventually provide the APIs mentioned above under case 1, since they would help bridge the gap between the two kinds of timed text solutions, but as noted above I think we should wait until implementations are more mature before extending the API further.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Saturday, 27 December 2008 01:16:02 UTC