RE: Re : FW: associating video and audio transcripts…

Resending to A11Y Task Force email alias since it is the TF that is discussing how to handle transcripts.

/paulc

Paul Cotton, Microsoft Canada
17 Eleanor Drive, Ottawa, Ontario K2E 6A3
Tel: (425) 705-9596 Fax: (425) 936-7329

From: Ingar Mæhlum Arntzen [mailto:ingar.arntzen@gmail.com]
Sent: Monday, May 18, 2015 7:07 AM
To: public-html-media@w3.org
Cc: Daniel Davis; chaals@yandex-team.ru; public-web-and-tv; public-webtiming@w3.org
Subject: Re : FW: associating video and audio transcripts…


This is in reply to https://lists.w3.org/Archives/Public/public-html-media/2015May/0027.html as feedback to the discussion around media transcripts http://www.w3.org/WAI/PF/HTML/wiki/Full_Transcript.



Thank you, Daniel Davis, for bringing this to the attention of the Multi-device Timing Community Group https://www.w3.org/community/webtiming/ via the monthly webtv report https://lists.w3.org/Archives/Public/public-web-and-tv/2015May/0005.html.



The discussion about audio and video transcripts is relevant to the Multi-device Timing Community Group because it ties into what we think is a larger discussion: how the Web supports composition of timed media sources as well as time-sensitive UI components.


Currently, it seems the video element is supposed to grow into some kind of hub for coordinated playback. The video element (with track elements) is supposed to handle multiple media sources and recognize a series of new media types such as subtitles, chapter information, transcripts or perhaps advertisements, as exemplified by another recent discussion within the Media Task Force https://lists.w3.org/Archives/Public/public-web-and-tv/2015May/0001.html. These new demands bring with them a demand for standardization of new media types and possibly built-in support for corresponding visuals in the video element.


This, in my view, is a limiting approach. The Multi-device Timing Community Group is advocating an alternative model, where the video element is no longer the master time-source of a media presentation, but a slave like any other time-sensitive component. Instead, we are proposing the HTMLTimingObject https://www.w3.org/community/webtiming/htmltimingobject/  as a new, explicit time source for timed media. This approach has a number of significant advantages and crucially supports the flexibility and extensibility that we have come to expect from the Web.


·         Loose coupling - the only requirement for time-coordinated behavior is that components interface with (take direction from) a shared timing object.

·         Better timing model (more precise and expressive).

·         No dependencies between components - in particular the video element may stay agnostic with respect to the existence of transcripts (yet they may be navigated together and still give the appearance of being tightly connected).

·         Keeps the video element as simple as possible (requires only improved support for timed playback).

·         Easy to make presentations from multiple video clips (in parallel or sequentially).

·         Timed data sources may be managed independently

·         Timed UI components may be defined and loaded independently

·         Custom visualizations are easy to make using the proposed HTMLSequencerObject (i.e. improved TrackElement) https://www.w3.org/community/webtiming/2015/05/12/proposal-htmlsequencerobject/


·         Immediate support for multi-device playback through integration of HTMLTimingObject with Shared Motion.
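
To make the loose-coupling point concrete, here is a minimal sketch of the idea, not the HTMLTimingObject API from the proposal. The names SketchTimingObject, Cue and activeCues are hypothetical and chosen only for illustration, and the injectable clock is an assumption made so the example is deterministic. The transcript logic knows nothing about any video element; it only asks the shared timing object for the current position.

```typescript
// Hypothetical sketch: NOT the HTMLTimingObject API from the proposal.
type Vector = { position: number; velocity: number; timestamp: number };
type Listener = (v: Vector) => void;

class SketchTimingObject {
  private clock: () => number;
  private vector: Vector;
  private listeners: Listener[] = [];

  // `clock` returns seconds; it is injectable (an assumption of this sketch).
  constructor(clock: () => number = () => Date.now() / 1000) {
    this.clock = clock;
    this.vector = { position: 0, velocity: 0, timestamp: this.clock() };
  }

  // Current position, extrapolated from the last update.
  query(): Vector {
    const now = this.clock();
    const { position, velocity, timestamp } = this.vector;
    return {
      position: position + velocity * (now - timestamp),
      velocity,
      timestamp: now,
    };
  }

  // Change position and/or velocity, then notify every subscribed component.
  update(change: { position?: number; velocity?: number }): void {
    const current = this.query();
    this.vector = {
      position: change.position ?? current.position,
      velocity: change.velocity ?? current.velocity,
      timestamp: current.timestamp,
    };
    for (const listener of this.listeners) listener(this.vector);
  }

  onChange(listener: Listener): void {
    this.listeners.push(listener);
  }
}

// A transcript component depends only on the timing object, never on a video.
type Cue = { start: number; end: number; text: string };

function activeCues(cues: Cue[], position: number): Cue[] {
  return cues.filter((c) => c.start <= position && position < c.end);
}
```

In a page, a video wrapper and a transcript view would each subscribe with onChange and follow the same timing object, so either one could be added or removed without the other noticing.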


Further information about this model for timed composition is available from the Multi-device Timing Community Group as well as in this paper: https://www.w3.org/community/webtiming/2015/05/08/linearcomposition/



Another (in my view distinct) issue in the transcript discussion is the expression of semantic relations, e.g., allowing a crawler to deduce which transcript belongs to which video. I would recommend that the scope of this discussion be broadened as well. Transcripts are not the only kind of timed data. How about timed geo-positions, or perhaps timed comments, or timed “likes”? Personally, I’ve recently made an HTML-based chess board widget that is synchronized with a chess video. This particular widget works on timed chess moves, timed piece drags, timed square and border highlights, and timed canvas drawings for analysis. The point is that different applications will define different kinds of timed data.


In my view, the general problem with respect to semantic relations is how to describe the structure of a media presentation that is composed from a possibly large number of independent (and possibly custom) timed data sources, as well as UI components. From this perspective, adding new tags or attributes to the video element may not seem like a very attractive solution, at least not in the long run. Maybe there could be something like a media manifest to do this? This manifest could even be a document in its own right - hosted on a web server, and referenced by various components from different web pages. An online media manifest would presumably simplify work for crawlers and also emphasize the online nature of multi-device linear media.
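
As a rough illustration of what such a manifest might contain, here is a purely hypothetical sketch. None of the field names (title, timing, sources, kind) come from any specification, and the example.org URLs are placeholders. It only shows that a crawler could read the relations between a video, its transcript and custom timed data from one document.

```typescript
// Hypothetical manifest shape; no field name here comes from a specification.
interface TimedSource {
  id: string;
  kind: string; // e.g. "video", "transcript", or an application-defined type
  url: string;
}

interface MediaManifest {
  title: string;
  timing: string; // identifier of the shared timing resource (assumed)
  sources: TimedSource[];
}

// Placeholder data for illustration only.
const manifest: MediaManifest = {
  title: "Chess lecture",
  timing: "https://example.org/timing/lecture-1",
  sources: [
    { id: "video", kind: "video", url: "https://example.org/lecture.webm" },
    { id: "transcript", kind: "transcript", url: "https://example.org/lecture-transcript.json" },
    { id: "moves", kind: "chess-moves", url: "https://example.org/lecture-moves.json" },
  ],
};

// What a crawler might do: find every source of a given kind in a presentation.
function sourcesOfKind(m: MediaManifest, kind: string): TimedSource[] {
  return m.sources.filter((s) => s.kind === kind);
}
```

Because the manifest lists sources rather than extending the video element, a custom kind such as "chess-moves" sits alongside the transcript without any new tags or attributes.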



In summary, I think improved Web support for temporal composition is a key opportunity for Web media, and I hope the Media Task Force could use this occasion to promote the importance of this topic.





Best regards,



Ingar Arntzen, Chair Multi-device Timing Community Group

Received on Monday, 18 May 2015 20:36:29 UTC