[whatwg] Thoughts on video accessibility from Silvia Pfeiffer on 2008-12-09 (public-whatwg-archive@w3.org from December 2008)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Tue, 9 Dec 2008 15:56:12 +1100
Message-ID: <2c0e02830812082056j62afcccenaa5e2775da07feef@mail.gmail.com>
On Tue, Dec 9, 2008 at 1:08 PM, Martin Atkins <mart at degeneration.co.uk> wrote:
> Silvia Pfeiffer wrote:
>>
>> Take this as an example:
>>
>> <video src="http://example.com/video.ogv" controls>
>>  <text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
>>  <text category="SUB" lang="de" type="application/ttaf+xml"
>> src="german.dfxp"></text>
>>  <text category="SUB" lang="jp" type="application/smil"
>> src="japanese.smil"></text>
>>  <text category="SUB" lang="fr" type="text/x-srt"
>> src="translation_webservice/fr/caption.srt"></text>
>> </video>
>>
>
> Could this combining of resources be achieved instead with SMIL or some
> other existing format?


So, are you suggesting that we use something like this:

<video srcdesc="http://example.com/video.smil" controls>
</video>

where the Web client would retrieve the smil file and find all the
references to actual resources inside the SMIL file, then do another
retrieval action to actually retrieve the data it wants?
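
As a rough sketch of that two-step retrieval (Python, just to make the
steps concrete): the client parses the SMIL file, collects every
element that carries a src attribute, and resolves each against the
URL of the SMIL file. The contents of the SMIL document below are made
up for illustration, and the actual fetching of each resource is left
out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical SMIL document, as it might be served from
# http://example.com/video.smil (its contents are an assumption).
SMIL_DOC = """\
<smil>
  <body>
    <par>
      <video src="video.ogv"/>
      <textstream src="caption.srt" systemLanguage="en"/>
      <textstream src="german.dfxp" systemLanguage="de"/>
    </par>
  </body>
</smil>
"""

def referenced_resources(smil_text, base_url):
    """Return (tag, absolute URL) for every element with a src attribute."""
    root = ET.fromstring(smil_text)
    return [(el.tag, urljoin(base_url, el.get("src")))
            for el in root.iter() if el.get("src") is not None]

# Step 1: read the description. Step 2 (not shown) would fetch each URL.
resources = referenced_resources(SMIL_DOC, "http://example.com/video.smil")
for tag, url in resources:
    print(tag, url)
```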

This is indeed an alternative. It would require a SMIL file
specification that describes the composition of tracks of a single
linear video. It is indeed what we have experimented with in the Ogg
community, where we came up with ROE
(http://wiki.xiph.org/index.php/ROE).

<video roe="http://example.com/video.xml" controls>
</video>

When we defined ROE, we were trying to use a tightly defined subpart
of SMIL for it. This however did not work, because some of the
required attributes do not exist in SMIL (e.g. profile, category,
distinction, inline), SMIL was too expressive (e.g. it needed audio
and video to be separated explicitly, when mediaSource will do fine)
and SMIL required the use of other elements that were really
unnecessary. So, instead of butchering SMIL into a sub-version that
would work (and look really ugly), we defined a new XML specification
that would satisfy the exact requirements we had.


> If there is already a format for doing this then I think HTML should avoid
> re-inventing it unless HTML's version is better in some way.

I think both have their uses.

We are using the ROE file to describe the (possibly only virtually
existing) media resource on the server. It gives the Web client an
opportunity to request a media resource with only a particular set of
tracks (allows for content adaptation). This results in a single media
file, dynamically created on the Web server, delivered in one
connection, and decoded by the Web browser into its constituent
tracks, each of which is displayed in a different but temporally
synchronised way.
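
To illustrate the content adaptation idea, here is a minimal Python
sketch of a client choosing tracks and asking for them in one request.
The track list, its field names and the "track" query parameter are
all invented for this example; they are not taken from the ROE
specification.

```python
from urllib.parse import urlencode

# Hypothetical track list, as a client might hold it after reading a
# ROE-style description (ids and fields are assumptions).
TRACKS = [
    {"id": "v", "kind": "video", "lang": None},
    {"id": "a", "kind": "audio", "lang": "en"},
    {"id": "cc_en", "kind": "CC", "lang": "en"},
    {"id": "sub_de", "kind": "SUB", "lang": "de"},
    {"id": "sub_fr", "kind": "SUB", "lang": "fr"},
]

def select_tracks(tracks, text_lang):
    """Keep all audio/video tracks plus the tracks in the wanted language."""
    return [t["id"] for t in tracks
            if t["kind"] in ("video", "audio") or t["lang"] == text_lang]

def adapted_url(base_url, track_ids):
    """Build a single request URL (the 'track' parameter is made up)."""
    return base_url + "?" + urlencode([("track", tid) for tid in track_ids])

# One connection, one multiplexed resource containing only these tracks.
url = adapted_url("http://example.com/video.ogv", select_tracks(TRACKS, "de"))
print(url)
```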

In contrast, the proposed HTML5 solution requires the Web browser to
set up multiple connections, one to each of the resources that it
requires. The decoding and display then depend on multiple
connections having delivered enough data to provide for synchronised
playback. It also allows downloading the full text files first and
displaying some text ahead of time (as is usual e.g. in a transcript),
while in a multiplexed file the text data is often only retrieved
consecutively in sync with the decoding of the a+v tracks.
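
A small Python sketch of the client side of this, assuming the whole
caption file has already arrived over its own connection: parse the
srt text once, then look up the caption for the current playback time.
The sample captions and the function names are invented for the
example, and the parser only handles the basic srt layout.

```python
import re
from bisect import bisect_right

# Made-up caption file contents in the basic srt layout.
SRT_TEXT = """\
1
00:00:01,000 --> 00:00:04,000
Hello and welcome.

2
00:00:05,500 --> 00:00:08,000
This is the second caption.
"""

TIMING = re.compile(r"(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)")

def to_seconds(h, m, s, ms):
    """Convert hours, minutes, seconds and milliseconds to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(text):
    """Return a list of (start, end, text) cues sorted by start time."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        match = TIMING.match(lines[1]) if len(lines) >= 2 else None
        if match:
            g = match.groups()
            cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]),
                         "\n".join(lines[2:])))
    return sorted(cues)

def active_cue(cues, t):
    """Return the caption text showing at time t (seconds), or None."""
    i = bisect_right([c[0] for c in cues], t) - 1
    if i >= 0 and t < cues[i][1]:
        return cues[i][2]
    return None

cues = parse_srt(SRT_TEXT)
print(active_cue(cues, 2.0))
print(active_cue(cues, 4.5))
```

Because the full cue list is in memory, the same data can also be shown
ahead of time, e.g. as a transcript next to the video.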


> What are the advantages of doing this directly in HTML rather than having
> the "src" attribute point at some sort of compound media document?

I guess, an argument can be made that a user agent could use ROE to
get to the individual streams and download the resources in multiple
connections itself, which would have the exact same effect as the
proposed HTML5 syntax. ROE currently goes beyond just text tracks and
allows description of multiple media and text tracks. However, you
wouldn't want a Web browser to have to create multiple connections to
different audio and video resources and have to synchronise them
locally. Text is different in this respect, because it is almost
certainly a small enough file to be fully received before even the
beginning of a video file has loaded. So, if we used ROE for such a
content selection task, I would encourage using it only for text
tracks.


I'm interested to hear people's opinions on these ideas. I agree with
Ralph and think having a simple, explicit mechanism at the HTML level
is worthwhile - and very open and explicit to a web author. Having a
redirection through a ROE-type file on the server is more opaque, but
maybe more consistent with existing similar approaches, such as those
taken by RealNetworks with rm files and Windows Media with asx files.

Cheers,
Silvia.
Received on Monday, 8 December 2008 20:56:12 UTC