
RE: [minutes] 20090821 Timed Text teleconference

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Tue, 25 Aug 2009 11:48:16 +0100
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: Philippe Le Hegaret <plh@w3.org>, "public-tt@w3.org" <public-tt@w3.org>
Message-ID: <AB3FC8E280628440B366A29DABB6B6E82F09E5E20F@EA-EXMSG-C334.europe.corp.microsoft.com>
I understand the idea that HTML considers the resource monolithic, and that it is up to the user agent to gate access to the resources within it. This is fine in some situations, but late binding of such information is an important model too. HTML also defines a case where the author creates the controls; they can't have it both ways. If they want the author to be able to build the UI, then they will need a general-purpose media API.

I think the idea that <audio> and <video> are somehow more concrete than <ref>, <embed> or whatever is flawed, unless a known and limited set of container and essence formats is being defined (which, afaik, it is not). <audio>, for example, would need a width and height to display embedded captions, just as <video> needs volume control for embedded audio. Timed text would be perfectly fine as a <video> type, and I think they even talk of SMIL as a <video> type.

Honestly, I believe HTML would do better to just go back to an <object> and <embed> model, and leave all the complexity to the user agent and plugins; however, they seem set on trying to define a media handling subsystem, which is going to entail complexity.

I would not necessarily want HTML to start including the time container elements, although there is historical precedent for that with HTML+TIME. But if they are going to allow alternates based on codec and media attributes, and media queries on player capability, this could all get very complicated very quickly (at which point a single pointer to a dedicated media setup file starts to look like a good option to me).

In language design I'm generally in favor of a few general tools that work in as many places as possible, so for options I think something along the lines of SVG's general <switch> statement should be used throughout HTML, though it could be special-purposed for this context.

In terms of a concrete example I'd see it looking something like:

media:    (video | audio | text | switch)*    # plus possible other types, like animation etc.

Each of the sub-elements of media (a nominal par element) is deemed to be a track (in the MPEG sense), played in parallel when selected, and one of them can be designated the syncMaster. The switch element would allow the definition of alternates for each track based on a combination of the media queries, codecs, MIME types and so on of its child elements; the elements in the switch could inherit attributes from the switch to save some duplication. Script can change the selected attribute (or this could be a CSS property) in the DOM to turn the various tracks on and off, and in theory a pseudo-DOM could be generated for an MPEG or other container so that it looks much the same from the script's point of view.

<media>
	<switch id="mainvideo" selected="on" syncMaster="true">
		<video media="..." src="foo.big.vid" ... />
		<video media="..." src="foo.small.vid" ... />
	</switch>
	<audio id="mainAudio" src="bar.aud" selected="on" ... />
	<audio id="audioDescription" src="bar.sap.aud" selected="off" ... />
	<switch id="captions" selected="off">
		<text media="..." src="baz.ttxt" ... />
		<text media="..." src="baz.srt" ... />
	</switch>
	<switch id="subtitles" selected="off">
		<text media="..." src="english.ttxt" ... />
		<text media="..." src="english.srt" ... />
		<text media="..." src="french.ttxt" ... />
		<text media="..." src="french.srt" ... />
	</switch>
</media>

As you can see, though, this is likely to get out of hand; but each of these features is critical to someone, and HTML is going to have a hard time coming to consensus on a simple subset. I expect to re-raise this discussion in the PF/HTML accessibility task force, but I'm happy to continue the debate here too.
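To make the intended <switch> semantics concrete, here is a rough sketch in TypeScript of the selection rule being described: the user agent walks the alternates in document order and activates the first one whose media query and codec it supports, as in SVG <switch>. All names here (Track, resolveSwitch) are illustrative, not from any spec.

```typescript
// Model of one alternate inside a hypothetical <switch> track.
interface Track {
  src: string;
  mediaSupported: boolean; // result of evaluating the media="" query
  codecSupported: boolean; // result of a canPlayType-style codec check
}

// First-match-wins selection, mirroring the SVG <switch> rule:
// return the first alternate the user agent can play, or null
// if none match (in which case the track simply stays off).
function resolveSwitch(alternates: Track[]): Track | null {
  for (const t of alternates) {
    if (t.mediaSupported && t.codecSupported) {
      return t;
    }
  }
  return null;
}

// e.g. a captions switch offering a TTML file with an SRT fallback:
const captions: Track[] = [
  { src: "baz.ttxt", mediaSupported: true, codecSupported: false },
  { src: "baz.srt",  mediaSupported: true, codecSupported: true  },
];
console.log(resolveSwitch(captions)?.src); // "baz.srt"
```

Toggling a track would then just be script flipping the selected attribute on the switch; the resolution above decides which alternate actually plays.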

-----Original Message-----
From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com] 
Sent: 25 August 2009 12:49 AM
To: Sean Hayes
Cc: Philippe Le Hegaret; public-tt@w3.org
Subject: Re: [minutes] 20090821 Timed Text teleconference

On Tue, Aug 25, 2009 at 8:22 AM, Sean Hayes<Sean.Hayes@microsoft.com> wrote:
> Yes I've looked at this, we did discuss it briefly; although the notes don't reflect it.
> I think it's a reasonable basic approach, but I'd like to see it be a little more general, allowing for the possibility of audio description tracks using the <audio> tag for <video>, and even sign translation tracks. Also, I don't think there is a need to invent much: SMIL <audio> and <video> are in fact synonyms of the more general media <ref>, as are <text>, <animation> and others. In my opinion HTML could adopt this same notion. I don't think HTML should be looking at importing a lot of SMIL (although I guess they could reference a SMIL file with a media tag), but by adopting the basic media module <ref> and its synonyms, and allowing the sync* attributes, this could all be achieved.

Hi Sean,

As it stands, the HTML5 video element expects audio description tracks and sign translation tracks to come through the binary resource as additional audio or video tracks of the video file. Thus, they cannot be created through HTML markup. However, I think what is missing is the notion that a Web browser should parse such tracks and add them to a menu created for the video, so users can activate them.

HTML does not like dealing with abstract resources (this was what the <embed> and <object> elements did), but rather prefers dealing with concrete data for which specific attributes and API calls can be made available. For example, width and height attributes don't make much sense on an audio element, while they are very important for video.
Thus, I don't think a general <ref> element makes much sense, since it would be a step back towards what <embed> and <object> used to be.

Also, I am curious about the sync attributes you are referring to. I found endsync (http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-timing.html#adef-endsync), which requires par and seq elements and thus creates a whole media composition language. Is this what you are after? Can you give an example of what you would like to see available in HTML5 in that respect?

Received on Tuesday, 25 August 2009 10:50:10 UTC
