Re: timing model of the media resource in HTML5 from Geoff Freed on 2009-11-30 (public-html-a11y@w3.org from November 2009)

From: Geoff Freed <geoff_freed@wgbh.org>
Date: Mon, 30 Nov 2009 08:04:27 -0500
To: Philip Jägenstedt <philipj@opera.com>, Maciej Stachowiak <mjs@apple.com>
CC: Eric Carlson <eric.carlson@apple.com>, Silvia Pfeiffer <silviapfeiffer1@gmail.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <C7392D0B.7495%geoff_freed@wgbh.org>
A few comments below, marked with GF.

Geoff Freed
WGBH/NCAM


On 11/29/09 7:11 AM, "Philip Jägenstedt" <philipj@opera.com> wrote:

On Sat, 28 Nov 2009 19:42:15 +0100, Maciej Stachowiak <mjs@apple.com>
wrote:

>
> On Nov 25, 2009, at 2:49 PM, Philip Jägenstedt wrote:
>
>> On Wed, 25 Nov 2009 22:49:40 +0100, Eric Carlson <eric.carlson@apple.com>
>> wrote:
>>
>>>
>>> On Nov 25, 2009, at 12:02 PM, Philip Jägenstedt wrote:
>>>
>>>> On Wed, 25 Nov 2009 18:43:39 +0100, Eric Carlson
>>>> <eric.carlson@apple.com> wrote:
>>>>
>>>>>
>>>>>  I think <overlay> should be used for internal subtitle and/or
>>>>> closed caption tracks as well. Further, I think that we will want
>>>>> them to "just work" so a UA should create an <overlay> element if
>>>>> the markup doesn't have one and it finds that a file has internal
>>>>> captions/subtitles:
>>>>>
>>>>>    <video src='my-captioned-movie'> </video>
>>>>>
>>>>
>>>> Yes, that sounds good. One issue is how to style such an implicit
>>>> <overlay>. Should one actually include an <overlay> in the markup and
>>>> somehow indicate that it can/should be used to render in-band
>>>> subtitles from the resource?
>>>>
>>>> <video src="my-captioned-movie">
>>>> <overlay style="font-weight:bold" magic-attribute></overlay>
>>>> </video>
>>>>
>>>> Not awesome. Perhaps a new CSS pseudo-selector could be used? Other
>>>> ideas?
>>>>
>>>   Actually I was imagining that *all* subtitles and captions, in-band
>>> and external alike, would be rendered into an <overlay>. If the markup
>>> doesn't include an <overlay> element, the UA would actually insert one
>>> into the DOM, as is done now for other missing elements (e.g. tbody).
>>> This way the default style could be specified in the user agent
>>> style sheet and the author could override as they wish.
>>>
>>
>> Right, all subtitles/captions should be rendered in <overlay>
>> regardless of origin. If the parser automatically inserts an <overlay>
>> element when there is none, what about the case where there is an
>> <overlay> used to show custom controls? I imagined the need for a magic
>> attribute/CSS selector/something to point out the correct <overlay> in
>> cases like this. Possibly a magic src attribute? In any case, these are
>> small issues that I'm sure could be sorted out if <overlay> is
>> implemented.

GF:  Hasn't this already been dealt with in SMIL (including smilText (http://www.w3.org/TR/SMIL3/smil-text.html)) and DFXP, using <region> and CSS?  Is there no way to simply reuse syntax from these markup languages?


>>
>>>   Speaking of author overrides, another issue we need to deal with is
>>> authors that wish to handle captions themselves. Posting an event when
>>> a new caption needs to be displayed seems logical enough, but how do
>>> we provide access to the caption data in JavaScript?

GF:  Not sure I understand what you mean by authors wanting to handle captions themselves.  Do you mean authors may want to time/synchronize the captions themselves?  Based on a broadcast model, the answer is yes: authors will either have timed the captions themselves, or repurposed previously timed captions.  If so, can't we use SMIL and/or DFXP as timing models and treat the whole thing declaratively?

Or are you talking about synchronizing an untimed transcript based on a specified starting point in the timeline?


>>>
>>
>> Here I think we should do something similar to cue ranges as has been
>> discussed before in various places. A new event type would allow us to
>> add some data, e.g.
>>
>> interface CueRangeEvent : Event {
>>  readonly attribute double startTime;
>>  readonly attribute double endTime;
>>  readonly attribute DOMString text;
>> };
>>
>> We would need to bring back addCueRange with some modifications.
>>
>> v.addCueRange(10 /* start */, 12 /* end */, "Hello");
>> v.addEventListener('cuerangeenter', function(e) {
>> e.target.querySelector('overlay').textContent = e.text; }, false);
>> v.addEventListener('cuerangeleave', function(e) {
>> e.target.querySelector('overlay').textContent = ''; }, false);
>>
>> You get the idea even if the above doesn't have the perfect
>> interface/method/event names. Something along these lines should make
>> it possible to handle in-band, external and script-created captions in
>> a quite uniform fashion, as well as provide for whatever use cases the
>> old cue range API had.
>
> This interface works ok for the specific case of popping up some text,
> but it seems like it would be awkward for anything more complicated,
> since there is only a single event and set of handlers. What I would
> suggest is the declarative cue range idea that was suggested on the
> whatwg list a while back:
>
> <video>
>      <source type="video/mp4" src="video.m4v">
>      <timerange start="10" end="12" onrangeenter="enterRange1()"
>                 onrangeleave="leaveRange1()"></timerange>
> </video>
>
> This makes it really easy to have different handlers per cue range
> without having to express that difference as a string. It also makes it
> simpler to use cue ranges for two orthogonal purposes.
>
> addCueRange() could just be a shortcut for adding such a <timerange>
> element:
>
> var range = v.addCueRange(10, 12);
> range.addEventListener("rangeenter", function(e) {
>   v.querySelector('overlay').textContent = "Hello"; }, false);
> range.addEventListener("rangeleave", function(e) {
>   v.querySelector('overlay').textContent = ''; }, false);
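To make the enter/leave semantics discussed above a bit more concrete, here is a minimal sketch of the bookkeeping behind them, in plain JavaScript with no DOM. `CueRangeTracker` and its `addCueRange`/`update` methods are hypothetical names, not a proposed API, and treating ranges as half-open intervals [start, end) is an assumption. On each time update it compares the set of ranges containing the current time with the previous set and calls the leave/enter callbacks for the differences.

```javascript
// Hypothetical sketch: tracks which cue ranges contain the current playback
// time and reports enter/leave transitions between successive time updates.
// Ranges are treated as half-open intervals [start, end) (an assumption).
class CueRangeTracker {
  constructor() {
    this.ranges = [];        // { start, end, onenter, onleave }
    this.active = new Set(); // ranges that contained the last-seen time
  }

  addCueRange(start, end, onenter, onleave) {
    const range = { start, end, onenter, onleave };
    this.ranges.push(range);
    return range;
  }

  // Call with the media element's currentTime, e.g. from a timeupdate handler.
  update(time) {
    const nowActive = new Set(
      this.ranges.filter(r => r.start <= time && time < r.end)
    );
    // Leave callbacks first, so back-to-back ranges hand over cleanly.
    for (const r of this.active) {
      if (!nowActive.has(r) && r.onleave) r.onleave(r);
    }
    for (const r of nowActive) {
      if (!this.active.has(r) && r.onenter) r.onenter(r);
    }
    this.active = nowActive;
  }
}
```

Firing leave callbacks before enter callbacks means a range ending at 12 hands over cleanly to one starting at 12. In this sketch a seek that jumps clean over a range (say from 5 to 20) fires neither callback; whether it should is one of the details any spec for cue ranges would have to settle.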

If addCueRange does nothing but insert elements into the DOM, then we don't
need it at all; script authors can simply write it themselves if they need
a shortcut. It has been my working assumption that external SRT files
should fire the same events, so not all ranges are represented as elements
in the DOM. addCueRange would then be a way to add such not-in-DOM
ranges.

In <https://wiki.mozilla.org/Accessibility/Experiment1_feedback> I
suggested a MediaTimeRange interface. Remixing that somewhat:

interface MediaTimeRange {
   attribute double start;
   attribute double end;
   //attribute DOMString text;
   // FIXME: how to represent the content?
};

interface MediaTimeRangeList {
   // automatically sorted by increasing time
   readonly attribute unsigned long length;
   getter MediaTimeRange item(in unsigned long index);
   void add(in MediaTimeRange range);
   void remove(in MediaTimeRange range);
   // these last two look suspiciously similar to appendChild and removeChild
};

interface HTMLItextlistOverlayWhateverElement : HTMLElement {
   attribute MediaTimeRangeList ranges;
};

The problem I was trying to solve is that of representing the time ranges
uniformly regardless of their source. External subtitles can be accessed
and modified via MediaTimeRangeList. <timerange> gets mapped into a
MediaTimeRangeList. A MediaTimeRangeList can be constructed by scripts.
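A rough plain-JavaScript approximation of the MediaTimeRangeList idea, to show that "automatically sorted by increasing time" is cheap to maintain on insertion. This is a sketch only: ranges are plain `{ start, end }` objects, ties on `start` are ordered by `end` (the proposal doesn't say), `item` returns the range object itself, and `remove` works by object identity, much like removeChild.

```javascript
// Hypothetical plain-JS model of the proposed MediaTimeRangeList:
// { start, end } ranges kept sorted by increasing start time
// (ties broken by end time, an assumption the proposal doesn't specify).
class MediaTimeRangeList {
  #ranges = [];

  get length() {
    return this.#ranges.length;
  }

  item(index) {
    return this.#ranges[index] ?? null;
  }

  add(range) {
    // Binary search for the first position whose range sorts after `range`.
    let lo = 0, hi = this.#ranges.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      const r = this.#ranges[mid];
      if (r.start < range.start || (r.start === range.start && r.end <= range.end)) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    this.#ranges.splice(lo, 0, range);
  }

  remove(range) {
    // Removes by object identity, like removeChild.
    const i = this.#ranges.indexOf(range);
    if (i !== -1) this.#ranges.splice(i, 1);
  }
}
```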

However, make note of the FIXME. Because not all external subtitle formats
can be represented as plain text, there are basically 3 options:

1. Make the content completely opaque. Makes modification impossible, but
the same is true of almost any external resource.

2. Reduce the content to plain text. Modification would then destroy what
extra style information there was.

3. Transcode the content to HTML+CSS. Basically, while parsing external
SRT, the UA would construct an equivalent HTML DOM as children of
<itextlist-overlay-whatever>. This would actually make the MediaTimeRange
idea above redundant because the information would already be in the DOM.
All in all though, this would be quite strange and not a serious
suggestion.
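To see concretely what option 2 throws away, here is a minimal sketch of parsing SRT into `{ start, end, text }` ranges (times in seconds), stripping inline tags such as `<i>` as it goes. `parseSrt` and `parseSrtTime` are illustrative names; the sketch assumes well-formed `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines, ignores anything after the end timestamp (some files carry position hints there), and silently skips blocks without a timing line.

```javascript
// Hypothetical minimal SRT parser: turns each cue into { start, end, text },
// with times in seconds. Inline markup such as <i>...</i> is stripped,
// i.e. this follows "option 2" and discards styling.
function parseSrtTime(s) {
  // "HH:MM:SS,mmm" -> seconds
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(s.trim());
  if (!m) throw new Error("bad SRT timestamp: " + s);
  return (+m[1]) * 3600 + (+m[2]) * 60 + (+m[3]) + (+m[4]) / 1000;
}

function parseSrt(source) {
  const cues = [];
  // Cues are separated by blank lines; normalize line endings first.
  const blocks = source.replace(/\r\n?/g, "\n").trim().split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // Locate the timing line; the numeric index line before it is ignored.
    const timingIndex = lines.findIndex(l => l.includes("-->"));
    if (timingIndex === -1) continue; // skip malformed blocks
    const [from, to] = lines[timingIndex].split("-->");
    const text = lines
      .slice(timingIndex + 1)
      .join("\n")
      .replace(/<[^>]*>/g, ""); // drop inline tags: style is lost here
    cues.push({
      start: parseSrtTime(from),
      end: parseSrtTime(to.trim().split(/\s+/)[0]), // ignore trailing hints
      text,
    });
  }
  return cues;
}
```

Running it over a cue containing `<i>Hello</i>` yields just `Hello`, which is exactly the loss of style information described under option 2.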

Looking at the above, trying to force time ranges from all sources into a
single interface isn't looking good. Perhaps the whole effort is
misguided. Is there in fact a use case for accessing/modifying the time
ranges and contents of external subtitles? For getting callbacks/events
when such ranges are entered and left? For styling such content with CSS?

GF:  I don't believe there is a use case for this.  Authors synchronize captions with the audio for a good reason.  Giving users the ability to alter timing information will only lead to confusion when the captions are no longer in sync with the audio.  As for altering contents: unless I'm misunderstanding your question, you're talking about actually changing the contents of the *captions*, yes?  If so, that immediately raises copyright issues (in this case, altering captions that belong to someone else), and you don't want any part of that.  I don't think there's any reason to allow alteration of content.


By throwing away all interaction between external subtitles and the DOM,
cross-origin issues become irrelevant. The only use case I think is
actually... useful... is styling it with CSS. For now, I will abandon the
working assumption about SRT firing events, etc.

> Another possibility is that <timerange> elements have contents which
> automatically become visible or hidden depending on whether content is
> in the range, so the common use case (make some content appear during
> certain time ranges of the video) works without any script:
>
> <video>
>      <source type="video/mp4" src="video.m4v">
>      <timerange start="10" end="12">Hello</timerange>
> </video>
>
> The contents could be arbitrary HTML, which would make it very simple to
> sync a slideshow to a video, in addition to handling the captions use
> case. CSS styling could be used to position the currently visible
> <timerange> over the video.
>

I quite like the declarative syntax in the last example, but think that
<timerange> should have a wrapping element which is the same used to
reference external time ranges (a.k.a. subtitles). Mostly this is to group
them into "tracks".

GF:  I, too, am in favor of a declarative approach.  Again, unless I'm missing an obvious point, why can't timing markup from SMIL be used?  Why invent something new?


<video>
   <source type="video/mp4" src="video.m4v">
   <itextlist-overlay-whatever lang="zh" src="chinese.srt"></itextlist-overlay-whatever>
   <itextlist-overlay-whatever lang="en">
     <timerange start="10" end="12">Hello</timerange>
   </itextlist-overlay-whatever>
   <itextlist-overlay-whatever lang="sv">
     <timerange start="10" end="12">Hej</timerange>
   </itextlist-overlay-whatever>
</video>

I suppose that for styling, we would have a CSS pseudo-class
:yourtimeisnow? A probable default style would then be

timerange { display:none; }
timerange:yourtimeisnow { display: block; }
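Assuming the pseudo-class simply tests whether the current playback time lies within a range's interval, here is a sketch of what it would have to compute, using plain objects in place of the track elements and their <timerange> children. `matchesYourTimeIsNow` and `visibleText` are hypothetical helpers; the half-open interval and the idea of showing only one track (chosen by `lang`) at a time are assumptions, not part of the proposal.

```javascript
// Hypothetical model of the proposed markup: each track stands in for an
// <itextlist-overlay-whatever> element and holds its <timerange> children.
// A range matches :yourtimeisnow when start <= time < end (half-open,
// an assumption; the proposal does not say whether `end` is inclusive).
function matchesYourTimeIsNow(range, time) {
  return range.start <= time && time < range.end;
}

// Returns the text of every currently matching range in the single track
// whose lang equals `lang` (showing one track at a time is an assumption).
function visibleText(tracks, lang, time) {
  const track = tracks.find(t => t.lang === lang);
  if (!track) return [];
  return track.ranges
    .filter(r => matchesYourTimeIsNow(r, time))
    .map(r => r.text);
}

// The "en" and "sv" tracks from the markup example, as plain objects.
const tracks = [
  { lang: "en", ranges: [{ start: 10, end: 12, text: "Hello" }] },
  { lang: "sv", ranges: [{ start: 10, end: 12, text: "Hej" }] },
];
```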

If we use some declarative time range syntax, surely the next thing people
will want is to be able to use it outside of <video>.

<video id="v0" src="my-video"></video>
Subtitles below:
<div>
   <timerange start="10" end="12" ref="v0">Hello</timerange>
</div>

Good idea? When people inevitably ask for this, I think we should tell
them to do it with scripts instead.
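For the script route, a minimal sketch of what authors might write instead of `<timerange ref="v0">`: ordinary elements carry hypothetical `data-start`/`data-end` attributes and are hidden except while the referenced video's `currentTime` falls within [start, end). `syncRangesToVideo` is an illustrative name; it relies only on standard media element pieces (`addEventListener`, the `timeupdate` and `seeked` events, `currentTime`, `getAttribute` and the `hidden` property).

```javascript
// Hypothetical script-only version of the <timerange ref="v0"> idea:
// ordinary elements carry data-start/data-end attributes (names assumed)
// and are shown only while the referenced video's currentTime is in range.
function syncRangesToVideo(video, elements) {
  const update = () => {
    const t = video.currentTime;
    for (const el of elements) {
      const start = parseFloat(el.getAttribute("data-start"));
      const end = parseFloat(el.getAttribute("data-end"));
      el.hidden = !(start <= t && t < end);
    }
  };
  // "timeupdate" fires periodically during playback; "seeked" after seeks.
  video.addEventListener("timeupdate", update);
  video.addEventListener("seeked", update);
  update(); // set the initial state
  return update;
}

// In a page this might be called as:
//   syncRangesToVideo(document.getElementById("v0"),
//                     document.querySelectorAll("[data-start]"));
```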

--
Philip Jägenstedt
Core Developer
Opera Software
Received on Monday, 30 November 2009 13:05:31 UTC