Re: timing model of the media resource in HTML5

On Nov 29, 2009, at 4:11 AM, Philip Jägenstedt wrote:

> On Sat, 28 Nov 2009 19:42:15 +0100, Maciej Stachowiak  
> <mjs@apple.com> wrote:
>>
>> This interface works ok for the specific case of popping up some  
>> text, but it seems like it would be awkward for anything more  
>> complicated, since there is only a single event and set of  
>> handlers. What I would suggest is the declarative cue range idea  
>> that was suggested on the whatwg list a while back:
>>
>> <video>
>>     <source type="video/mp4" src="video.m4v">
>>     <timerange start="10" end="12" onrangeenter="enterRange1()"  
>> onrangeleave="leaveRange1()">
>> </video>
>>
>> This makes it really easy to have different handlers per cue range  
>> without having to express that difference as a string. It also  
>> makes it simpler to use cue ranges for two orthogonal purposes.
>>
>> addCueRange() could just be a shortcut for adding such a <timerange>
>> element:
>>
>> var range = v.addCueRange(10, 12);
>> range.addEventListener("rangeenter", function(e)  
>> { e.target.querySelector('overlay').textContent = "Hello"; }, false);
>> range.addEventListener("rangeleave", function(e)  
>> { e.target.querySelector('overlay').textContent = ''; }, false);
>
> If addCueRange does nothing but insert elements in the DOM then we  
> don't need it at all, simply let script authors write it themselves  
> if they need a shortcut.

It's not uncommon for HTML elements to have shortcuts in their DOM
interface for things you could in theory do by DOM manipulation;
consider, for example, the HTMLTableElement API. In this case, the
convenience method lets you collapse at least 4 lines into 1, and lets
you ignore the fact that the range is represented by a DOM element.
(It might make sense to call it addTimeRange() to match the element
name, or to rename the element <cuerange>.)
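
For illustration, here is a minimal sketch of what such a shortcut might
boil down to. Nothing here is specified anywhere: addCueRange is the name
from the proposal above, <timerange> and its start/end attributes are the
hypothetical ones from this thread, and the argument check is an added
assumption.

```javascript
// Hypothetical convenience function: creates a <timerange> child of `video`
// and returns it, so callers never touch createElement/setAttribute/
// appendChild themselves.
function addCueRange(video, start, end) {
  // Assumption: reject reversed or non-numeric ranges up front.
  if (!(start <= end)) {
    throw new RangeError('start must not be greater than end');
  }
  const range = video.ownerDocument.createElement('timerange');
  range.setAttribute('start', String(start));
  range.setAttribute('end', String(end));
  video.appendChild(range);
  return range;
}
```

The four DOM calls in the body are the "at least 4 lines" that the
one-line call replaces; the returned element is what listeners would be
attached to.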

> It has been my working assumption that external SRT files should fire
> the same events, so not all ranges are represented as an element in  
> the DOM. addCueRange would then be a way to add such not-in-DOM  
> ranges.

1) Why would playback of an SRT file need to fire range enter/leave
events? Is there a use case for this?
     1.a) Does it critically need to be the same events?
     1.b) Even if it's the same events, couldn't we dispatch them on  
whatever element embeds the text track?
2) If this is built in UA behavior, there's no need to have a public  
DOM API to implement it.
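
To make 1.b concrete, here is a rough sketch of dispatching enter/leave
events on a single target (standing in for whatever element embeds the
text track). The RangeEvent class, the event names "rangeenter" and
"rangeleave", the {start, end} range objects, and the half-open
[start, end) interval are all assumptions for the sake of the example,
not anything a UA is known to implement.

```javascript
// Hypothetical event type carrying the range that was entered or left.
class RangeEvent extends Event {
  constructor(type, range) {
    super(type);
    this.range = range;
  }
}

// True when time t lies inside [start, end). The half-open interval is an
// assumption; the thread does not say whether `end` is inclusive.
function contains(range, t) {
  return t >= range.start && t < range.end;
}

// Compare the previous and current playback positions and dispatch
// "rangeleave" / "rangeenter" on `target` for every range whose
// containment state changed between the two.
function dispatchRangeEvents(target, ranges, prevTime, currTime) {
  for (const range of ranges) {
    const was = contains(range, prevTime);
    const is = contains(range, currTime);
    if (was && !is) target.dispatchEvent(new RangeEvent('rangeleave', range));
    if (!was && is) target.dispatchEvent(new RangeEvent('rangeenter', range));
  }
}
```

Note that a range lying entirely between two sampled positions is never
reported by this sketch; a real implementation would have to decide
whether such ranges fire both events or none.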

>
> In <https://wiki.mozilla.org/Accessibility/Experiment1_feedback> I  
> suggested a MediaTimeRange interface. Remixing that somewhat:
>
> interface MediaTimeRange {
>  attribute double start;
>  attribute double end;
>  //attribute DOMString text;
>  // FIXME: how to represent the content?
> }
>
> interface MediaTimeRangeList {
>  // automatically sorted by increasing time
>  readonly attribute unsigned long length;
>  getter MediaTimeRange item(in unsigned long index);
>  void add(in MediaTimeRange range);
>  void remove(in MediaTimeRange range);
>  // these last two look suspiciously similar to appendChild and  
> removeChild
> }
>
> interface HTMLItextlistOverlayWhateverElement : HTMLElement {
>  attribute MediaTimeRangeList ranges;
> }
>
> The problem I was trying to solve is that of representing the time  
> ranges uniformly regardless of their source. External subtitles can  
> be accessed and modified via MediaTimeRangeList. <timerange> gets  
> mapped into a MediaTimeRangeList. A MediaTimeRangeList can be  
> constructed by scripts.

I don't think it's necessary to make an external timed text resource  
look the same as built in <timerange> elements. What's the use case  
for this? Is there any code that will want to treat the two in the  
same way?

>
> However, make note of the FIXME. Because not all external subtitle  
> formats can be represented as plain text, there are basically 3  
> options:
>
> 1. Make the content completely opaque. Makes modification  
> impossible, but the same is true of almost any external resource.
>
> 2. Reduce the content to plain text. Modification would then destroy  
> what extra style information there was.

Is there a use case for getting the text of the currently displayed  
subtitle? If subtitles are implemented client-side, I could see it,  
but as a built-in UA behavior, I'm not sure why you would need this.

>
> 3. Transcode the content to HTML+CSS. Basically, while parsing  
> external SRT, the UA would construct an equivalent HTML DOM as  
> children of <itextlist-overlay-whatever>. This would actually make  
> the MediaTimeRange idea above redundant because the information  
> would already be in the DOM. All in all though, this would be quite  
> strange and not a serious suggestion.

That seems pretty complicated. Again, what's the use case?

>
> Looking at the above, trying to force time ranges from all sources  
> into a single interface isn't looking good. Perhaps the whole effort  
> is misguided.

I don't think there needs to be a single interface that abstracts  
built-in support for timed text formats, and script-defined time ranges.

> Is there in fact a use case for accessing/modifying the time ranges  
> and contents of external subtitles? For getting callbacks/events  
> when such ranges are entered and left? For styling such content with  
> CSS?

I don't think there is. Maybe for styling with CSS, for formats that  
don't have their own styling, but you could handle that with a single  
pseudo-element, or a style rule for the embedding element.

>
> By throwing away all interaction between external subtitles and the  
> DOM, cross-origin issues become irrelevant. The only use case I  
> think is actually... useful... is styling it with CSS. For now, I  
> will abandon the working assumption about SRT firing events, etc.
>
>> Another possibility is that <timerange> elements have contents  
>> which automatically become visible or hidden depending on whether  
>> content is in the range, so the common use case (make some content  
>> appear during certain time ranges of the video) works without any
>> script:
>>
>> <video>
>>     <source type="video/mp4" src="video.m4v">
>>     <timerange start="10" end="12">Hello</timerange>
>> </video>
>>
>> The contents could be arbitrary HTML, which would make it very  
>> simple to sync a slideshow to a video, in addition to handling the  
>> captions use case. CSS styling could be used to position the  
>> currently visible <timerange> over the video.
>>
>
> I quite like the declarative syntax in the last example, but think  
> that <timerange> should have a wrapping element which is the same  
> used to reference external time ranges (a.k.a. subtitles). Mostly  
> this is to group them into "tracks".
>
> <video>
>  <source type="video/mp4" src="video.m4v">
>  <itextlist-overlay-whatever lang="zh" src="chinese.srt"></itextlist-overlay-whatever>
>  <itextlist-overlay-whatever lang="en">
>    <timerange start="10" end="12">Hello</timerange>
>  </itextlist-overlay-whatever>
>  <itextlist-overlay-whatever lang="sv">
>    <timerange start="10" end="12">Hej</timerange>
>  </itextlist-overlay-whatever>
> </video>

If the element to embed a timed text file were separate from anything  
related to <timerange>, then we could just call it <timedtext>. That  
would be a nice, semantic name. And perhaps we could even define that  
it could be used outside a <video> or <audio>, in which case it gets  
its own set of controls.

I understand your goal here with language selection. If <timedtext>  
needs to have a lang attribute, then perhaps <timerange> could as  
well, and organizing into tracks can be done by the user. However,  
conventionally, language negotiation is not done at the HTML level.  
Usually it is done via the Accept-Language header (HTTP content  
negotiation) or through a site-specific setting stored as a cookie or
via guessing based on IP, all server-side. I would be hesitant to make  
the content negotiation work at the HTML level here, unless there  
really is no workaround for some specific use case.
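
As a rough sketch of the server-side approach, the following picks one
subtitle language from an Accept-Language header value. The function
names and the fallback parameter are made up for the example, and the
matching (exact tag first, then primary subtag) is a simplification,
not a full implementation of HTTP language matching.

```javascript
// Parse an Accept-Language value into lower-cased language tags ordered by
// descending q-value. q defaults to 1 when omitted; entries with q=0 or an
// unparsable q are dropped.
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(';');
      let q = 1;
      for (const p of params) {
        const m = /^\s*q=([0-9.]+)\s*$/.exec(p);
        if (m) q = parseFloat(m[1]);
      }
      return { tag: tag.trim().toLowerCase(), q, index };
    })
    .filter((entry) => entry.tag !== '' && entry.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map((entry) => entry.tag);
}

// Return the first available language the client accepts, or `fallback`.
function pickSubtitleLanguage(header, available, fallback) {
  const offered = available.map((lang) => lang.toLowerCase());
  for (const wanted of parseAcceptLanguage(header)) {
    if (wanted === '*') return available[0] ?? fallback;
    // Exact match first, then primary-subtag match ("en-gb" -> "en").
    let i = offered.indexOf(wanted);
    if (i === -1) i = offered.indexOf(wanted.split('-')[0]);
    if (i !== -1) return available[i];
  }
  return fallback;
}
```

With the tracks from the example above, a server could call
pickSubtitleLanguage(headerValue, ['zh', 'en', 'sv'], 'en') and emit
only the matching track in the page it serves.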

For purely organizational purposes, you could easily use HTML comments  
or a <div> to group <timerange> elements, if you really care to.

> I suppose that for styling, we would have a CSS pseudo-class
> :yourtimeisnow? A probable default style would then be
>
> timerange { display:none; }
> timerange:yourtimeisnow { display: block; }
>
> If we use some declarative time range syntax, surely the next thing  
> people will want is to be able to use it outside of <video>.

Maybe :current would be a better name.



>
> <video id="v0" src="my-video"></video>
> Subtitles below:
> <div>
>  <timerange start="10" end="12" ref="v0">Hello</timerange>
> </div>
>
> Good idea? When people inevitably ask for this, I think we should  
> tell them to do it with scripts instead.

I think we should have some convenient way to put the <timerange>  
content outside the visual area of the video, for the use case of a  
slide show synchronized to a video of the accompanying presentation.  
In that case you definitely want the <timerange> contents below, not  
overlaid. I think perhaps the default presentation should be to put
the timerange content *after* the video, since it is easier to CSS  
position into a box than out of it. So you could do

video.myVid > timerange {
      text-align: center;
      position: absolute;
      width: 100%;
      bottom: 15px;
}

This should position the contents of time ranges at 15px from the  
bottom of the video. (CSS not tested but I bet something similar would  
work.)

To make this convenient, we could introduce a presentational boolean  
attribute "inside" on <timerange> which does something like this  
automatically. Or we could make being inside the default and have an  
"outside" attribute.

I can see how the "ref" functionality is more general, though, so it  
seems like a valid alternative.
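
For comparison, the script-only route suggested above for the "ref"
case might look roughly like this. It is only a sketch: the function
name is invented, the elements are assumed to carry numeric start/end
attributes as in the examples in this thread, and treating the range as
half-open [start, end) is an assumption.

```javascript
// Show each element only while `currentTime` is inside its [start, end)
// range, and return the elements that are currently visible.
// Elements with missing or non-numeric attributes end up hidden, because
// parseFloat yields NaN and every comparison with NaN is false.
function syncTimeRanges(elements, currentTime) {
  const visible = [];
  for (const el of elements) {
    const start = parseFloat(el.getAttribute('start'));
    const end = parseFloat(el.getAttribute('end'));
    const active = currentTime >= start && currentTime < end;
    el.hidden = !active;
    if (active) visible.push(el);
  }
  return visible;
}
```

In a page this would be called from a "timeupdate" listener on the
referenced video, passing something like
document.querySelectorAll('timerange[ref="v0"]') and the video's
currentTime.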

Regards,
Maciej

Received on Monday, 30 November 2009 14:30:46 UTC