- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Fri, 3 Jun 2011 18:21:10 +1000
- To: www-archive@w3.org
Seems this mail was not archived at http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/ Thus forwarding it for archiving. Regards, Silvia. On Fri, Jun 3, 2011 at 9:28 AM, Ian Hickson <ian@hixie.ch> wrote: > > (Note that I have tried to only reply to each suggestion once, so > subsequent requests for the same feature are not included below.) > > (I apologise for the somewhat disorganised state of this e-mail. I > normally try to group topics together, but the threads I'm responding to > here jumped back and forth across different issues quite haphazardly and > trying to put related things together broke some of the flow and context > of the discussions, so I opted in several places to leave the context as > it was originally presented, and just jump back and forth amongst the > topics raised. Hopefully it's not too confusing.) > > On Thu, 9 Dec 2010, Silvia Pfeiffer wrote: >> >> > > >> >> > > Sure, but this is only a snippet of an actual application. If, >> >> > > e.g., you want to step through a list of videos (maybe an >> >> > > automated playlist) using script and you need to provide at least >> >> > > two different formats with <source>, you'd want to run this >> >> > > algorithm frequently. >> >> > >> >> > Just have a bunch of <video>s in the markup, and when one ends, >> >> > hide it and show the next one. Don't start dynamically manipulating >> >> > <source> elements, that's just asking for pain. >> >> > >> >> > If you really must do it all using script, just use canPlayType and >> >> > the <video src=""> attribute, don't mess around with <source>. >> >> >> >> Thanks for adding that advice. I think it's important to point that >> >> out. >> > >> > I can add it to the spec too if you think that would help. Where would >> > a good place for it be? >> >> There is a note in the <source> element section that reads as follows: >> "Dynamically modifying a source element and its attribute when the >> element is already inserted in a video or audio element will have no >> effect. To change what is playing, either just use the src attribute on >> the media element directly, or call the load() method on the media >> element after manipulating the source elements." >> >> Maybe you can add some advice there to use canPlayType to identify what >> type of resource to add in the @src attribute on the media element. >> Also, you should remove the last half of the second sentence in this >> note if that is not something we'd like to encourage. > > Done. > > > On Wed, 8 Dec 2010, Kevin Marks wrote: >> >> One case where posters come back after playback is complete is when >> there are multiple videos on the page, and only one has playback focus >> at a time, such as a page of preview movies for longer ones to purchase. >> >> In that case, showing the poster again on blur makes sense conceptually. >> >> It seems that getting back into the pre-playback state, showing the >> poster again would make sense in this context. >> >> That would imply adding an unload() method that reverted to that state, >> and could be used to make any cached media data purgeable in favour of >> another video that is subsequently loaded. > > You don't need unload(), you can just use load(). It essentially resets > the media element. > > It's not hugely efficient, but if we find people are trying to do this a > lot, then we can add a more efficent variant that just resets the poster > frame state, I guess. (I'd probably call it stop(), though, not unload().) 
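[Editorial aside, not part of the forwarded message: a minimal sketch of the load()-based reset discussed above, for a page of preview videos. The "preview" class name and the capture-phase listener are illustrative assumptions, not anything from the thread.

  // When one preview video starts playing, pause every other preview and
  // call load() on it, which resets the element so its poster shows again.
  document.addEventListener('play', function (event) {
    if (!(event.target instanceof HTMLVideoElement)) return;
    var videos = document.querySelectorAll('video.preview');
    for (var i = 0; i < videos.length; i++) {
      if (videos[i] !== event.target && !videos[i].paused) {
        videos[i].pause();
        videos[i].load(); // back to the poster frame; buffered media can be discarded
      }
    }
  }, true); // 'play' does not bubble, so listen in the capture phase
]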
> > > On Thu, 9 Dec 2010, David Singer wrote: >> >> I think if you want that effect, you flip what's visible in an area of >> the page between a playing video, and an image. Relying on the poster >> is not effective, IMHO. > > I don't know, I think it would make semantic sense to have all the videos > be <video> elements if they're actually going to be played right there. > > > On Thu, 9 Dec 2010, Kevin Marks wrote: >> >> I know it's not effective at the moment; it is a common use case. >> QuickTime had the 'badge' ux for years that hardly anyone took advantage >> of: >> >> http://www.mactech.com/articles/mactech/Vol.16/16.02/Feb00QTToolkit/index.html >> >> What we're seeing on the web is a converged implementation of the >> YouTube-like overlaid grey play button, but this is effectively >> reimplemented independently by each video site that enables embedding. >> >> As we see HTML used declaratively for long-form works like ebooks on >> lower performance devices, having embedded video that doesn't >> cumulatively absorb all the memory available is going to be like the old >> CD-ROM use cases the QT Badge was meant for. > > This seems like a presentational issue, for which CSS would be better > positioned to provide a solution. > > > On Thu, 9 Dec 2010, Boris Zbarsky wrote: >> On 12/8/10 8:19 PM, Ian Hickson wrote: >> > Boris wrote: >> > > You can't sniff in a toplevel browser window. Not the same way that >> > > people are sniffing in <video>. It would break the web. >> > >> > How so? >> >> People actually rely on the not-sniffing behavior of UAs in actual >> browser windows in some cases. For example, application/octet-stream at >> toplevel is somewhat commonly used to force downloads without a >> corresponding Content-Disposition header (poor practice, but support for >> Content-Disposition hasn't been historically great either). >> >> > (Note that the spec as it stands takes a compromise position: the >> > content is only accepted if the Content-Type and type="" values are >> > supported types (if present) and the content sniffs as a supported >> > type, but nothing in the spec checks that all three values are the >> > same.) >> >> Ah, I see. So similar to the way <img> is handled... >> >> I can't quite decide whether this is the best of both worlds, or the >> worst. ;) > > Yeah, I hear ya. > > >> It certainly makes it simpler to implement video by delegating to >> QuickTime or the like, though I suspect such an implementation would >> also end up sniffing types the UA doesn't necessarily claim to >> support.... so maybe it's not simpler after all. > > Indeed. > > At this point I'm basically just waiting to see what implementations end > up doing. When I tried moving us more towards sniffing, there was > pushback; when I tried moving us more towards honouring types, there was > equal and opposite pushback. So at this point, I'm letting the market > decide it. :-) > > > On Thu, 9 Dec 2010, Simon Pieters wrote: >> On Thu, 09 Dec 2010 02:58:12 +0100, Ian Hickson <ian@hixie.ch> wrote: >> > On Wed, 1 Sep 2010, Simon Pieters wrote: >> > > >> > > I think it might be good to run the media element load algorithm >> > > when setting or changing src on <source> (that has a media element >> > > as its parent), but not type and media (what's the use case for type >> > > and media?). However it would fire an 'emptied' event for each >> > > <source> that changed, which is kind of undesirable. 
Maybe the media >> > > element load algorithm should only be invoked if src is set or >> > > changed on a <source> that has no previous sibling <source> >> > > elements? >> > >> > What's the use case? Just set .src before you insert the element. >> >> The use case under discussion is changing to another video. So the >> element is already inserted and already has src. >> >> Something like: >> >> <video controls autoplay> >> <source src=video1.webm type=video/webm> >> <source src=video1.mp4 type=video/mp4> >> </video> >> <script> >> function loadVideo(src) { >> var video = document.getElementsByTagName('video')[0]; >> sources = video.getElementsByTagName('source'); >> sources[0].src = src + '.webm'; >> sources[1].src = src + '.mp4'; >> } >> </script> >> <input type="button" value="See video 1" onclick="loadVideo('video1')"> >> <input type="button" value="See video 2" onclick="loadVideo('video2')"> >> <input type="button" value="See video 3" onclick="loadVideo('video3')"> > > Well if you _really_ want to do that, just call video.load() at the end of > loadVideo(). But really, you're better off poking around with > canPlayType() and setting video.src directly instead of using <source> > for these dynamic cases. > > > On Thu, 9 Dec 2010, Kevin Carle wrote something more or less like: >> >> function loadVideo(src) { >> var video = document.getElementsByTagName('video')[0]; >> if (video.canPlayType("video/webm") != "") >> video.src = src + '.webm'; >> else >> video.src = src + '.mp4'; >> } > > Yeah. > > And hopefully this will become moot when there's a common video format, > anyway. > > > On Fri, 10 Dec 2010, Simon Pieters wrote: >> >> You'd need to remove the <source> elements to keep the document valid. > > You don't need them in the first place if you're doing things by script, > as far as I can tell. > > >> The author might want to have more than two <source>s, maybe with >> media="", onerror="" etc. Then it becomes simpler to rely on the >> resource selection algorithm. > > It's hard to comment without seeing a concrete use case. > > > On Tue, 14 Dec 2010, Philip J盲genstedt wrote: >> On Wed, 24 Nov 2010 17:11:02 +0100, Eric Winkelman <E.Winkelman@cablelabs.com> >> wrote: >> > >> > I'm investigating how TimedTracks can be used for in-band-data-tracks >> > within MPEG transport streams (used for cable television). >> > >> > In this format, the number and types of in-band-data-tracks can change >> > over time. So, for example, when the programming switches from a >> > football game to a movie, an alternate language track may appear that >> > wasn't there before. Later, when the programming changes again, that >> > language track may be removed. >> > >> > It's not clear to me how these changes are exposed by the proposed >> > Media Element events. >> >> The thinking is that you switch between different streams by setting the >> src="" attribute to point to another stream, in which case you'll get an >> emptied event along with another bunch of events. If you have a single >> source where audio/video/text streams appear and disappear, there's not >> really any way to handle it. > > As specified, there's no way for a media element's in-band text tracks to > change after the 'loadedmetadata' event has fired. > > >> > The "loadedmetadata" event is used to indicate that the TimedTracks >> > are ready, but it appears that it is only fired before playback >> > begins. Is this event fired again whenever a new track is discovered? >> > Is there another event that is intended for this situation? 
>> > > >> > Similarly, is there an event that indicates when a track has been >> > removed? Or is this also handled by the "loadedmetadata" event >> > somehow? >> >> No, the loadedmetadata event is only fired once per resource, it's not >> the event you're looking for. >> >> As for actual solutions, I think that having loadedmetadata fire again >> if the number or type of streams change would make some sense. > > It would be helpful to know more about these cases where there are dynamic > changes to the audio, video, or text tracks. Does this really happen on > the Web? Do we need to handle it? > > > On Thu, 16 Dec 2010, Silvia Pfeiffer wrote: >> >> I do not know how technically the change of stream composition works in >> MPEG, but in Ogg we have to end a current stream and start a new one to >> switch compositions. This has been called "sequential multiplexing" or >> "chaining". In this case, stream setup information is repeated, which >> would probably lead to creating a new stream handler and possibly a new >> firing of "loadedmetadata". I am not sure how chaining is implemented in >> browsers. > > Per spec, chaining isn't currently supported. The closest thing I can find > in the spec to this situation is handling a non-fatal error, which causes > the unexpected content to be ignored. > > > On Fri, 17 Dec 2010, Eric Winkelman wrote: >> >> The short answer for changing stream composition is that there is a >> Program Map Table (PMT) that is repeated every 100 milliseconds and >> describes the content of the stream. Depending on the programming, the >> stream's composition could change entering/exiting every advertisement. > > If this is something that browser vendors want to support, I can specify > how to handle it. Anyone? > > > On Sat, 18 Dec 2010, Robert O'Callahan wrote: >> >> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#dom-media-duration says: >> [...] >> >> What if the duration is not currently known? > > The user agent must determine the duration of the media resource before > playing any part of the media data and before setting readyState to a > value equal to or greater than HAVE_METADATA, even if doing so requires > fetching multiple parts of the resource. > > >> I think in general it will be very difficult for a user-agent to know >> that a stream is unbounded. In Ogg or WebM a stream might not contain an >> explicit duration but still eventually end. Maybe it would make more >> sense for the last sentence to read "If the media resource is not known >> to be bounded, ..." > > Done. > > > On Sat, 18 Dec 2010, Philip Jägenstedt wrote: >> >> Agreed, this is how I've interpreted the spec already. If a server >> replies with 200 OK instead of 206 Partial Content and the duration >> isn't in the header of the resource, then the duration is reported to be >> Infinity. If the resource eventually ends, another durationchange event >> is fired and the duration is reported to be the (now known) length of >> the resource. > > That's fine. > > > On Mon, 20 Dec 2010, Robert O'Callahan wrote: >> >> That sounds good to me. We'll probably do that. The spec will need to be >> changed though. > > I changed it as you suggest above.
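[Editorial aside, not part of the forwarded message: a small sketch of the duration behaviour agreed above. An unbounded or streamed resource reports duration as Infinity; if it eventually ends, a 'durationchange' event fires with the now-known length. The element lookup is illustrative only.

  var video = document.querySelector('video');
  video.addEventListener('loadedmetadata', function () {
    if (video.duration === Infinity) {
      // Length not yet known; the stream may be unbounded.
      console.log('duration unknown so far');
    }
  });
  video.addEventListener('durationchange', function () {
    // Fired again once the real length becomes known.
    console.log('duration is now ' + video.duration + ' seconds');
  });
]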
> > > On Fri, 31 Dec 2010, Bruce Lawson wrote: >> > On Fri, 5 Nov 2010, Bruce Lawson wrote: >> > > >> > > http://www.whatwg.org/specs/web-apps/current-work/complete/video.html#sourcing-in-band-timed-tracks >> > > says to create TimedTrack objects etc for in-band tracks which are >> > > then exposed in the API - so captions/subtitles etc that are >> > > contained in the media container file are exposed, as well as those >> > > tracks pointed to by the <track> element. >> > > >> > > But >> > > http://www.whatwg.org/specs/web-apps/current-work/complete/video.html#timed-track-api >> > > implies that the array is only of tracks in the track element: >> > > >> > > "media . tracks . length >> > > >> > > Returns the number of timed tracks associated with the media element >> > > (e.g. from track elements). This is the number of timed tracks in >> > > the media element's list of timed tracks." >> > >> > I don't understand why you interpret this as implying anything about >> > the track element. Are you interpreting "e.g." as "i.e."? >> > >> > > Suggestion: amend to say "Returns the number of timed tracks >> > > associated with the media element (e.g. from track elements and any >> > > in-band track files inside the media container file)" or some such. >> > >> > I'd rather avoid talking about the in-band ones here, in part because >> > I think it's likely to confuse authors at least as much as help them, >> > and in part because the terminology around in-band timed tracks is a >> > little unclear to me and so I'd rather not talk about them in >> > informative text. :-) >> > >> > If you disagree, though, let me know. I can find a way to make it >> > work. >> >> I disagree, but not aggressively vehemently. My confusion was conflating >> "track elements" with the three instances of the phrase "timed tracks" >> in close proximity. >> >> I suggest that "Returns the number of timed tracks associated with the >> media element (i.e. from track elements and any packaged along with the >> media in its container file)" would be clearer and avoid use of the >> confusing phrase "in-band tracks". > > That's still confusing, IMHO. "Packaged" doesn't imply in-band; most > subtitle files are going to be "packaged" with the video even if they're > out of band. > > Also, your 'i.e.' here is wrong. There's at least one other source of > tracks: the ones added by the script. > > The non-normative text is intentionally not overly precise, because if it > was precise it would just be the same as the normative text and wouldn't > be any simpler, defeating its entire purpose. > > > On Mon, 3 Jan 2011, Philip J盲genstedt wrote: >> > >> > + I've added a magic string that is required on the format to make it >> > recognisable in environments with no or unreliable type labeling. >> >> Is there a reason it's "WEBVTT FILE" instead of just "WEBVTT"? "FILE" >> seems redundant and like unnecessary typing to me. > > It seemed more likely that non-WebVTT files would start with a line that > said just "WEBVTT" than a line that said just "WEBVTT FILE". But I guess > "WEBVTT FILE FORMAT" is just as likely and it'll be caught. > > I've changed it to just "WEBVTT"; there may be existing implementations > that only accept "WEBVTT FILE" so for now I recommend that authors still > use the longer header. 
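[Editorial aside, not part of the forwarded message: a sketch of a tolerant signature check that a non-browser tool might use, accepting both the short "WEBVTT" header and the older "WEBVTT FILE" form mentioned above. This is illustrative JavaScript, not the spec's parsing algorithm.

  function looksLikeWebVTT(text) {
    // Strip an optional BOM, then look only at the first line.
    var firstLine = text.replace(/^\uFEFF/, '').split(/\r\n|\r|\n/)[0];
    // "WEBVTT" alone, or "WEBVTT" followed by a space and anything else.
    return firstLine === 'WEBVTT' || firstLine.indexOf('WEBVTT ') === 0;
  }
]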
> > >> > On Wed, 8 Sep 2010, Philip Jägenstedt wrote: >> > > >> > > In the discussion on public-html-a11y <trackgroup> was suggested to >> > > group together mutually exclusive tracks, so that enabling one >> > > automatically disables the others in the same trackgroup. >> > > >> > > I guess it's up to the UA how to enable and disable <track>s now, >> > > but the only option is making them all mutually exclusive (as >> > > existing players do) or a weird kind of context menu where it's >> > > possible to enable and disable tracks completely independently. >> > > Neither option is great, but as a user I would almost certainly >> > > prefer all tracks being mutually exclusive and requiring scripts to >> > > enable several at once. >> > >> > It's not clear to me what the use case is for having multiple groups >> > of mutually exclusive tracks. >> > >> > The intent of the spec as written was that a browser would by default >> > just have a list of all the subtitle and caption tracks (the latter >> > with suitable icons next to them, e.g. the [CC] icon in US locales), >> > and the user would pick one (or none) from the list. One could easily >> > imagine a UA allowing the user to enable multiple tracks by having the >> > user ctrl-click a menu item, though, or some similar solution, much >> > like with the commonly seen select box UI. >> >> In the vast majority of cases, all tracks are intended to be mutually >> exclusive, such as English+English HoH or subtitles in different >> languages. No media player UI (hardware or software) that I have ever >> used allows enabling multiple tracks at once. Without any kind of hint >> about which tracks make sense to enable together, I can't see desktop >> Opera allowing multiple tracks (of the same kind) to be enabled via the >> main UI. > > Personally I think it's quite reasonable to want to see two languages at > once, or even two forms of the same language at once, especially for, > e.g., reviewing subtitles. But I don't think it would be a bad thing if > some browsers didn't expose that in the UI; that's something that could > be left to bookmarklets, for example. > > >> Using this syntax, I would expect some confusion when you omit the closing >> </v>, when it's *not* a cue spoken by two voices at the same time, such as: >> >> <v Jim>- Boo! >> <v Bob>- Gah! >> >> Gah! is spoken by both Jim and Bob, but that was likely not intended. If >> this causes confusion, we should make validators warn about multiple >> voices with no closing </v>. > > No need to just warn, the spec says the above is outright invalid, so > they would raise an error. > > >> > > For captions and subtitles it's less common, but rendering it >> > > underneath the video rather than on top of it is not uncommon, e.g. >> > > http://nihseniorhealth.gov/video/promo_qt300.html or >> > >> > Conceptually, that's in the video area, it's just that the video isn't >> > centered vertically. I suppose we could allow UAs to do that pretty >> > easily, if it's commonly desired. >> >> It's already possible to align the video to the top of its content box >> using <http://dev.w3.org/csswg/css3-images/#object-position>: >> >> video { object-position: center top } >> >> (This is already supported in Opera, but prefixed: -o-object-position) > > Sounds good. > > >> Note that in Sweden captioning for the HoH is delivered via the teletext >> system, which would allow ASCII-art to be displayed. Still, I've never >> seen it.
The only case of graphics being used in "subtitles" I can >> remember ever seeing is the DVD of >> <http://en.wikipedia.org/wiki/Cat_Soup>, where the subtitle system is >> (ab)used to overlay some graphics. > > Yeah, I'm not at all concerned about not supporting graphics in subtitles. > It's nowhere near the 80% bar. > > >> If we ever want comments, we need to add support in the parser before >> any content accidentally uses the syntax, in other words pretty soon >> now. > > No, we can use any syntax that the parser currently ignores. It won't > break backwards compat with content that already uses it by then, since > the whole point of comments is to be ignored. The only difference is > whether validators complain or not. > > >> > On Tue, 14 Sep 2010, Anne van Kesteren wrote: >> > > >> > > Apart from text/plain I cannot think of a "web" text format that >> > > does not have comments. >> > >> > But what's the use case? Is it really useful to have comments in a >> > subtitle file? >> >> Being able to put licensing/contact information at the top of the file >> would be useful, just as it is in JavaScript/CSS. > > Well the parser explicitly skips over anything in the header block > (everything up to the first blank line IIRC), so if we find that people > want this then we can allow it without having to change any UAs except the > validators. > > >> > On Fri, 22 Oct 2010, Simon Pieters wrote: >> > > > >> > > > It can still be inspired by it though so we don't have to change >> > > > much. I'd be curious to hear what other things you'd clean up >> > > > given the chance. >> > > >> > > WebSRT has a number of quirks to be compatible with SRT, like >> > > supporting both comma and dot as decimal separators, the weird >> > > parsing of timestamps, etc. >> > >> > I've cleaned the timestamp parsing up. I didn't see others. >> >> I consider the cue id line (the line preceding the timing line) to be >> cruft carried over from SRT. When we now both have classes and the >> possibility of getting a cue by index, so why do we need it? > > It's optional, but it is useful, especially for metadata tracks, as a way > to grab specific cues. For example, consider a metadata or chapter track > that contains cues with specific IDs that the site would use to jump to > particular parts of the video in response to key presses, such as "start > of content after intro", or maybe for a podcast with different segments, > where the user can jump to "news" and "reviews" and "final thought" -- you > need an ID to be able to find the right cue quickly. > > >> > > There was also some discussion about metadata. Language is sometimes >> > > necessary for the font engine to pick the right glyph. >> > >> > Could you elaborate on this? My assumption was that we'd just use CSS, >> > which doesn't rely on language for this. >> >> It's not in any spec that I'm aware of, but some browsers (including >> Opera) pick different glyphs depending on the language of the text, >> which really helps when rendering CJK when you have several CJK fonts on >> the system. Browsers will already know the language from <track >> srclang>, so this would be for external players. > > How is this problem solved in SRT players today? > > > On Mon, 14 Feb 2011, Philip J盲genstedt wrote: >> >> Given that most existing subtitle formats don't have any language >> metadata, I'm a bit skeptical. However, if implementors of non-browser >> players want to implement WebVTT and ask for this I won't stand in the >> way (not that I could if I wanted to). 
For simplicity, I'd prefer the >> language metadata from the file to not have any effect on browsers >> though, even if no language is given on <track>. > > Indeed. > > > On Tue, 4 Jan 2011, Alex Bishop wrote: >> >> Firefox too. If you visit >> http://people.mozilla.org/~jdaggett/webfonts/serbianglyphs.html in >> Firefox 4, the text explicitly marked-up as being Serbian Cyrillic >> (using the lang="sr-Cyrl" attribute) uses some different glyphs to the >> text with no language metadata. > > This seems to be in violation of CSS; we should probably fix it there > before fixing it in WebVTT since WebVTT relies on CSS. > > > On Mon, 3 Jan 2011, Philip Jägenstedt wrote: >> >> > > * The "bad cue" handling is stricter than it should be. After >> > > collecting an id, the next line must be a timestamp line. Otherwise, >> > > we skip everything until a blank line, so in the following the >> > > parser would jump to "bad cue" on line "2" and skip the whole cue. >> > > >> > > 1 >> > > 2 >> > > 00:00:00.000 --> 00:00:01.000 >> > > Bla >> > > >> > > This doesn't match what most existing SRT parsers do, as they simply >> > > look for timing lines and ignore everything else. If we really need >> > > to collect the id instead of ignoring it like everyone else, this >> > > should be more robust, so that a valid timing line always begins a >> > > new cue. Personally, I'd prefer if it is simply ignored and that we >> > > use some form of in-cue markup for styling hooks. >> > >> > The IDs are useful for referencing cues from script, so I haven't >> > removed them. I've also left the parsing as is for when neither the >> > first nor second line is a timing line, since that gives us a lot of >> > headroom for future extensions (we can do anything so long as the >> > second line doesn't start with a timestamp and "-->" and another >> > timestamp). >> >> In the case of feeding future extensions to current parsers, it's way >> better fallback behavior to simply ignore the unrecognized second line >> than to discard the entire cue. The current behavior seems unnecessarily >> strict and makes the parser more complicated than it needs to be. My >> preference is just ignore anything preceding the timing line, but even >> if we must have IDs it can still be made simpler and more robust than >> what is currently spec'ed. > > If we just ignore content until we hit a line that happens to look like a > timing line, then we are much more constrained in what we can do in the > future. For example, we couldn't introduce a "comment block" syntax, since > any comment containing a timing line wouldn't be ignored. On the other > hand if we keep the syntax as it is now, we can introduce a comment block > just by having its first line include a "-->" but not have it match the > timestamp syntax, e.g. by having it be "--> COMMENT" or some such. > > Looking at the parser more closely, I don't really see how doing anything > more complex than skipping the block entirely would be simpler than what > we have now, anyway. > > > On Mon, 3 Jan 2011, Glenn Maynard wrote: >> >> By the way, the WebSRT hit from Google >> (http://www.whatwg.org/specs/web-apps/current-work/websrt.html) is 404. >> I've had to read it out of the Google cache, since I'm not sure where it >> went. > > I added a redirect. > > >> Inline comments (not just line comments) in subtitles are very important >> for collaborative editing: for leaving notes about a translation, noting >> where editing is needed or why a change was made, and so on.
>> >> If a DOM-like interface is specified for this (presumably this will >> happen later), being able to access inline comments like DOM comment >> nodes would be very useful for visual editors, to allow displaying >> comments and to support features like "seek to next comment". > > We can add comments pretty easily (e.g. we could say that "<!" starts a > comment and ">" ends it -- that's already being ignored by the current > parser), if people really need them. But are comments really that useful? > Did SRT have problem due to not supporting inline comments? (Or did it > support inline comments?) > > > On Tue, 4 Jan 2011, Glenn Maynard wrote: >> On Tue, Jan 4, 2011 at 4:24 AM, Philip J盲genstedt <philipj@opera.com> >> wrote: >> > If you need an intermediary format while editing, you can just use any >> > syntax you like and have the editor treat it specially. >> >> If I'd need to write my own parser to write an editor for it, that's one >> thing--but I hope I wouldn't need to create yet another ad hoc caption >> format, mirroring the features of this one, just to work around a lack >> of inline comments. > > An editor would need a custom parser anyway to make sure it round-tripped > syntax errors, presumably. > > >> The cue text already vaguely resembles HTML. What about <!-- comments >> -->? It's universally understood, and doesn't require any new escape >> mechanisms. > > The current parser would end a comment at the first ">", but so long as > you didn't have a ">" in the comment, "<!--...-->" would work fine within > cue text. (We would have to be careful in standalone blocks to define it > in such a way that it could not be confused with a timing line.) > > > On Wed, 5 Jan 2011, Philip J盲genstedt wrote: >> >> The question is rather if the comments should be exposed as DOM comment >> nodes in getCueAsHTML, which seems to be what you're asking for. That >> would only be possible if comments were only allowed inside the cue >> text, which means that you couldn't comment out entire cues, as such: >> >> 00:00.000 --> 00:01.000 >> one >> >> /* >> 00:02.000 --> 00:03.000 >> two >> */ >> >> 00:04.000 --> 00:05.000 >> three >> >> Therefore, my thinking is that comments should be removed during parsing >> and not be exposed to any layer above it. > > We can support both, if there's really demand for it. > > For example: > > 00:00.000 --> 00:01.000 > one <! inline comment > one > > COMMENT--> > 00:02.000 --> 00:03.000 > two; this is entirely > commented out > > <! this is the ID line > 00:04.000 --> 00:05.000 > three; last line is a ">" > which is part of the cue > and is not a comment. > > > > The above would work today in a conforming UA. The question really is what > parts of this do we want to support and what do we not care enough about. > > > On Wed, 5 Jan 2011, Anne van Kesteren wrote: >> On Wed, 05 Jan 2011 10:58:56 +0100, Philip J盲genstedt >> <philipj@opera.com> wrote: >> > Therefore, my thinking is that comments should be removed during >> > parsing and not be exposed to any layer above it. >> >> CSS does that too. It has not caused problems so far. It does mean >> editing tools need a slightly different DOM, but that is always the case >> as they want to preserve whitespace details, etc., too. At least editors >> that have both a text and visual interface. > > Right. > > > On Fri, 14 Jan 2011, Silvia Pfeiffer wrote: >> >> We are concerned, however, about the introduction of WebVTT as a >> universal captioning format *when used outside browsers*. 
Since a subset >> of CSS features is required to bring HTML5 video captions on par with TV >> captions, non-browser applications will need to support these CSS >> features, too. However, we do not believe that external CSS files are an >> acceptable solution for non-browser captioning and therefore think that >> those CSS features (see [1]) should eventually be made part of the >> WebVTT specification. >> >> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/rendering.html#the-'::cue'-pseudo-element > > I'm not sure what you mean by "made part of the WebVTT specification", but > if you mean that WebVTT should support inline CSS, that does seem line > something we can add, e.g. using syntax like this: > > WEBVTT > > STYLE--> > ::cue(v[voice=Bob]) { color: green; } > ::cue(c.narration) { font-style: italic; } > ::cue(c.narration i) { font-style: normal; } > > 00:00.000 --> 00:02.000 > Welcome. > > 00:02.500 --> 00:05.000 > To WebVTT. > > I suggest we wait until WebVTT and '::cue' in particular have shipped in > at least one browser and been demonstrated as being useful before adding > this kind of feature though. > > >> 1. Introduce file-wide metadata >> >> WebVTT requires a structure to add header-style metadata. We are here >> talking about lists of name-value pairs as typically in use for header >> information. The metadata can be optional, but we need a defined means >> of adding them. >> >> Required attributes in WebVTT files should be the main language in use >> and the kind of data found in the WebVTT file - information that is >> currently provided in the <track> element by the @srclang and @kind >> attributes. These are necessary to allow the files to be interpreted >> correctly by non-browser applications, for transcoding or to determine >> if a file was created as a caption file or something else, in particular >> the @kind=metadata. @srclang also sets the base directionality for BiDi >> calculations. >> >> Further metadata fields that are typically used by authors to keep >> specific authoring information or usage hints are necessary, too. As >> examples of current use see the format of MPlayer mpsub’s header >> metadata [2], EBU STL’s General Subtitle Information block [3], and >> even CEA-608’s Extended Data Service with its StartDate, Station, >> Program, Category and TVRating information [4]. Rather than specifying a >> specific subset of potential fields we recommend to just have the means >> to provide name-value pairs and leave it to the negotiation between the >> author and the publisher which fields they expect of each other. >> >> [2] http://www.mplayerhq.hu/DOCS/tech/mpsub.sub >> [3] https://docs.google.com/viewer?a=v&q=cache:UKnzJubrIh8J:tech.ebu.ch/docs/tech/tech3264.pdf >> [4] http://edocket.access.gpo.gov/cfr_2007/octqtr/pdf/47cfr15.119.pdf > > I don't understand the use cases here. > > CSS and JS don't have anything like this, why should WebVTT? What problem > is this solving? How did SRT solve this problem? > > >> 2. Introduce file-wide cue settings >> >> At the moment if authors want to change the default display of cues, >> they can only set them per cue (with the D:, S:, L:, A: and T:. cue >> settings) or have to use an external CSS file through a HTML page with >> the ::cue pseudo-element. In particular when considering that all >> Asian language files would require a “D:vertical” marker, it becomes >> obvious that this replication of information in every cue is >> inefficient and a waste of bandwidth, storage, and application speed. 
>> A cue setting default section should be introduced into a file >> header/setup area of WebVTT which will avoid such replication. >> >> An example document with cue setting defaults in the header could look >> as follows: >> == >> WEBVTT >> Language=zh >> Kind=Caption >> CueSettings= A:end D:vertical >> >> 00:00:15.000 --> 00:00:17.950 >> 在左边我们可以看到... >> >> 00:00:18.160 --> 00:00:20.080 >> 在右边我们可以看到... >> >> 00:00:20.110 --> 00:00:21.960 >> ...捕蝇草械. >> == >> >> Note that you might consider that the solution to this problem is to use >> external CSS to specify a change to all cues. However, this is not >> acceptable for non-browser applications and therefore not an acceptable >> solution to this problem. > > Adding defaults seems like a reasonable feature. We could add this just by > adding the ability to have a block in a VTT file like this: > > WEBVTT > > DEFAULTS --> A:vertical A:end > > 00:00.000 --> 00:02.000 > This is vertical and end-aligned. > > 00:02.500 --> 00:05.000 > As is this. > > DEFAULTS --> A:start > > 00:05.500 --> 00:07.000 > This is horizontal and start-aligned. > > However, again I suggest that we wait until WebVTT has been deployed in at > least one browser before adding more features like this. > > >> * positioning: Generally the way in which we need positioning to work is >> to provide an anchor position for the text and then explain in which >> direction font size changes and the addition of more text allows the >> text segment to grow. It seems that the line position cue (L) provides a >> baseline position and the alignment cue (A) provides the growing >> direction start/middle/end. Can we just confirm this understanding? > > It's more the other way around: the line boxes are laid out and then the > resulting line boxes are positioned according to the A: and L: lines. In > particular, the L: lines when given with a % character position the line > boxes in the same manner that CSS background-position positions the > background image, and L: lines without a % character set the position of > the line boxes based on the height of the first line box. A: lines then > just set the position of these line boxes relative to the other dimension. > > >> * fontsize: When changing text size in relation to the video changing >> size or resolution, we need to make sure not to reduce the text size >> below a specific font size for readability reasons. And we also need to >> make sure not to make it larger than a specific font size, since >> otherwise it will dominate the display. We usually want the text to be >> at least Xpx, but no bigger than Ypx. Also, one needs to pay attention >> to the effect that significant player size changes have on relative >> positioning - in particular for the minimum caption text size. Dealing >> with min and max sizes is missing from the current specification in our >> understanding. > > That's a CSS implementation issue. Minimum font sizes are commonly > supported in CSS implementations. Maximum font sizes would be similar. > > >> * bidi text: In our experience from YouTube, we regularly see captions >> that contain mixed languages/directionality, such as Hebrew captions >> that have a word of English in it. How do we allow for bidi text inside >> cues? How do we change directionality mid-cue? Do we deal with the >> zero-width LTR-mark and RTL-mark unicode characters? It would be good to >> explain how these issues are dealt with in WebVTT. > > There's nothing special about how they work in WebVTT; they are handled > the same as in CSS. 
> > >> * internationalisation: D:vertical and D:vertical-lr seem to only work >> for vertical text - how about horizontal-rl? For example, Hebrew is a >> prime example of a language being written from right to left >> horizontally. Is that supported and how? > > What exactly would horizontal-rl do? > > >> * naming: The usage of single letter abbreviations for cue settings has >> created quite a discussion here at Google. We all agree that file-wide >> cue settings are required and that this will reduce the need for >> cue-specific cue settings. We can thus afford a bit more readability in >> the cue settings. We therefore believe that it would be better if the >> cue settings were short names rather than single letter codes. This >> would be more like CSS, too, and easier to learn and get right. In the >> interface description, the 5 dimensions have proper names which could be >> re-used (“direction”, “linePosition”, “textPosition”, “size” and >> “align"). We therefore recommend replacing the single-letter cue >> commands with these longer names. > > That would massively bloat these files and make editing them a huge pain, > as far as I can tell. I agree that defaults would make it better, but many > cues would still need their own positioning and sizing information, and > anything beyond a very few letters would IMHO quickly become far too > verbose for most people. "L", "A", and "S" are pretty mnemonic, "T" would > quickly become familiar to people writing cues, and "D" is only going to > be relevant to some authors but for those authors it's pretty > self-explanatory as well, since the value is verbose. > > What I really would like to do is use "X" and "Y" instead of "T" and "L", > but those terms would be very confusing when we flip the direction, which > is why I used the less obvious "T" and "L". > > >> * textcolor: In particular on European TV it is common to distinguish >> between speakers by giving their speech different colors. The following >> colors are supported by EBU STL, CEA-608 and CEA-708 and should be >> supported in WebVTT without the use of external CSS: black, red, green, >> yellow, blue, magenta, cyan, and white. As default we recommend white on >> a grey transparent background. > > This is supported as 'color' and 'background'. > > >> * underline: EBU STL, CEA-608 and CEA-708 support underlining of >> characters. > > I've added support for 'text-decoration'. > > >> The underline character is also particularly important for some Asian >> languages. > > Could you elaborate on this? > > >> Please make it possible to provide text underlines without the use of >> CSS in WebVTT. > > Why without CSS? > > >> * blink: As much as we would like to discourage blinking subtitles, they >> are actually a core requirement for EBU STL and CEA-608/708 captions and >> in use in particular for emergency messages and similar highly important >> information. Blinking can be considered optional for implementation, but >> we should allow for it in the standard. > > This is part of 'text-decoration'. > > >> * font face: CEA-708 provides a choice of eight font tags: undefined, >> monospaced serif, proportional serif, monospaced sans serif, >> proportional sans serif, casual, cursive, small capital. These fonts >> should be available for WebVTT as well. Is this the case? > > Yes. > > >> We are not sure about the best solution to these needs. Would it be best >> to introduce specific tags for these needs? > > CSS seems to handle these needs adequately. 
> > >> We have a couple of recommendations for changes mostly for aesthetic and >> efficiency reasons. We would like to point out that Google is very >> concerned with the dense specification of data and every surplus >> character, in particular if it is repeated a lot and doesn’t fulfill a >> need, should be removed to reduce the load created on worldwide >> networking and storage infrastructures and help render Web pages faster. > > This seems to contradict your earlier request to make the language more > verbose... > > >> * Time markers: WebVTT time stamps follow no existing standard for time >> markers. Has the use of NPT as introduced by RTSP[5] for time markers >> been considered (in particular npt-hhmmss)? >> >> [5] http://www.ietf.org/rfc/rfc2326.txt > > WebVTT follows the SRT format, with commas replaced by periods for > consistency with the rest of the platform. > > >> * Suggest dropping “-->”: In the context of HTML, “-->” is an end >> comment marker. It may confuse Web developers and parsers if such a sign >> is used as a separator. For example, some translation tools expect HTML >> or XML-based interchange formats and interpret the “>” as part of a >> tag. Also, common caption convention often uses “>” to represent >> speaker identification. Thus it is more difficult to write a filter >> which correctly escapes “-->” but retains “>” for speaker ID. > > "-->" seems pretty mnemonic to me. I don't see why we'd want to drop it. > > >> * Duration specification: WebVTT time stamps are always absolute time >> stamps calculated in relation to the base time of synchronisation with >> the media resource. While this is simple to deal with for machines, it >> is much easier for hand-created captions to deal with relative time >> stamps for cue end times and for the timestamp markers within cues. Cue >> start times should continue to stay absolute time stamps. Timestamp >> markers within cues should be relative to the cue start time. Cue end >> times should be possible to be specified either as absolute or relative >> timestamps. The relative time stamps could be specified through a prefix >> of “+” in front of a “ss.mmm” second and millisecond specification. >> These are not only simpler to read and author, but are also more compact >> and therefore create smaller files. > > I think if anything is absolute, it doesn't really make anything much > simpler for anything else to be relative, to be honest. Take the example > you give here: > >> An example document with relative timestamps is: >> == >> WEBVTT >> Language=en >> Kind=Subtitle >> >> 00:00:15.000 +2.950 >> At the left we can see... >> >> 00:00:18.160 +1.920 >> At the right we can see the... >> >> 00:00:20.110 +1.850 >> ...the <+0.400>head-<+0.800>snarlers >> == > > If the author were to change the first time stamp because the video gained > a 30-second advertisement at the start, then he would still need to change > the hundreds of subsequent timestamps for all the additional cues. What > does the author gain from not having to change the relative stamps? It's > not like he's going to be doing it by hand, and once a tool is involved, > the tool can change everything just as easily. > > >> We are happy to see the introduction of the magic file identifier for >> WebVTT which will make it easier to identify the file format. We do not >> believe the “FILE” part of the string is necessary. > > I have removed it. > > >> However, we recommend to also introduce a format version number that the >> file adheres to, e.g. “WEBVTT 0.7”.
> > Version numbers are an antipattern on the Web, so I have not added one. > > >> This helps to make non-browser systems that parse such files become >> aware of format changes. > > The format will never change in a non-backwards-compatible fashion once it > is deployed, so that is not a concern. > > >> It can also help identify proprietary standard metadata sets as used by >> a specific company, such as “WEBVTT 0.7 ABC-meta1” which could signify >> that the file adheres to WEBVTT 0.7 format specification with the >> ABC-meta1 metadata schema. > > If we add metadata, then that can be handled just by having the metadata > include that information itself. > > >> CEA-708 captions support automatic line wrapping in a more sophisticated >> way than WebVTT -- see http://en.wikipedia.org/wiki/CEA-708#Word_wrap. >> >> In our experience with YouTube we have found that in certain situations >> this type of automatic line wrapping is very useful. Captions that were >> authored for display in a full-screen video may contain too many words >> to be displayed fully within the actual video presentation (note that >> mobile / desktop / internet TV devices may each have a different amount >> of space available, and embedded videos may be of arbitrary sizes). >> Furthermore, user-selected fonts or font sizes may be larger than >> expected, especially for viewers who need larger print. >> >> WebVTT as currently specified wraps text at the edge of their containing >> blocks, regardless of the value of the 'white-space' property, even if >> doing so requires splitting a word where there is no line breaking >> opportunity. This will tend to create poor quality captions. For >> languages where it makes sense, line wrapping should only be possible at >> carriage return, space, or hyphen characters, but not on >> characters. (Note that CEA-708 also contains non-breaking space and >> non-breaking transparent space characters to help control wrapping.) >> However, this algorithm will not necessarily work for all languages. >> >> We therefore suggest that a better solution for line wrapping would be >> to use the existing line wrapping algorithms of browsers, which are >> presumably already language-sensitive. >> >> [Note: the YouTube line wrapping algorithm goes even further by >> splitting single caption cues into multiple cues if there is too much >> text to reasonably fit within the area. YouTube then adjusts the times >> of these caption cues so they appear sequentially. Perhaps this could >> be mentioned as another option for server-side tools.] > > I've adjusted the text in the spec to more clearly require that > line-breaking follow normal CSS rules but with the additional requirement > that there not be overflow, which is what I had intended. > > >> 1. Pop-on/paint-on/roll-up support >> >> Three different types of captions are common on TV: pop-on, roll-up and >> paint-on. Captions according to CEA-608/708 need to support captions of >> all three of these types. We believe they are already supported in >> WebVTT, but see a need to re-confirm. >> >> For pop-on captions, a complete caption cue is timed to appear at a >> certain time and disappear a few seconds later. This is the typical way >> in which captions are presented and also how WebVTT/<track> works in our >> understanding. Is this correct? > > As far as I understand, yes. > > >> For roll-up captions, individual lines of captions are presented >> successively with older lines moving up a line to make space for new >> lines underneath. 
Assuming we understand the WebVTT rendering rules >> correctly, WebVTT would identify each of these lines as an individual, >> but time-overlapping cue with the other cues. As more cues are created >> and overlap in time, newer cues are added below the currently visible >> ones and move the currently visible ones up, basically creating a >> roll-up effect. If this is a correct understanding, then this is an >> acceptable means of supporting roll-up captions. > > I am not aware of anything currently in the WebVTT specification which > will cause a cue to move after it has been placed on the video, so I do > not believe this is a correct understanding. > > However, you can always have a cue be replaced by a cue with the same text > but on a higher line, if you're willing to do some preprocessing on the > subtitle file. It won't be a smoothly animated scroll, but it would work. > > If there is convincing evidence that this kind of subtitle is used on the > Web, though, we can support it more natively. So far I've only seen it in > legacy scenarios that do not really map to expected WebVTT use cases. > > For supporting those legacy scenarios, you need script anyway (to handle, > e.g., backspace and moving the cursor). If you have script, doing > scrolling is possible either by moving the cue, or by not using the > default UA rendering of the cues at all and doing it manually (e.g. using > <div>s or <canvas>). > > >> Finally, for paint-on captions, individual letters or words are >> displayed successively on screen. WebVTT supports this functionality >> with the cue timestamps <xx:xx:xx.xxx>, which allows to specify >> characters or words to appear with a delay from within a cue. This >> essentially realizes paint-on captions. Is this correct? > > Yes. > > >> (Note that we suggest using relative timestamps inside cues to make this >> feature more usable.) > > It makes it modestly easier to do by hand, but hand-authoring a "paint-on" > style caption seems like a world of pain regardless of the timestamp > format we end up using, so I'm not sure it's a good argument for > complicating the syntax with a second timestamp format. > > >> The HTML spec specifies that it is not allowed to have two tracks that >> provide the same kind of data for the same language (potentially empty) >> and for the same label (potentially empty). However, we need >> clarification on what happens if there is a duplicate track, ie: does >> the most recent one win or the first one or will both be made available >> in the UI and JavaScript? > > They are both available. > > >> The spec only states that the combination of {kind, type, label} must be >> unique. It doesn't say what happens if they are not. > > Nothing different happens if they are not than if they are. It's just a > conformance requirement. > > >> Further, the spec says nothing about duplicate labels altogether - what >> is a browser supposed to do when two tracks have been marked with the >> same label? > > That same as it does if they have different labels. > > >> It is very important that there is a possibility for users to >> auto-activate tracks. Which track is chosen as the default track to >> activate depends on the language preferences of the user. The user is >> assumed to have a list of language preferences which leads this choice. > > I've added a "default" attribute so that sites can control this. > > >> In YouTube, if any tracks exist that match the first language >> preference, the first of those is used as the default. 
A track with >> no name sorts ahead of one with a name. The sorting is done according >> to that language's collation order. In order to override this you >> would need (1) a default=true attribute for a track which gives it >> precedence if its language matches, and (2) a way to force the >> language preference. If no tracks exist for the first language pref, >> the second language pref is checked, and so on. >> >> If the user's language preferences are known, and there are no tracks >> in that language, you have other options: >> (1) offer to do auto-translation (or just do it) >> (2) use a track in the same language that the video's audio is in (if known) >> (3) if only one track, use the first available track >> >> Also make sure the language choice can be overriden by the user >> through interaction. >> >> We’d like to make sure this or a similar algorithm is the recommended >> way in which browsers deal with caption tracks. > > This seems to me to be a user agent quality of implementation issue. User > preferences almost by definition can't be interoperable, so it's not > something we can specify. > > >> As far as we understand, you can currently address all cues through >> ::cue and you can address a cue part through ::cue-part(<voice> || >> <part> || <position> || <future-compatibility>). However, if we >> understand correctly, it doesn’t seem to be possible to address an >> individual cue through CSS, even though cues have individual >> identifiers. This is either an oversight or a misunderstanding on our >> parts. Can you please clarify how it is possible to address an >> individual cue through CSS? > > I've made the ID referencable from the ::cue() selector argument as an ID > on the anonymous root element. > > >> Our experience with automated caption creation and positioning on >> YouTube indicates that it is almost impossible to always place the >> captions out of the way of where a user may be interested to look at. We >> therefore allow users to dynamically move the caption rendering area to >> a different viewport position to reveal what is underneath. We recommend >> such drag-and-drop functionality also be made available for TimedTrack >> captions on the Web, especially when no specific positioning information >> is provided. > > I've added text to explicitly allow this. > > > On Sat, 22 Jan 2011, Philip J盲genstedt wrote: >> >> Indeed, repeating settings on each cue would be annoying. However, >> file-wide settings seems like it would easily be too broad, and you'd >> have to explicitly reverse the effect on the cues where you don't want >> it to apply. Maybe classes of cue settings or some kind of macros would >> work better. > > My assumption is that similar cues will typically be grouped together, so > that one could introduce the group with a "DEFAULTS" block and then > > >> Nitpick: Modern Chinese, including captions, is written left-to-right, >> top-to-bottom, just like English. > > Indeed. I don't expect there will be much vertical text captioning. I > added it primarily to support some esoteric Anime cases. > > > >> That the intra-cue timings are relative but the timing lines are >> absolute has bugged me a bit, so if the distinction was more obvious >> just from the syntax, that'd be great! > > They're all absolute. > > >> [for the file signature] "WebSRT" is prettier than "WEBSRT". > > The idea is not to be pretty, the idea is to stand out. 
:-) > > >> I'm inclined to say that we should normalize all whitespace during >> parsing and not have explicit line breaks at all. If people really want >> two lines, they should use two cues. In practice, I don't know how well >> that would fare, though. What other solutions are there? > > I think we definitely need line breaks, e.g. for cases like: > > -- Do you want to go to the zoo? > -- Yes! > -- Then put your shoes on! > > ...which is quite common style in some locales. > > However, I agree that we should encourage people to let browsers wrap the > lines. Not sure how to encourage that more. > > > On Sun, 23 Jan 2011, Glenn Maynard wrote: >> >> It should be possible to specify language per-cue, or better, per block >> of text mid-cue. Subtitles making use of multiple languages are common, >> and it should be possible to apply proper font selection and word >> wrapping to all languages in use, not just the primary language. > > It's not clear to me that we need language information to apply proper > font selection and word wrapping, since CSS doesn't do it. > > >> When both English subtitles and Japanese captions are on screen, it >> would be very bad to choose a Chinese font for the Japanese text, and >> worse to choose a Western font and use it for everything, even if >> English is the predominant language in the file. > > Can't you get around this using explicit styles, e.g. against classes? > Unless this really is going to be a common problem, I'm not particularly > concerned about it. > > > On Mon, 24 Jan 2011, Philip Jägenstedt wrote: >> >> Multi-languaged subtitles/captions seem to be extremely uncommon, >> unsurprisingly, since you have to understand all the languages to be >> able to read them. >> >> The case you mention isn't a problem, you just specify Japanese as the >> main language. > > Indeed. > > >> There are a few other theoretical cases: >> >> * Multi-language CJK captions. I've never seen this, but outside of >> captioning, it seems like the foreign script is usually transcribed to >> the native script (e.g. writing Japanese names with simplified Chinese >> characters). >> >> * Use of Japanese or Chinese words in mostly non-CJK subtitles. This >> would make correct glyph selection impossible, but I've never seen it. >> >> * Voice synthesis of e.g. mixed English/French captions. Given that this >> would only be useful to people who know both languages, it seems not >> worth complicating the format for. > > Agreed on all fronts. > > >> Do you have any examples of real-world subtitles/captions that would >> benefit from more fine-grained language information? > > This kind of information would indeed be useful. > > > On Mon, 24 Jan 2011, Glenn Maynard wrote: >> >> They're very common in anime fansubs: >> >> http://img339.imageshack.us/img339/2681/screenshotgg.jpg >> >> The text on the left is a transcription, the top is a transliteration, >> and the bottom is a translation. > > Aren't these three separate text tracks? > > >> I'm pretty sure I've also seen cases of translation notes mixing >> languages within the same caption, eg. "jinja (神社): shrine", but >> it's less common and I don't have an example handy. > > Mixing one CJK language with one non-CJK language seems fine. That should > always work, assuming you specify good fonts in the CSS. > > >> > The case you mention isn't a problem, you just specify Japanese as the >> > main language.
There are a few other theoretical cases: >> >> Then you're indicating that English text is Japanese, which I'd expect >> to cause UAs to render everything with a Japanese font. That's what >> happens when I load English text in Firefox and force SJIS: everything >> is rendered in MS PGothic. That's probably just what Japanese users >> want for English text mixed in with Japanese text, too--but it's >> generally not what English users want with the reverse. > > I don't understand why we can't have good typography for CJK and non-CJK > together. Surely there are fonts that get both right? > > > On Mon, 24 Jan 2011, Glenn Maynard wrote: >> > >> > [ use multiple tracks ] >> >> Personally I'd prefer that, but it would require a good deal of metadata >> support--marking which tracks are meant to be used together, tagging >> auxilliary track types so browsers can choose (eg. an "English subtitles >> with no song caption tracks" option), and so on. I'm sure that's a >> non-starter (and I'd agree). > > It's not that much metadata. It's far less effort than making the > subtitles in the first place. > > >> I don't think you should need to resort to fine-grained font control to get >> reasonable default fonts. > > I agree entirely, but I don't think you should need to resort to > fine-grained language tagging either... > > >> The above--semantics vs. presentation--brings something else to mind. >> One of the harder things to subtitle well is when you have two >> conversations talking on top of each other. This is generally done by >> choosing a vertical spot for each conversation (generally augmented with >> a color), so the viewer can easily follow one or the other. Setting the >> line position *sort of* lets you do this, but that's hard to get right, >> since you don't know how far apart to put them. You'd have to err >> towards putting them too far apart (guessing the maximum number of lines >> text might be wrapped to, and covering up much more of the screen than >> usually needed), or putting one set on the top of the screen (making it >> completely impossible to read both at once, rather than just >> challenging). >> >> If I remember correctly, SSA files do this with a hack: wherever there's >> a blank spot in one or the other conversation, a transparent dummy cue >> is added to keep the other conversation in the correct relative spot, so >> the two conversations don't swap places. >> >> I mention this because it comes to mind as something well-authored, >> well-rendered subtitles need to get right, and I'm curious if there's a >> reliable way to do this currently with WebVTT. If this isn't handled, >> some scenes just fall apart. > > It's intended to be done using the L: feature to pick the lines. If the > cues have more line wrapping than the author expected, it'll break. The > only way around that would be to go through the whole file (or at least, > the whole scene, somehow marked up as such) pre-rendering each cue to work > out what the maximum line heights would be and then using that offset for > each cue, etc, but that seems like a whole lot of complexity for a minor > use case. Is line wrapping really going to be that unpredictable? > > > On Mon, 24 Jan 2011, Philip J盲genstedt wrote: >> >> My main point here is that the use cases are so marginal. If there were >> more compelling ones, it's not hard to support intra-cue language >> settings using syntax like <lang en>bla</lang> or similar. > > Indeed. 
> > > On Mon, 24 Jan 2011, Glenn Maynard wrote: >> >> Here's one that I think was done very well, rendered statically to make >> sure we're all seeing the same thing: >> >> http://zewt.org/~glenn/multiple%20conversation%20example.mpg >> >> The results are pretty straightforward. One always stays on top, one >> always stays on the bottom, and most of the time the spacing between the >> two is correct--the normal distance the UA uses between two vertical >> captions (which would be lost by specifying the line height explicitly). >> Combined with the separate coloring (which is already possible, of >> course), it's possible to read both conversations and intuitively track >> which is which, and it's also very easy to just pick one or the other to >> read. > > As far as I can tell, the WebVTT algorithm would handle this case pretty > well. > > >> One example of how this can be tricky: at 0:17, a caption on the bottom >> wraps and takes two lines, which then pushes the line at 0:19 upward >> (that part's simple enough). If instead the top part had appeared >> first, the renderer would need to figure out in advance to push it >> upwards, to make space for the two-line caption underneith it. >> Otherwise, the captions would be forced to switch places. > > Right, without lookahead I don't know how you'd solve it. With lookahead > things get pretty dicey pretty quickly. > > > On Mon, 24 Jan 2011, Tab Atkins Jr. wrote: >> >> Right now, the WebVTT spec handles this by writing the text in white on >> top of a partially-transparent black background. The text thus never >> has contrast troubles, at the cost of a dark block covering up part of >> the display. >> >> Stroking text is easy, though. Webkit has an experimental property for >> doing it directly. Using existing CSS, it's easy to adapt text-shadow >> to produce a good outline - just make four shadows, offset by 1px in >> each direction, and you're good. > > WebVTT allows both text-shadow and text-outline. > > > On Wed, 9 Feb 2011, Silvia Pfeiffer wrote: >> >> We're trying to avoid the need for multiple transcodings and are trying >> to achieve something like the following pipeline: broadcast captions -> >> transcode to WebVTT -> show in browser -> transcode to broadcast devices >> -> show > > Why not just do: > > broadcast captions -> transcode to WebVTT -> show in browser > > ...for browsers and: > > broadcast captions -> show > > ...for legacy broadcast devices? > > > In any case the amount of legacy broadcast captions pales in comparison to > the volume of new captions we will see for the Web. I'm not really > convinced that legacy broadcast captions are an important concern here. > > >> What is the argument against using <u> in captions? > > What is the argument _for_ using <u> in captions? We don't add features > due to a lack of reasons not to. We add features due to a plethora of > reasons to do so. > > >> > [ foolip suggests using multiple cues to do blinking ] >> >> But from a captioning/subtitling point of view it's probably hard to >> convert that back to blinking text, since we've just lost the semantic >> by ripping it into multiple cues (and every program would use different >> ways of doing this). > > I do not think round-tripping legacy broadcast captions through WebVTT is > an important use case. If that is something that we should support, then > we should first establish why it is an important use case, and then > reconsider WebVTT within that context, rather than adding features to > handle it piecemeal. 
> > >> I guess what we are discovering is that we can define the general format >> of WebVTT for the Web, but that there may be an additional need to >> provide minimum implementation needs (a "profile" if you want - as much >> as I hate this word). > > Personally I have nothing against the word "profile", but I do have > something against providing for "minimum implemenatation needs". > > Interoperability means everything works the same everywhere. > > >> [re versioning the file format] >> In a contract between a caption provider and a caption consumer (I am >> talking about companies here), the caption consumer will want to tell >> the caption provider what kind of features they expect the caption files >> to contain and features they want avoided. This links back to the >> earlier identified need for "profiles". This is actually probably >> something outside the scope of this group, but I am sure there is a need >> for such a feature, in particular if we want to keep the development of >> the WebVTT specification open for future extensions. > > I don't see why there would be a need for anything beyond "make sure it > works with deployed software", maybe with that being explicitly translated > to specific features and workarounds for known bugs, e.g. "you can use > ruby, but make sure you don't have timestamps out of order". > > This, however, has no correlation to versions of the format. > > > On Mon, 14 Feb 2011, Philip J盲genstedt wrote: >> > >> > [line wrapping] >> >> There's still plenty of room for improvements in line wrapping, though. >> It seems to me that the main reason that people line wrap captions >> manually is to avoid getting two lines of very different length, as that >> looks quite unbalanced. There's no way to make that happen with CSS, and >> AFAIK it's not done by the WebVTT rendering spec either. > > WebVTT just defers to CSS for this. I agree that it would be nice for CSS > to allow UAs to do more clever things here and (more importantly) for UAs > to actually do more clever things here. > > > On Tue, 15 Feb 2011, Silvia Pfeiffer wrote: >> foolip wrote: >> > >> > Sure, it's already handled by the current parsing spec, since it >> > ignores everything up to the first blank line. >> >> That's not quite how I'm reading the spec. >> >> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#webvtt-0 >> allows >> "Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER >> TABULATION (tab) character followed by any number of characters that >> are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) >> characters." >> after the "WEBVTT FILE" magic. >> To me that reads like all of the extra stuff has to be on the same line. >> I'd prefer if this read "any character except for two WebVTT line >> terminators", then it would all be ready for such header-style >> metadata. > > That's the syntax rules. It's not the parser. > > >> I'm told <u> is fairly common in traditional captions. > > I've never seen it. Do you have any data on this? > > >> > Personally, I think we're going to see more and more devices running >> > full browsers with webfonts support, and that this isn't going to be a >> > big problem. >> >> I tend to agree and in fact I see that as the shiny future. Just not >> quite yet. > > We're not quite at WebVTT yet either. Currently, there's more support for > WebFonts than WebVTT. > > > On Tue, 15 Feb 2011, Glenn Maynard wrote: >> >> I think that, no matter what you do, people will insert line breaks in >> cues. 
I'd follow the HTML model here: convert newlines to spaces and >> have a separate, explicit line break like <br> if needed, so people >> don't manually line-break unless they actually mean to. > > The line-breaks-are-line-breaks feature is one of the features that > originally made SRT seem like a good idea. It still seems like the neatest > way of having a line break. > > >> Related to line breaking, should there be an escape? Inserting >> nbsp literally into files is somewhat annoying for authoring, since >> they're indistinguishable from regular spaces. > > How common would be? > > > On Thu, 10 Feb 2011, Silvia Pfeiffer wrote: >> >> Further discussions at Google indicate that it would be nice to make >> more components optional. Can we have something like this: >> >> [[h*:]mm:]ss[.[d[c[m]]] | s*[.d[c[m]]] >> >> Examples: >> 23 = 23 seconds >> 23.2 = 23 sec, 1 decisec >> 1:23.45 = 1 min, 23 sec, 45 centisec >> 123.456 = 123 sec, 456 millisec > > Currently the syntax is [h*:]mm:ss.sss; what's the advantage of making > this more complicated? It's not like most subtitled clips will be shorter > than a minute. Also, why would we want to support multiple redundant ways > of expressing the same time? (e.g. 01:00.000 and 60.000) > > Readability of VTT files seems like it would be helped by consistency, > which suggests using the same format everywhere, as much as possible. > > > On Sun, 16 Jan 2011, Mark Watson wrote: >> >> I have been looking at how the video element might work in an adaptive >> streaming context where the available media are specified with some kind >> of manifest file (e.g. MPEG DASH Media Presentation Description) rather >> than in HTML. >> >> In this context there may be choices available as to what to present, >> many but not all related to accessibility: >> >> - multiple audio languages >> - text tracks in multiple languages >> - audio description of video >> - video with open captions (in various languages) >> - video with sign language >> - audio with directors commentary >> - etc. >> >> It seems natural that for text tracks, loading the manifest could cause >> the video element to be populated with associated <track> elements, >> allowing the application to discover the choices and activate/deactivate >> the tracks. > > Not literal <track> elements, hopefully, but in-band text tracks (known as > "media-resource-specific text track" in the spec). > > >> But this seems just for text tracks. I know discussions are underway on >> what to do for other media types, but my question is whether it would be >> better to have a consistent solution for selection amongst the available >> media that applies for all media types ? > > They're pretty different from each other, so I don't know that one > solution would make sense for all of these. > > Does the current solution (the videoTracks, audioTracks, and textTracks > attributes) adequately address your concern? > > > On Mon, 17 Jan 2011, Jeroen Wijering wrote: >> >> We are getting some questions from JW Player users that HTML5 video is >> quite wasteful on bandwidth for longer videos (think 10min+). This >> because browsers download the entire movie once playback starts, >> regardless of whether a user pauses the player. If throttling is used, >> it seems very conservative, which means a lot of unwatched video is in >> the buffer when a user unloads a video. >> >> I did a simple test with a 10 minute video: playing it; pausing after 30 >> seconds and checking download progress after another 30 seconds. 
With >> all browsers (Firefox 4, Safari 5, Chrome 8, Opera 11, iOS 4.2), the >> video would indeed be fully downloaded after 60 seconds. Some throttling >> seems to be applied by Safari / iOS, but this could also be bandwidth >> fluctuations on my side. Either way, all browsers downloaded the 10min >> video while only 30 seconds were being watched. >> >> The HTML5 spec is a bit generic on this topic, allowing mechanisms such >> as stalling and throttling but not requiring them, or prescribing a >> scripting interface: >> >> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-resource > > Right, this is an area that is left up to implementations; a quality of > implementation issue. > > >> A suggestion would be to implement / expose a property called >> "downloadBufferTarget". It would be the amount of video in seconds the >> browser tries to keep in the download buffer. > > Wouldn't this be very situation-specific? e.g. if I know I'm about to go > into a tunnel for five minutes, I want five minutes of buffered data. If > my connection has a high packet loss rate and could stall for upwards of > 10 seconds, I want way more than 10 seconds in my buffer. If my connection > is such that I can't download data in realtime, I want the whole video in > my buffer. If my connection is such that I have 8ms latency to the video > server and enough bandwidth to transfer the whole four hour file in 3 > seconds, then really I don't need anything in my buffer. > > > On Mon, 17 Jan 2011, Roger Hågensen wrote: >> On 2011-01-17 18:36, Markus Ernst wrote: >> > >> > Could this be done at the user side, e.g. with some browser setting? >> > Or even by a "stop downloading" control in the player? An intuitive >> > user control would be separate stop and pause buttons, as we know them >> > from tape and CD players. Pause would then behave as it does now, >> > while stop would cancel downloading. >> >> I think that's the right way to do it, this should be in the hands of >> the user and exposed as a preference in the browsers. > > Agreed. > > >> Although exposing (read only?) the user's preferred buffer setting to >> the HTML App/Plugin etc. would be a benefit I guess as the desired >> buffering could be communicated back to the streaming server for example >> for a better bandwidth utilization. > > How would the information be used? > > > On Mon, 17 Jan 2011, Zachary Ozer wrote: >> >> What no one has mentioned so far is that the real issue isn't the >> network utilization or the memory capacity of the devices, it's >> bandwidth cost. >> >> The big issue for publishers is that they're incurring higher costs when >> using the <video> tag, which is a disincentive for adoption. >> >> Since there are situations where both the publisher and the user are >> potentially incurring bandwidth costs (or have other limitations), we >> could allow the publisher to specify downloadBufferTarget and the user >> to specify a setting in the browser's config. The browser would then >> actually buffer min(user setting, downloadBufferTarget). At that point >> there would probably need to be another read-only property that >> specified what value the browser is currently using as its buffer >> length, but maybe the getter for downloadBufferTarget is sufficient. > > I think before we get something that elaborate set up, we should just try > getting preload="" implemented. :-) That might be sufficient.
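> 
> For illustration only, a rough sketch of how the proposed combination could behave. Note that downloadBufferTarget is just an idea floated in this thread, not a specified or implemented attribute, and userBufferPreference stands in for a hypothetical browser setting:
> 
>   // Hypothetical sketch: downloadBufferTarget is only a proposal from this
>   // thread; nothing like it is in the spec.
>   function effectiveBufferTarget(video, userBufferPreference) {
>     var authorHint = video.downloadBufferTarget || Infinity; // seconds
>     return Math.min(userBufferPreference, authorHint);
>   }
> 
>   // A UA following the proposal would keep fetching only while the buffered
>   // region ahead of the playback position is shorter than the target.
>   function shouldKeepFetching(video, userBufferPreference) {
>     var ranges = video.buffered;
>     if (ranges.length === 0) return true;
>     var ahead = ranges.end(ranges.length - 1) - video.currentTime;
>     return ahead < effectiveBufferTarget(video, userBufferPreference);
>   }
> 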
> > > On Tue, 18 Jan 2011, Robert O'Callahan wrote: >> >> One solution that could work here is to honour dynamic changes to >> 'preload', so switching preload to 'none' would stop buffering. Then a >> script could do that, for example, after the user has paused the video >> for ten seconds. The script could also look at 'buffered' to make its >> decision. > > If browsers want to do that I'm quite happy to add something explicitly to > that effect to the spec. Right now the spec doesn't disallow it. > > > On Wed, 19 Jan 2011, Philip J盲genstedt wrote: >> >> The only difference between preload=none and preload=metadata is how >> much is fetched if the user doesn't interact at all with the video. Once >> the user has begun playing, I think the two mean the same thing: "please >> don't waste my bandwidth more than necessary". In other words, I think >> that for preload=metadata, browsers should be somewhat conservative even >> after playback has begun, not going all the way to the preload=auto >> behavior. > > The descriptions are somewhat loose, but something like this could work, > yes. (Though I'd say after playing preload=metadata and preload=auto are > the same and preload=none is the one that says to avoid bandwidth usage, > but that's just an artifact of the way I wrote the descriptions.) > > > On Tue, 18 Jan 2011, Zachary Ozer wrote: >> >> Currently, there's no way to stop / limit the browser from buffering - >> once you hit play, you start downloading and don't stop until the >> resource is completely loaded. This is largely the same as Flash, save >> the fact that some browsers don't respect the preload attribute. (Side >> note: I also haven't found a browser that stops loading the resource >> even if you destroy the video tag.) >> >> There have been a few suggestions for how to deal with this, but most >> have revolved around using downloadBufferTarget - a settable property >> that determines how much video to buffer ahead in seconds. Originally, >> it was suggested that the content producers should have control over >> this, but most seem to favor the client retaining some control since >> they are the most likely to be in low bandwidth situations. (Publishers >> who want strict bandwidth control could use a more advanced server and >> communication layer ala YouTube). >> >> The simplest enhancement would be to honor the downloadBufferTarget only >> when readyState=HAVE_ENOUGH_DATA and playback is paused, as this would >> imply that there is not a low bandwidth situation. > > It seems the simplest enhancement would be to have the browsers do the > right thing (e.g. download enough to get to HAVE_ENOUGH_DATA and stop if > the video is paused, or some such), not to add a feature that all Web > authors would have to handle. > > > On Tue, 18 Jan 2011, Boris Zbarsky wrote: >> >> In general, depending on finalizers to release resources (which is >> what's happening here) is not really a workable setup. Maybe we need an >> api to explicitly release the data on an audio/video tag? > > The spec suggests removing the element's src="" attribute and <source> > elements and then calling the element's load() method. > > The spec also suggests that implementors release all resources used by a > media element when that media element is an orphan when the event loop > spins. > > See the "Best practices for authors using media elements" and "Best > practices for implementors of media elements" sections. 
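> 
> As a minimal sketch, the author-side pattern suggested there looks something like this (the helper name is made up for illustration):
> 
>   // Release a media element's resources as the spec suggests: drop src=""
>   // and any <source> children, then call load() to reset the element.
>   function releaseMediaElement(media) {
>     media.removeAttribute('src');
>     var sources = media.querySelectorAll('source');
>     for (var i = 0; i < sources.length; i++) {
>       media.removeChild(sources[i]);
>     }
>     media.load();
>   }
> 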
> > > On Wed, 19 Jan 2011, Andy Berkheimer wrote: >> >> In the case where the viewer does not have enough bandwidth to stream >> the video in realtime, there are two basic options for the experience: >> - buffer the majority of the video (per Glenn and Boris' discussion) >> - switch to a lower bitrate that can be streamed in realtime >> >> This thread has focused primarily on the first option and this is an >> experience that we see quite a bit. This is the option favored amongst >> enthusiasts and power users, and also makes sense when a viewer has made >> a purchase with an expectation of quality. And there's always the >> possibility that the user does not have enough bandwidth for even the >> lowest available bitrate. >> >> But the second option is the experience that the majority of our viewers >> expect. >> >> The ideal interface would have a reasonable default behavior but give an >> application the ability to implement either experience depending on user >> preference (or lack thereof), viewing context, etc. > > Agreed. This is the kind of thing that a good streaming protocol can > negotiate in realtime. > > >> I believe Chrome's current implementation _does_ stall the HTTP >> connection (stop reading from the socket interface but keep it open) >> after some amount of readahead - a magic hardcoded constant. We've run >> into issues there - their browser readahead buffer is too small and >> causing a lot of underruns. > > It's early days. File bugs! > > >> No matter how much data you pass between client and server, there's >> always some useful playback state that the client knows and the server >> does not - or the server's view of the state is stale. This is >> particularly true if there's an HTTP proxy between the user agent and >> the server. Any behavior that could be implemented through an advanced >> server/communication layer can be achieved in a simpler, more robust >> fashion with a solid buffer management implementation that provides >> "advanced" control through javascript and attributes. > > The main difference is that a protocol will typically be implemented a few > times by experienced programmers writing servers and clients, which will > then be deployed and used by less experienced (in this kind of thing) Web > developers, while if we just expose it to JavaScript, the people > implementing it will be a combination of experienced library authors and > those same Web developers, and the result will likely be less successful. > > However, the two aren't mutually exclusive. We could do one and then later > (or at the same time) do the other. > > > On Tue, 18 Jan 2011, Roger Hågensen wrote: >> >> It may sound odd but in low storage space situations, it may be >> necessary to unbuffer what has been played. Is this supported at all >> currently? > > Yes. > > >> I think that the buffering should basically be a "moving window" (I hope >> most here are familiar with this term?), and that the size of the moving >> window should be determined by storage space and bandwidth and browser >> preference and server preference, plus make sure the window supports >> skipping anywhere without needing to buffer up to it, and avoid >> buffering from the start just because the user skipped back a little to >> catch something they missed (another annoyance). This is the only >> logical way to do this really. Especially since HTTP 1.1 has byterange >> support there is nothing preventing it from being implemented, and I >> assume other popular streaming protocols support byterange as well?
> > Implementations are allowed to do that. > > > On Tue, 18 Jan 2011, Silvia Pfeiffer wrote: >> >> I think that's indeed one obvious improvement, i.e. when going to pause >> stat, stop buffering when readyState=HAVE_ENOUGH_DATA (i.e. we have >> reached canplaythrough state). > > The spec allows this already. > > >> However, again, I don't think that's sufficient. Because we will also >> buffer during playback and it is possible that we buffer fast enough to >> have buffered e.g. the whole of a 10min video by the time we hit pause >> after 1 min and stop watching. That's far beyond canplaythrough and >> that's 9min worth of video download wasted bandwidth. This is where the >> suggested downloadBufferTarget would make sense. It would basically >> specify how much more to download beyond HAVE_ENOUGH_DATA before pausing >> the download. > > I don't understand how a site can know what the right value is for this. > Users aren't going to understand that they have to control the buffering > if (e.g.) they're about to go into a tunnel and they want to make sure > it's buffered all the way through. It should just work, IMHO. > > > On Tue, 18 Jan 2011, David Singer wrote: >> >> If you want a more tightly coupled supply/consume protocol, then use >> one. As long as it's implemented by client and server, you're on. >> >> Note that the current move of the web towards download in general and >> HTTP in particular is due in no small part to the fact that getting more >> tightly coupled protocols -- actually, any protocol other than HTTP -- >> out of content servers, across firewalls, through NATs, and into clients >> is...still a nightmare. So, we've been given a strong incentive by all >> those to use HTTP. It's sad that some of them are not happy with that >> result, but it's going to be hard to change now. > > Agreed, though in practice there are certainly ways to get two-way > protocols through. WebSocket does a pretty good job, for example. But > designing a protocol for this is out of scope for this list, really. > > > On Tue, 18 Jan 2011, David Singer wrote: >> >> In RTSP-controlled RTP, there is a tight relationship between the play >> point, and play state, the protocol state (delivering data or paused) >> and the data delivered (it is delivered in precisely real-time, and >> played and discarded shortly after playing). The server delivers very >> little more data than is actually watched. >> >> In HTTP, however, the entire resource is offered to the client, and >> there is no protocol to convey play/paused back to the server, and the >> typical behavior when offered a resource in HTTP is to make a simple >> binary decision to either load it (all) or not load it (at all). So, by >> providing a media resource over HTTP, the server should kinda be >> expecting this 'download' behavior. >> >> Not only that, but if my client downloads as much as possible as soon as >> possible and caches as much as possible, and yours downloads as little >> as possible as late as possible, you may get brownie points from the >> server owner, but I get brownie points from my local user -- the person >> I want to please if I am a browser vendor. There is every incentive to >> be resilient and 'burn' bandwidth to achieve a better user experience. >> >> Servers are at liberty to apply a 'throttle' to the supply, of course >> ("download as fast as you like at first, but after a while I'll only >> supply at roughly the media rate"). 
They can suggest that the client be >> a little less aggressive in buffering, but it's easily ignored and the >> incentive is to ignore it. >> >> So I tend to return to "if you want more tightly-coupled behavior, use a >> more tightly-coupled protocol"... > > Indeed. > > > On Wed, 19 Jan 2011, Philip J盲genstedt wrote: >> >> The 3 preload states imply 3 simple buffering strategies: >> >> none: don't touch the network at all >> preload: buffer as little as possible while still reaching readyState >> HAVE_METADATA >> auto: buffer as fast and much as possible > > "auto" isn't "as fast and much as possible", it's "as fast and much as > will make the user happy". In some configurations, it might be the same as > "none" (e.g. if the user is paying by the byte and hates video). > > >> However, the state we're discussing is when the user has begun playing the >> video. The spec doesn't talk about it, but I call it: >> >> invoked: buffer as little as possible without readyState dropping below >> HAVE_FUTURE_DATA (in other words: being able to play from currentTime to >> duration at playbackRate without waiting for the network) > > There's also a fifth state, let's call it "aggressive", where even while > playing the video the UA is trying to download the whole thing in case the > connection drops. > > >> If the available bandwidth exceeds the bandwidth of the resource, some >> kind of throttling must eventually be used. There are mainly 2 options >> for doing this: >> >> 1. Throttle at the TCP level by not reading data from the socket (not at all >> to suspend, or at a controlled rate to buffer ahead) >> 2. Use HTTP byte ranges, making many smaller requests with any kind of >> throttling at the TCP level > > There's also option 3, to handle the fifth state above: don't throttle. > > >> When HTTP byte ranges are used to achieve bandwidth management, it's >> hard to talk about a single downloadBufferTarget that is the number of >> seconds buffered ahead. Rather, there might be an upper and lower limit >> within which the browser tries to stay, so that each request can be of a >> reasonable size. Neither an author-provided minumum or maximum value can >> be followed particularly closely, but could possibly be taken as a hint >> of some sort. > > Would it be a more useful hint than "preload"? I'm skeptical about adding > many hints with no requirements. If there's some specific further > information we can add, though, it might make sense to add more features > to "preload". > > >> The above buffering strategies are still not enough, because users seem >> to expect that in a low-bandwidth situation, the video will keep >> buffering until they can watch it through to the end. These seem to be >> the options for solving the problem: >> >> * Make sites that want this behavior set .preload='auto' in the 'paused' >> event handler >> >> * Add an option in the context menu to "Preload Video" or some such >> >> * Cause an invoked (see dfn above) but paused video to behave like >> preload=auto >> >> * As above, but only when the available bandwidth is limited >> >> I don't think any of these solutions are particularly good, so any input >> on other options is very welcome! > > If users expect something, it seems logical that it should just happen. I > don't have a problem with saying that it should depend on preload="", > though. If you like I can make the spec explicitly describe what the > preload="" hints mean while video is playing, too. 
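> 
> As a rough sketch of the first option in that list (assuming the UA honours dynamic changes to preload="", which the spec permits but does not require):
> 
>   var video = document.querySelector('video');
>   // Buffer aggressively while the user has the video paused...
>   video.addEventListener('pause', function () {
>     video.preload = 'auto';
>   }, false);
>   // ...and go back to a conservative hint once playback resumes.
>   video.addEventListener('play', function () {
>     video.preload = 'metadata';
>   }, false);
> 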
> > > On Wed, 19 Jan 2011, Zachary Ozer wrote: >> >> What if, instead of trying to solve this problem, we leave it up to the >> publishers. The current behavior would be unchanged, but we could add >> explicit bandwidth management API calls, ie startBuffer() and >> stopBuffer(). This would let developers / site publishers control how >> much to buffer and when. > > We couldn't depend on it (most people presumably won't want to do anything > but give the src="" of their video). > > >> We might also consider leaning on users a bit to tell us what they want. >> For example, I think people are pretty used to hitting play and then >> pause to buffer until the end of the video. What if we just used our >> bandwidth heuristics while in the play state, and buffered blindly when >> a pause occurs less than X seconds into a video? I won't argue that this >> is a wonderful solution (or a habit we should encourage), but I figured >> I'd throw a random idea out there… > > That seems like pretty ugly UI. :-) > > > On Thu, 20 Jan 2011, Glenn Maynard wrote: >> >> I think that pausing shouldn't affect read-ahead buffering behavior. >> I'd suggest another preload value, preload=buffer, sitting between >> "metadata" and "auto". In addition to everything loaded by "metadata", >> it also fills the read-ahead buffer (whether the video is playing or >> not). >> >> - If a page wants prebuffering only (not full preloading), it sets >> preload=buffer. This can be done even when the video is paused, so when >> the user presses play, the video starts instantly without pausing for a >> server round-trip like preload=metadata. > > So this would be to buffer enough to play through assuming the network > remains at the current bandwidth, but no more? > > >> - If a page wants prebuffering while playing, but unlimited buffering when >> paused (per Zachary's suggestion), it sets preload=buffer when playing and >> preload=auto when paused. > > Again, note that "auto" doesn't mean "buffer everything", it means "do > whatever is best for the user". > > I don't mind adding new values if the browser vendors are going to use > them. > > > On Sat, 22 Jan 2011, David Singer wrote: >> >> When the HTML5 states were first proposed, I went through a careful >> exercise to make sure that they were reasonably delivery-technology >> neutral, i.e. that they applied equally well if say RTSP/RTP was used, >> some kind of dynamic streaming, simple HTTP, and so on. >> >> I am concerned that we all tend to assume that HTML==HTTP, but the >> source URL for the media might have any protocol type, and the HTML >> attributes, states etc. should apply (or clearly not apply) to anything. >> >> Assuming only HTTP, in the markup, is not a good direction. > > Agreed. > > > On Thu, 20 Jan 2011, Matthew Gregan wrote: >> >> The media seek algorithm (4.8.10.9) states that the current playback >> position should be set to the new playback position during the >> asynchronous part of the algorithm, just before the seeking event is >> fired. [...] > > On Thu, 20 Jan 2011, Philip Jägenstedt wrote: >> >> There have been two non-trivial changes to the seeking algorithm in the >> last year: >> >> Discussed at http://lists.w3.org/Archives/Public/public-html/2010Feb/0003.html >> led to http://html5.org/r/4868 >> >> Discussed at http://lists.w3.org/Archives/Public/public-html/2010Jul/0217.html >> led to http://html5.org/r/5219 > > Yeah.
In particular, sometimes there's no way for the UA to know > asynchronously if the seek can be done, which is why the attribute is set > after the method returns. It's not ideal, but the alternative is not > always implementable. > > >> With that said, it seems like there's nothing that guarantees that the >> asynchronous section doesn't start running while the script is still >> running. > > Yeah. It's not ideal, but I don't really see what we can do about it. > > >> It's also odd that currentTime is updated before the seek has actually >> been completed, but the reason for this is that the UI should show the >> new position. > > Not just the UI. The current position is what the browser is trying to > play; if the current position didn't move, then the browser wouldn't be > trying to play it. > > > On Fri, 4 Feb 2011, Matthew Gregan wrote: >> >> For anyone following along, the behaviour has now been changed in the >> Firefox 4 nightly builds. > > On Mon, 24 Jan 2011, Robert O'Callahan wrote: >> >> I agree. I think we should change behavior to match author expectations >> and the other implementations, and let the spec change to match. > > How do you handle the cases where it's not possible? > > > If all the browsers can do it, I'm all for going back to having > currentTime change synchronosuly. > > > On Sat, 29 Jan 2011, Lubomir Toshev wrote: >> >> [W]hen the video tag has embedded browser controls displayed and I click >> anywhere on the controls, they cause a video tag click event. If I want >> to toggle play/pause on video area click, then I cannot do this, because >> clicking on the play control button, fires play, then click event fires >> for video tag and when I toggle It pauses. So this behavior that every >> popular flash player has cannot be achieved. There is no way to >> understand that the click.target is the embedded browser controls area. >> I think that a nice improvement will be to expose this information, in >> the target, that it actually is embedded browser controls. Or clicking >> the embedded browser controls should not produce a click event for video >> tag. After all browser controls are native and do not have >> representation in the DOM. Let me know what do you think about this? > > On Sat, 29 Jan 2011, Aryeh Gregor wrote: >> >> Well, to begin with, you could just use your own controls rather than >> the browser's built-in controls. Then you have no problem. If you're >> using the browser's built-in controls, maybe you should stick with the >> browser's control conventions throughout, which presumably doesn't >> include toggling play/pause on click. >> >> I'm not sure this is a broad enough problem to warrant exposing the >> extra information in the target. Are there any other use-cases for such >> info? > > On Sun, 30 Jan 2011, Lubomir Toshev wrote: >> >> To elaborate a bit, I'm a control developer and I have my own custom >> controls. But we want to allow for the customer to use the default >> browser controls if they want to. This can be done by switching an >> option in my jQuery widget - browserControls - true/false. Or through >> browser context menu shown by default on right click. So I'm trying to >> be flexible enough for the customer. 
>> >> I was thinking about this >> 1) that adding a transparent overlay over the browser controls >> Or >> 2) to detect the click position and if it is some pixels away from the >> bottom of the video tag >> >> will fix this, but every browser has different height for its embedded >> controls and I should hardcode this height in my code, which is just not >> manageable. >> >> I can always add a limitation when using browser controls, toggle >> play/pause on video area click will be turned off, but I want to achieve >> similar behavior in all the browsers no matter whether they use embedded >> controls or not. >> >> So I think this tiny click.target thing will be very useful. > > On Sun, 30 Jan 2011, Glenn Maynard wrote: >> >> Even as a bad hack it's simply not possible; for example, there's no way >> to tell whether a pop-out volume control is open or not. >> >> I think the primary use case browser controls are meant for is when >> scripting isn't available at all. They aren't very useful when you're >> using any kind of scripts with the video. Another problem, related to >> your other post about captioning, is that it's impossible to put >> anything between the video and the controls, so your captions will draw >> *on top of* browser controls. > > On Mon, 31 Jan 2011, Simon Pieters wrote: >> >> See http://lists.w3.org/Archives/Public/public-html/2009Jun/0395.html >> >> I suggested that the browser would not generate an event at all when >> using the native controls. Seemingly there was no reply to Hixie's >> request for opinion from other implementors. > > On Mon, 31 Jan 2011, Glenn Maynard wrote: >> >> There are other meaningful ways to respond to these events; for example, >> to pull its container to the top of the draw order if it's a floating >> window. I should be able to capture mousedown on the container to do >> this, regardless of content. > > On Mon, 31 Jan 2011, Simon Pieters wrote: >> >> How about just suppressing activation events like click? > > On Mon, 31 Jan 2011, Glenn Maynard wrote: >> >> That makes more sense than suppressing the entire mousedown/mouseup >> events (and keydown, touchstart, etc). >> >> Also, it means you can completely emulate the event behavior of the >> default browser controls with scripts: preventDefault on mousedown to >> prevent click events. That's probably not what you actually want to do, >> but it means the default controls aren't doing anything special: their >> effect on events can be understood entirely in terms of what scripted >> events can already do. > > On Mon, 31 Jan 2011, Lubomir Toshev wrote: >> >> I totally agree that events should not be raised, when they originate >> from the native browser controls. This would make it much simpler. I >> filed the same bug for Opera 11 last week. > > As with the post Simon cites above, I'm happy to do this kind of thing, if > multiple vendors agree that it makes sense. If you would like this to be > done, I recommend getting other browser vendors to tell me it sounds good! > > > On Sat, 29 Jan 2011, Lubomir Toshev wrote: >> >> [V]ideo should expose API for currentFrame, so that when control >> developers want to add support for subtitles on their own, to be able to >> support formats that display the subtitles according to the current >> video frame. This is a limitation to the current design of the video >> tag. > > On Sun, 30 Jan 2011, Lubomir Toshev wrote: >> >> We were trying to add support for subtitles for our player control that >> uses video tag as its base. 
There are two popular subtitle formats *.srt >> which uses currentTime to show the subtitles where they should be. Like >> 0:01:00 - 0:01:30 - "What a nice hotel." While the other popular format >> is *.sub which uses the currentFrame to show the proper subtitles. Like >> {45600}, {45689} - "What a nice hotel". And if I want to add this >> support it would be good if video tag exposes currentFrame, so that I >> can show properly the subtitles in a span positioned over the video. Now >> does it make more sense? >> >> I know video will have embedded subtitle support, but I think that it >> should be flexible enough to allow building such features like the one >> above. What do you think? To me this is worth adding because, it should >> be really easy to implement? > > We'll probably add that along with the metrics, when we add those, if > there's a strong use case for it. I'm not sure that supporting frame-based > subtitles is a good use case though. > > > On Mon, 14 Feb 2011, David Flanagan wrote: >> >> The draft specification defines 20+ media event handler IDL attributes >> on HTMLElement. These events are non-bubbling and are always targeted >> at <audio> and <video> tags, so I wonder if they wouldn't be better >> defined on HTMLMediaElement instead. > > All event handlers are on HTMLElement, to make implementations easier and > to make the platform simpler. > > > On Tue, 15 Feb 2011, David Flanagan wrote: >> >> Fair enough, though I do think it will confuse developers who will think >> that those media events bubble. (I'll be documenting them as properties >> of HTMLMediaElement). > > Whether an event bubbles or not is up to the place that dispatches the > event, not the place that hears the event. > > >> What about Document and Window? What's the justification for defining >> the media event handler attributes on those objects? > > Same. It allows the same logic to be used everywhere. > > > On Mon, 14 Feb 2011, Kevin Marks wrote: >> On Mon, Feb 14, 2011 at 2:39 PM, Ian Hickson <ian@hixie.ch> wrote: >> > On Fri, 19 Nov 2010, Per-Erik Brodin wrote: >> > > >> > > We are about to start implementing stream.record() and >> > > StreamRecorder. The spec currently says that “the file must be in >> > > a format supported by the user agent for use in audio and video >> > > elements” which is a reasonable restriction. However, there is >> > > currently no way to set the output format of the resulting File that >> > > you get from recorder.stop(). It is unlikely that specifying a >> > > default format would be sufficient if you in addition to container >> > > formats and codecs consider resolution, color depth, frame rate etc. >> > > for video and sample size and rate, number of channels etc. for >> > > audio. >> > > >> > > Perhaps an argument should be added to record() that specifies the >> > > output format from StreamRecorder as a MIME type with parameters? >> > > Since record() should probably throw when an unsupported type is >> > > supplied, it would perhaps be useful to have a canRecordType() or >> > > similar to be able to test for supported formats. >> > >> > I haven't added anything here yet, mostly because I've no idea what to >> > add. The ideal situation here is that we have one codec that everyone >> > can read and write and so don't need anything, but that may be >> > hopelessly optimistic. >> >> That isn't the ideal, as it locks us into the current state of the art >> forever. The ideal is to enable multiple codecs +formats that can be >> swapped out over time.
That said, uncompressed audio is readily >> codifiable, and we could pick a common file format, sample rate, >> bitdepth and channel count specification. > > It doesn't lock us in to one format, we can always add more formats later. > Right now, we have zero formats, so one format would be a huge step up. > > > On Fri, 4 Mar 2011, Philip Jägenstedt wrote: >> On Thu, 03 Mar 2011 22:15:58 +0100, Aaron Colwell <acolwell@google.com> >> wrote: >> > >> > I was looking at the resource fetch >> > algorithm<http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-resource>section >> > and fetching resources >> > <http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#fetch> >> > sections of the HTML5 spec to determine what the proper behavior is >> > for handling redirects. Both YouTube and Vimeo do 302 redirects to >> > different hostnames from the URLs specified in the src attribute. It >> > looks like the spec says that playback should fail in these cases >> > because they are from different origins (Section 2.7 Fetching >> > resources bullet 7). This leads me to a few questions. >> > >> > 1. Is my interpretation of the spec correct? Sample YouTube & Vimeo URLs are >> > shown below. >> > YouTube : src : http://v22.lscache6.c.youtube.com/videoplayback? ... >> > redirect : http://tc.v22.cache6.c.youtube.com/videoplayback? >> > ... >> > >> > Vimeo : src : http://player.vimeo.com/play_redirect? ... >> > redirect : http://av.vimeo.com/05 ... >> >> Yes, from what I can tell you're correct, but I think it's not >> intentional. The behavior was changed by <http://html5.org/r/5111> in >> 2010-06-25, and this is the first time I've noticed it. Opera (and I >> assume most if not all other browsers) already supports HTTP redirects >> for <video> and I don't think it makes much sense to disallow it. For >> security purposes, the origin of the resource is considered to be the >> final destination, not any of the origins in the redirect chain. > > This was fixed recently. > > > On Fri, 18 Mar 2011, Eric Winkelman wrote: >> >> For in-band metadata tracks, there is neither a standard way to >> represent the type of metadata in the HTMLTrackElement interface nor is >> there a standard way to represent multiple different types of metadata >> tracks. > > There can be a standard way. The idea is that all the types of metadata > tracks that browsers will support should be specified so that all browsers > can map them the same way. I'm happy to work with anyone interested in > writing such a mapping spec, just let me know. > > >> Proposal: >> >> For TimedTextTracks with kind=metadata the @label attribute should >> contain a MIME type for the metadata and that a track only contain Cues >> created from metadata of that MIME type. >> >> This implies that streams with multiple types of metadata require the >> creation of multiple metadata track objects, one for each MIME type. > > This might make sense if we had a defined way of getting such a MIME type > (and assuming you're talking about the IDL attributes, not the content > attributes). > > > On Tue, 22 Mar 2011, Eric Winkelman wrote: >> >> Ah, yes, now I understand the confusion. Within the whatwg specs, the >> word "attribute" is generally used and I was trying to be consistent. > > The WHATWG specs refer to content attributes (those on elements) and IDL > attributes (those on objects, which generate properties in JS). The @foo > syntax is never used in the WHATWG specs.
It's usually used in a W3C > context just to refer to content attributes, by analogy to the XPath > syntax. (Personally I prefer foo="" since it's less ambiguous.) > > > On Mon, 21 Mar 2011, Eric Winkelman wrote: >> >> No, I'm not saying that, but as far as I can tell from the spec, it is >> undefined how the user agent should map in-band data to metadata tracks. >> I am proposing that the algorithm should be that different types of data >> should go into different Timed Text Tracks, and that the track's @label >> should reflect the type. > > To the extent that it is defined, it is defined here: > > http://www.whatwg.org/specs/web-apps/current-work/complete.html#sourcing-in-band-text-tracks > > But the theory, as mentioned above, is that specific types of in-band > metadata tracks would have explicit specs written to define how the > mapping is done. > > >> Recent updates to the spec, section 4.8.10.12.2 >> (http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#sourcing-in-band-text-tracks) >> appear to address my concern in step 2: >> >> "2. Set the new text track's kind, label, and language based on the >> semantics of the relevant data, as defined by the relevant >> specification." >> >> Provided that the relevant specification defines the metadata type >> encoding to be put in the label, e.g. application/x-eiss, >> application/x-scte35, application/x-contentadvisory, etc. > > Well the problem is that there typically is no applicable specification, > or that it is too vague. > > > On Tue, 22 Mar 2011, Lachlan Hunt wrote: >> >> This is regarding the recently added audioTracks and videoTracks APIs to >> the HTMLMediaElement. >> >> The design of these APIs seems to be done a little strangely, in that >> dealing with each track is done by passing an index to each method on >> the TrackList interfaces, rather than treating the audioTracks and >> videoTracks as collections of individual audio/video track objects. This >> design is inconsistent with the design of the TextTrack interface, and >> seems sub-optimal. > > It is intended to avoid an explosion of objects. TextTrack needs to be an > object because it has separate state, gets targetted for events, has > different versions (e.g. MutableTextTrack), etc. Audio and Video tracks > are, on the other hand, rather trivial constructs. > > >> The use of ExclusiveTrackList for videoTracks also seems rather >> limiting. What about cases where the second video track is a >> sign-language track, or some other video overlay. > > You use a separate <video> element. > > I considered this in some depth. The main problem is that you end up > having to define a layout mechanism for videos if you allow multiple > videos to be enabled from script (e.g. consider what the behaviour should > be if you enable the main video, then the PiP sign language video, then > disable the main video. What is the intrinsic dimension of the <video> > element? Does it matter if you do it in a different order?). > > By making <video> be a single video's output layer, we can bypass many of > these problems without removing expressibility (the author can still > support multiple PiP videos). > > >> There are also the use cases for controlling the volume of individual >> tracks that are not addressed by the current spec design. > > Can you elaborate on these use cases? > > My assumption has been that on the long term, i you want to manipulate > specific audio tracks, you would use an <audio> element and plug it into > the Audio API for separate processing. 
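> 
> A minimal sketch of the separate-<video> approach described above, with a main video and a sign-language overlay kept in sync by script (the element IDs and the 0.3 second tolerance are made up for illustration):
> 
>   var main = document.getElementById('main-video');
>   var signing = document.getElementById('sign-language-video');
> 
>   main.addEventListener('play', function () { signing.play(); }, false);
>   main.addEventListener('pause', function () { signing.pause(); }, false);
>   // Nudge the overlay back into sync if it drifts too far.
>   main.addEventListener('timeupdate', function () {
>     if (Math.abs(signing.currentTime - main.currentTime) > 0.3) {
>       signing.currentTime = main.currentTime;
>     }
>   }, false);
> 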
> > > On Sat, 2 Apr 2011, Bruce Lawson wrote: >> >> From a comment in a blog post of mine about longdesc >> (http://www.brucelawson.co.uk/2011/longdesc-in-html5/comment-page-1/#comment-749853) >> I'm wondering if this is an appropriate use of <details> >> >> <details> >> <summary> >> <img src=chart.png alt="Graph of percentage of total U.S. >> non-institutionalized population age 16-64 declaring one or more >> disabilities"> >> </summary> >> <p>The bar graph shows the percentage of total U.S. noninstitutionalized >> population age 16-64 declaring one or more disabilities. The percentage >> value for each category is as follows:</p> >> <ul> >> <li>Total declaring one or more >> disabilities: 18.6 percent </li> >> <li>Sensory (visual and hearing): 2.3 >> percent</li> >> <li>Physical: 6.2 percent</li> >> <li>Mental: 3.8 percent</li> >> <li>Self-care: 1.8 percent</li> >> <li>Difficulty going outside the home: >> 6.4 percent</li> >> <li>Employment disability: 11.9 >> percent</li> >> </ul> >> <p>data retrieved from <a >> href="http://www.census.gov/prod/2003pubs/c2kbr-17.pdf" title="Link to >> External Site" class="external">2000 U.S. Census<span> - >> external link</span></a></p> >> </details> >> >> .. thereby acting as a discoverable-by-anyone longdesc. (The example is >> adapted from the longdesc example at >> http://webaim.org/techniques/images/longdesc#longdesc) >> >> Note to grumpy people: I'm not trying to advocate abolishing longdesc, >> just seeing whether details can be used as an alternative. > > It's a bit weird, but sure. > > (Well, except for your alt="" text, which is a title="", not an alt="".) > > > On Sat, 2 Apr 2011, John Foliot wrote: >> >> Interesting question. Referring to the spec, I think that you may have >> in fact uncovered a bug in the text. The spec states: >> >> "The user agent should allow the user to request that the details >> be shown or hidden." >> >> The problem (or potential problem) here is that the behaviour is defined >> in visual terms - > > The spec explicitly says that these terms have non-visual meaning. > > > On Mon, 4 Apr 2011, Bjartur Thorlacius wrote: >> >> IMO, the specification of the <details> element is overly focused on >> expected renderings. Rather than explicitly defining the semantics of >> <details> with or without an @open attribute, and with or without a >> <summary> child, sane renderings for medium to large displays with which >> the user can interact are described, and usage is to be inferred >> therefrom. This is suboptimal, as it allows hiding <details open>s on >> small output windows but shoulds against it as strongly as ignoring >> addition of the open attribute. Note that the <details> element >> represents a disclosure widget, but the contents are nowhere defined >> (neither as additional information (that a user-agent may or may not >> render, depending on factors such as scarcity of screen estate), nor as >> spoiling information that shouldn't be provided to the user without >> explicit consent). I regard the two different use cases as different, >> even though vendors might implement both with { binding: details; } on >> some media. <Details> can't serve both. It's often spoken of as if >> intended for something else than the YouTube video description use case. >> <Details> mustn't be used for hiding spoilers, or else browsers won't be >> able to intelligently choose to render the would-be concealed contents. > > I've clarified <details> to be better defined in this respect. I hope it > addresses your concern.
> > > On Fri, 22 Apr 2011, Dimitri Glazkov wrote: >> >> I wonder if it makes sense to introduce a set of pseudo-classes on the >> video/audio elements, each reflecting a state of the media on the >> controls (playing/paused/error/etc.)? Then, we could use just CSS to >> style media controls (whether native or custom), and not have to listen >> to DOM events just to tweak their appearance. > > On Sat, 23 Apr 2011, Philip Jägenstedt wrote: >> >> With a sufficiently large set of pseudo-classes it might be possible to >> do *display* most of the interesting state, but how would you *change* >> the state without using scripts? Play/pause, seek, volume, etc... > > On Sat, 23 Apr 2011, Dimitri Glazkov wrote: >> >> This is not the goal of using pseudo-classes: they just provide you with >> a uniform way to react to changes. > > On Sat, 23 Apr 2011, Philip Jägenstedt wrote: >> >> In other words, one would still have to rely heavily on scripts to >> actually implement custom controls? >> >> Also, how would one style a progress bar using pseudo-classes? How about >> displaying elapsed/remaining time in the form MM:SS? > > On Sat, 23 Apr 2011, Dimitri Glazkov wrote: >> >> I am not in any way trying to invent a magical way to style media >> controls entirely in CSS. Just trying to make the job of controls >> developers easier and use CSS where it's well... useful? :) > > On Sat, 23 Apr 2011, Philip Jägenstedt wrote: >> >> Very well, what specific set of pseudo-classes do you think would be >> useful? > > On Sat, 23 Apr 2011, Dimitri Glazkov wrote: >> >> I can infer what would be useful from WebKit's media controls as a first >> stab? > > On Mon, 25 Apr 2011, Silvia Pfeiffer wrote: >> >> A markup and CSS example would make things clearer. How do you think it >> would look? > > On Sun, 24 Apr 2011, Dimitri Glazkov wrote: >> >> Based on WebKit's current media controls, let's start with these pseudo-classes: >> >> Play state: >> - loading >> - playing >> - streaming >> - error >> >> Capabilities: >> - no-audio >> - no-video >> - has-closed-captioning >> >> So, to show a status message while the control is loading or streaming >> and hide when it's done: >> >> video -webkit-media-controls-status-display { >> display: none; >> } >> >> >> video:loading -webkit-media-controls-status-display, video:streaming >> -webkit-media-controls-status-display { >> display: initial; >> ... >> } >> >> Similarly, to hide volume controls when there's no audio: >> >> video:no-audio -webkit-media-controls-volume-slider-container { >> display: none; >> } >> >> Once I put these pseudo-classes in place for WebKit, a lot of the code in >> http://codesearch.google.com/codesearch/p#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/html/shadow/MediaControlRootElement.cpp&exact_package=chromium >> will go away, being replaced with straight CSS. > > Sounds to me like a poor man's XBL. I'd much rather see this addressed > using a full-on binding solution, since it seems like it would be only a > little more complex yet orders of magnitude more powerful. > > > On Fri, 13 May 2011, Narendra Sisodiya wrote: >> >> What I want is a general purpose synchronize mechanism when resource >> like (text, video, graphics, etc) will be played over a general purpose >> timer (timeline) with interaction..
>> >> Ex - >> >> <resource type="html" src="asd.html" x="50%" y="50%" width="10%" >> height="10%" z="6" xpath="page1" tIn="5000ms" tOut="9400ms" >> inEffect="fadein" outEffect="fadeout" inEffectDur="1000ms" >> outEffectDur="3000ms"/> >> >> <resource type="html" src="Indian.ogv" x="50%" y="50%" width="10%" >> height="10%" z="6" xpath="page2" tIn="5000ms" tOut="9400ms" >> inEffect="fadein" outEffect="fadeout" inEffectDur="1000ms" >> outEffectDur="3000ms"/> > > Sounds like SMIL. I recommend looking into SMIL and SVG (which includes > parts of SMIL). > > > On Fri, 13 May 2011, Philip Jägenstedt wrote: >> >> Problem: >> >> <video src="video.webm"></video> >> ... >> <script> >> document.querySelector('video').oncanplay = function() { >> /* will it run? */ >> }; >> </script> >> >> In the above the canplay event can be replaced with many others, like >> loadedmetadata and loadeddata. Whether or not the event handler has been >> registered by the time the event is fired depends on how fast decoding >> is, how fast the network is and how much "..." there is. > > Yes, if you add an event listener in a task that runs after the task that > fires the event could have run, you won't always catch the event. > > That's just a bug in the JS. > > > On Fri, 13 May 2011, Henri Sivonen wrote: >> >> <iframe src=foo.html></iframe> >> <script> >> document.querySelector('iframe').onload = function() { >> /* will it run? */ >> }; >> </script> >> has the same problem. The solution is using the onload markup attribute >> that calls a function declared in an earlier <script>: >> >> <script> >> function iframeLoaded() { >> /* It will run! */ >> } >> </script> >> <iframe src=foo.html onload=iframeLoaded()></iframe> > > Exactly. > > > On Sat, 14 May 2011, Ojan Vafai wrote: >> >> If someone proposed a workable solution, browsers would likely implement >> it. I can't think of a backwards-compatible solution to this, so I agree >> that developers just need to learn that this is a bad pattern. I >> could imagine browsers logging a warning to the console in these cases, >> but I worry that it would fire too much on today's web. > > Indeed. > > >> It's unfortunate that you need to use an inline event handler instead of >> one registered via addEventListener to avoid the race condition. >> Exposing something to the platform like jQuery's live event handlers ( >> http://api.jquery.com/live/) could mitigate this problem in practice, >> e.g. it would be just as easy or easier to register the event handler >> before the element is created. > > You can also work around it by setting src="" from script after you've > used addEventListener, or by checking the state manually after you've > added the handler and calling the handler if it is too late (though you > have to be aware of the situation where the event is actually already > scheduled and you added the listener between the time it was scheduled and > the time it fired, so your function really has to be idempotent). > > > On Sun, 15 May 2011, Olli Pettay wrote: >> >> There is no need to use an inline event handler. >> One can always add a capturing listener to window, for example. >> window.addEventListener("canplay", >> function(e) { >> if (e.target == document.querySelector('video')) { >> // Do something. >> } >> } >> , true); >> And just do that before the <video> element occurs in the page. >> That is simple, IMHO. > > Indeed, that is another option. > > >> (I wonder why the "Firing a simple event named e" defaults to >> non-bubbling.
It makes many things harder than they should be.) > > The default is arbitrary and doesn't affect the platform (since I have > to decide with each event whether to use the default or not). Changing the > default would make no difference (I'd just have to go to every place that > calls the algorithm and switch it from "bubbles" to nothing and nothing to > "does not bubble"). > > > On Sun, 15 May 2011, Glenn Maynard wrote: >> >> If a MediaController is being used it's more complicated; there seems to >> be no way to query the readyState of a MediaController (almost, but not >> quite, the "most recently reported readiness state"), or to get a list >> of slaved media elements from a MediaController without searching for >> them by hand. > > If you're scripting the MediaController, the assumption is that you > created it so there's no problem. The implied MediaControllers are for the > declarative case where you don't need scripting at all. > > > On Mon, 16 May 2011, Simon Pieters wrote: >> >> The state can have changed before the event has actually fired, since >> state changes are sync but the events are queued. So if the script >> happens to run in between then func is run twice. > > That's true. > > > On Mon, 16 May 2011, Remy Sharp wrote: >> >> Now you're right, whoever pointed out the 7am alarm example, if you >> attach the event too late, then you'll miss the boat. However, it's a >> chicken and egg situation. You don't have the DOM so you can't attach >> the event handler, and if you do have the DOM, the damn event has fired >> already. >> >> What's the fix? Well, the workarounds are certainly viable, again from >> an everyman developer point of view: >> >> 1) Attach higher up, on the window object, and listen for >> canplay/loadedmetadata/etc. and check the event.target >> >> 2) Attach an inline event handler (not nice, but will do) >> >> The fix? Since ultimately we have exactly the same potential "bug" with >> image load events > > Not just those, also iframes, own document navigation, sockets, XHR, > anything that does asynchronous work, in fact. > > >> is to update the specification and make it clear: that depending on the >> speed of the connection and decoding, the following "xyz" events can >> fire **before** your script runs. Therefore, here are a couple of >> workarounds - or just be aware. > > I don't really know where to put this that would actually help. > > > On Tue, 17 May 2011, Philip Jägenstedt wrote: >> >> Still, I don't think just advocacy is any kind of solution. Given that >> you (the co-author of an HTML5 book) make certain assumptions about the >> outcome of this race condition, it's safe to assume that hordes of web >> developers will do the same. >> >> To target this specific pattern, one hypothetical solution would be to >> special-case the first script that attaches event handlers to a <video> >> element. After it has run, all events that were already fired before the >> script are fired again. However, this seems awfully messy if the script >> also observes readyState or networkState. It might also interfere with >> browsers that use scripts behind the scenes to implement the native >> controls. >> >> Although a kludge, another solution might be to block events from being fired >> until x more bytes of the document have been parsed or it has finished >> loading.
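As an aside, here is a minimal sketch of the "addEventListener, then check the state" workaround described a few messages above. The handler is written to be idempotent because the event may already be queued when the listener is added; the element id and function name are illustrative only, not taken from the thread:

<video id=v src="video.webm"></video>
<script>
  var video = document.getElementById('v');
  var done = false;

  function onCanPlay() {
    if (done) return;   // idempotent: may legitimately be called twice
    done = true;
    // ... set up custom controls, start playback, etc.
  }

  video.addEventListener('canplay', onCanPlay, false);

  // The canplay event may have fired before this script ran,
  // so also check the element's current state.
  if (video.readyState >= video.HAVE_FUTURE_DATA) {
    onCanPlay();
  }
</script>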
> > On Wed, 18 May 2011, Robert O'Callahan wrote: >> >> For certain kinds of events ("load", the video events, maybe more), >> delay the firing of such events until, say, after DOMContentLoaded has >> fired. If you're careful you might be able to make this a strict subset >> of the behaviors currently allowed by the spec ... i.e. you're >> pretending that your frame, image and video loads simply didn't complete >> until after DOMContentLoaded fired in the outer page. That would mean >> it's compatible with properly-written legacy content ... if there is >> any. >> >> Of course I have no idea whether that approach is actually feasible :-). >> It obviously isn't compatible with what browsers currently do, so >> authors wouldn't want to rely on it for a long time, if ever. > > These don't seem like workable solutions. We can't delay load events for > every image on the Web, surely. Remembering every event that's ever fired > for any <img> or <video> just in case a handler is later attached seems a > bit intractable, too. > > This has been a problem since JavaScript was added in the 90s. I find it > hard to believe that we have to suddenly fix it now. > > > On Tue, 24 May 2011, Silvia Pfeiffer wrote: >> >> Ian and I had a brief conversation recently where I mentioned a problem >> with extended text descriptions with screen readers (and worse still >> with braille devices) and the suggestion was that the "paused for user >> interaction" state of a media element may be the solution. I would like >> to pick this up and discuss in detail how that would work to confirm my >> sketchy understanding. >> >> *The use case:* >> >> In the specification for media elements we have a <track> kind of >> "descriptions", which are: >> "Textual descriptions of the video component of the media resource, >> intended for audio synthesis when the visual component is unavailable >> (e.g. because the user is interacting with the application without a >> screen while driving, or because the user is blind). Synthesized as a >> separate audio track." >> >> I'm for now assuming that the synthesis will be done through a screen >> reader and not through the browser itself, thus making the >> descriptions available to users as synthesized audio or as braille if >> the screen reader is set up for a braille device. >> >> The textual descriptions are provided as chunks of text with a start >> and an end time (so-called "cues"). The cues are processed during video >> playback as the video's playback time starts to fall within the time >> frame of the cue. Thus, it is expected that the cues are consumed >> during the cue's time frame and are not present any more when the end >> time of the cue is reached, so they don't conflict with the video's >> normal audio. >> >> However, on many occasions, it is not possible to consume the cue text >> in the given time frame. In particular not in the following >> situations: >> >> 1. The screen reader takes longer to read out the cue text than the >> cue's time frame provides for. This is particularly the case with long >> cue text, but also when the screen reader's reading rate is slower >> than what the author of the cue text expected. >> >> 2. The braille device is used for reading. Since reading braille is >> much slower than listening to read-out text, the cue time frame will >> invariably be too short. >> >> 3. The user seeked right into the middle of a cue and thus the time >> frame that is available for reading out the cue text is shorter than >> the cue author planned for.
>> >> Correct me if I'm wrong, but it seems that what we need is a way for >> the screen reader to pause the video element from continuing to play >> while the screen reader is still busy delivering the cue text. (In >> a11y talk: what is required is a means to deal with "extended >> descriptions", which extend the timeline of the video.) Once it's >> finished presenting, it can resume the video element's playback. > > Is it a requirement that the user be able to use the regular video pause, > play, rewind, etc, controls to seek inside the extended descriptions, or > should they literally pause the video while playing, with the audio > descriptions being controlled by the same UI as the screen reader? > > >> IIUC, a video is "paused for user interaction" basically when the UA has >> decided to pause the video without the user asking to pause it (i.e. the >> paused attribute is false) and the pausing happened not for network >> buffering reasons, but for other reasons. IIUC one concrete situation >> where this state is used is when the UA has reached the end of the >> resource and is waiting for more data to come (e.g. on a live stream). > > That latter state is not "paused for user interaction", it's just stalled > due to lack of data. The rest is accurate though. > > >> To use "paused for user interaction" for extending descriptions, we need >> to introduce a means for the screen reader to tell the UA to pause the >> video when it reaches the end of the cue and it's still busy delivering >> a cue's text. Then, as it finishes, it will un-pause the video to let it >> continue playing. >> >> To me it sounds like a feasible solution. >> >> The screen reader could even provide a user setting and a short-cut so a >> user can decide that they don't want this pausing to happen or that they >> want to move on from the current cue. >> >> Another advantage of this approach is that e.g. a deaf-blind user could >> hook up their braille device such that it will deliver the extended >> descriptions and also deliver captions through braille with such >> extension pausing happening. (Not sure that such a user would even want >> to play the video, but it would be possible.) >> >> Now, I think there is one problem though (at least as far as I can >> tell). Right now, IIUC, screen readers are only passive listeners on the >> UA. They don't influence the behaviour of the UA. The accessibility API >> is basically only a one-way street from the UA to the AT. I wonder if >> that is a major inhibitor of using this approach or whether it's easy >> for UAs to overcome this limitation? (Or if such a limitation even >> exists - I don't know enough about how AT work...). >> >> Is that an issue? Are there other issues that I have overlooked? > > That seems to be entirely an implementation issue. > > -- > Ian Hickson U+1047E )\._.,--....,'``. fL > http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,. > Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 3 June 2011 08:22:06 UTC