- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Sun, 23 May 2010 21:03:53 +1000
Hi Carlos,

2010/5/23 Carlos Andrés Solís <csolisr at gmail.com>:
> Hello, I've been writing lately on the WHATWG and WebM mailing lists and
> would like to hear your opinion on the following idea.
>
> Imagine a hypothetical website that delivers videos in multiple languages,
> like on a DVD, where you can choose your audio and subtitle language. And
> also imagine there is the possibility of downloading a file with the video,
> along with either the chosen audio/sub tracks, or all of them at once.
> Right now, though, there's no way to deliver multiple audio and subtitle
> streams on HTML5 and WebM. Since the latter supports only one audio and one
> video track, with no embedded subtitles, creating a file with multiple
> tracks is impossible, unless using full Matroska instead of WebM - save for
> the fact that the standard proposed is WebM and not Matroska.
>
> A solution could be to stream the full Matroska with all tracks embedded.
> This, though, would be inefficient, since the user will often select only
> one language to view the video, and there's no way yet to stream only the
> selected tracks to the user. I have thought of two solutions for this:
>
> * Solution 1: Server-side demuxing. The video with all tracks is stored as
> a Matroska file. The server demuxes the file, generates a new one with the
> chosen tracks, and streams only the tracks chosen by the user. When the
> user chooses to download the full video, the full Matroska file is
> downloaded with no overhead. The downside is the server-side demuxing and
> remuxing; fortunately most users only need to choose once. Also, there's
> the problem of having to download the full file instead of a file with only
> the tracks wanted; this could be solved by even more muxing.

For the last 10 years, we have tried to solve many of the media challenges
on servers, making servers increasingly intelligent, and thereby slow, and
no longer real HTTP servers. Much of that happened in proprietary software,
but others tried it with open software, too. For example, I worked on a
project called Annodex which tried to make open media resources available
from normal HTTP servers with only a CGI script installed that would remux
files on the fly to serve time segments of a media resource. Or look at any
of the open source RTSP streaming servers that were created.

We have learnt in the last 10 years that the Web is better served with a
plain HTTP server than with custom media servers, and we have started
putting the intelligence into user agents instead. User agents now know how
to do byte range requests to retrieve temporal segments of a media resource
(a rough sketch of such a request is below). I believe for certain formats
it's even possible to retrieve individual tracks through byte range
requests only.

In short, the biggest problem with your idea of dynamic muxing on a server
is that it's very CPU intensive and doesn't easily lead to a scalable
server. Also, it leads to specialised media servers in contrast to just
using a simple HTTP server. It's possible, of course, but it's complex and
not general-purpose.
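As a minimal TypeScript sketch of such a byte range request, assuming a
browser-style fetch API (the URL and byte offsets here are made up; a real
user agent derives the offsets from the container's seek index):

  // Ask a plain HTTP server for just a slice of a media resource.
  async function fetchMediaSegment(
    url: string,
    firstByte: number,
    lastByte: number
  ): Promise<ArrayBuffer> {
    const response = await fetch(url, {
      headers: { Range: `bytes=${firstByte}-${lastByte}` },
    });
    // A server that honours Range requests answers 206 Partial Content;
    // anything else means we got the whole file or an error instead.
    if (response.status !== 206) {
      throw new Error(`server ignored the Range header (status ${response.status})`);
    }
    return response.arrayBuffer();
  }

  // Hypothetical usage: fetch roughly the first megabyte of a video.
  fetchMediaSegment("http://example.com/video.webm", 0, 1048575)
    .then((buffer) => console.log(`received ${buffer.byteLength} bytes`));

No special media server is involved: any HTTP server that supports Range
headers can answer this.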
> * Solution 2: User-side muxing. Each track (video, audio, subtitles) is
> stored in standalone files. The server streams the tracks chosen by the
> user, and the web browser muxes them back. When the user chooses to
> download the video, the generation of the file can be done either
> server-side or client-side. This can be very dynamic but will force
> content providers to use extra coding inside of the pages.

Again, we've actually tried this over the last 10 years with SMIL. However,
synchronising audio and video that come from multiple servers, and
therefore have different network delays, different buffering rates,
different congestion times, etc., makes it really difficult to keep
multiple media resources in sync.

You don't actually have to rip audio and video apart to achieve what you're
trying to do. Different websites are created for different languages, too.
So I would expect that if your website is in Spanish, you will get your
video with a Spanish audio track, and when it's in German, your audio will
be German. Each of these is a media resource with a single audio and a
single video track. Yes, your video track is replicated on the server
between these different resources, but that's probably easier to handle
from a production point of view anyway.

The matter of subtitle / caption tracks is then a separate one. You could
embed all of the subtitle tracks in all of the media resources to make sure
that when a file is downloaded, it comes with its alternative subtitle
tracks. That's not actually that huge an overhead, seeing as text tracks
take up the least space compared to the audio and video data.

Or, alternatively, you could have the subtitle tracks as extra files. This
is probably the preferred mode of operation, and the most conformant with
traditional Web principles, seeing as they are text resources and the best
source of information for indexing the content of a media resource in,
e.g., a search engine. Also, such files are much easier to administer than
if they are inside a media resource: easier to produce separately from the
media resource and add later, easier to edit post-publishing, and easier to
provide from, e.g., a database rather than as an actual file.

It is this latter approach that the new HTML5 <track> element is pursuing
(a rough sketch is at the end of this mail). In this scenario, the Web
browser will indeed synchronise the text with the media resource for
playback. It doesn't need to do any muxing for this, since it only needs to
display the media resource and the text in sync, not actually create a new
resource.

Whether we want to take the next step and do actual muxing on the client
for a downloaded media resource with multiple <track> elements is a
question that needs to be discussed. It is indeed a possibility. But it's
not something I'm worried about, since there are tools available for muxing
that I could use if I really wanted to create such a file after downloading
the individual text tracks.

Yet I have to say that for the situation where we are actually dealing with
multitrack media resources in HTML5, we still haven't got an interface
available. There is a proposal at
http://www.w3.org/WAI/PF/HTML/wiki/Media_MultitrackAPI, which is still in
the W3C bug tracker and will need to be adapted to work with the new
<track> element before being introduced.
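As a rough TypeScript sketch of the <track> approach from script (the file
names and languages are made up for illustration, and the exact text track
format is still being settled):

  // Attach external subtitle files to a video; the browser then keeps the
  // text and the media in sync on its own - no muxing involved.
  const video = document.createElement("video");
  video.src = "video.webm";
  video.controls = true;

  const subtitles = [
    { lang: "es", label: "Español", src: "subtitles.es.vtt" },
    { lang: "de", label: "Deutsch", src: "subtitles.de.vtt" },
  ];
  for (const { lang, label, src } of subtitles) {
    const track = document.createElement("track");
    track.kind = "subtitles";
    track.srclang = lang;
    track.label = label;
    track.src = src;
    video.appendChild(track);
  }
  document.body.appendChild(video);

Note how the subtitle files stay separate, indexable text resources on the
server, exactly as described above.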
Cheers,
Silvia.

Received on Sunday, 23 May 2010 04:03:53 UTC