[whatwg] media resources: addressing media fragments through URIs spec

On Mon, Jul 5, 2010 at 2:46 AM, Aryeh Gregor <Simetrical+w3c at gmail.com> wrote:
> On Sun, Jul 4, 2010 at 9:19 AM, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>> All of the image formats that you are pointing out have an image mime
>> type. I am merely pointing out that to support ogg theora browsers
>> would need to support a video mime type in an <img> element. I don't
>> see that as the intention of the <img> element, in particular since
>> <img> elements do not have transport controls and the like. Otherwise,
>> why did we create a <video> element in the first place?
>
> I'd expect that a video in <img> would behave like an animated GIF --
> no sound, no APIs to control playback, no browser-provided controls.
> You might want this sometimes, especially if you're only selecting one
> frame. Animated images are conceptually different from videos, and
> there's no reason you couldn't support the same format for both <img>
> and <video>, with those different semantics. It would be particularly
> useful to support video frames as images in places where <video> can't
> be used, like for the <video poster> attribute, CSS backgrounds, and
> so on. The video MIME type does not conflict at all with allowing
> this kind of usage.
>
> So to cover this use-case, it would be good if there were a way of
> explicitly selecting one frame, which could be treated as a video that
> contains only one frame. This might, in turn, be accepted by some
> browsers in places where they accept images. You could do this by
> explicitly allowing syntax like #t=10,10, where the start point equals
> the end point, as selecting only one frame. (But I guess this could
> be emulated by #t=10,10.001 or something, assuming the frame starts at
> exactly t=10.)

The issue with #t=10,10 is that the interval is semi-open: the start
point is included and the end point is excluded. This has been the
traditional understanding of a time interval in video (RTSP, for
example, defines it that way too). Thus, "video.ogv#t=10,10" is like
asking for the bytes from offset 50 up to, but not including, offset
50: it is simply empty.
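
To make the semi-open semantics concrete, here is a minimal sketch. It
assumes the time values are plain seconds (no "npt:" prefix, no
hh:mm:ss form), and the function names are made up for illustration;
they are not taken from any spec or library.

```python
from typing import Optional, Tuple


def parse_t_fragment(fragment: str) -> Tuple[float, Optional[float]]:
    """Parse a fragment such as '#t=10,20' into (start, end) in seconds.

    Assumption: plain seconds only. A missing start means 0.0 and a
    missing end means "until the end of the resource" (None).
    """
    name, _, value = fragment.lstrip("#").partition("=")
    if name != "t":
        raise ValueError("not a temporal fragment: %r" % fragment)
    start_text, _, end_text = value.partition(",")
    start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else None
    return start, end


def is_empty(start: float, end: Optional[float]) -> bool:
    """A semi-open interval [start, end) is empty when end <= start."""
    return end is not None and end <= start


def contains(start: float, end: Optional[float], t: float) -> bool:
    """The start point is in the interval, the end point is out."""
    return start <= t and (end is None or t < end)


start, end = parse_t_fragment("#t=10,10")
print(is_empty(start, end))
```

With these definitions "#t=10,10" selects nothing, while
"#t=10,10.001" contains the time point 10 but not 10.001.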

Further, extracting a single frame from a video is complicated because
not every point in time maps onto a keyframe; most map onto inter
frames, i.e. incomplete frames that can only be decoded relative to
other frames. Thus, if you ask for #t=10,10.001, you will most likely
receive a decodable video byte range around that time segment, maybe
one that maps to #t=9.02,12.4. The UA will know what it asked for and
be able to display only the actually requested part in the middle
after decoding all the bits.
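
A rough sketch of how a requested interval widens to a decodable one,
assuming a sorted list of keyframe timestamps is available (for
example from a container index). The keyframe times below are invented
to match the numbers in the example above, and decodable_range is a
hypothetical helper, not an existing API.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple


def decodable_range(
    keyframes: List[float], start: float, end: float
) -> Tuple[float, Optional[float]]:
    """Widen the semi-open request [start, end) to keyframe boundaries.

    Decoding has to begin at the last keyframe at or before `start`.
    The range stops at the first keyframe at or after `end`, or runs to
    the end of the stream (None) if there is no such keyframe.
    """
    i = bisect_right(keyframes, start) - 1
    if i < 0:
        raise ValueError("no keyframe at or before the requested start")
    j = bisect_left(keyframes, end)
    range_end = keyframes[j] if j < len(keyframes) else None
    return keyframes[i], range_end


# Hypothetical keyframe timestamps, in seconds.
keyframes = [0.0, 4.5, 9.02, 12.4, 16.0]
print(decodable_range(keyframes, 10.0, 10.001))
```

The UA would decode everything in the returned range and then show
only the frames that fall inside the interval it originally asked for.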

To repeat: I am not convinced it is a good idea to support the video
mime type in an <img> element, even if we change the semantics and
ignore the audio etc. I am not saying it is not possible; I am just
saying that I would not recommend it and would rather suggest doing it
on the server with some transcoding. It is really not difficult to
install ffmpeg or mplayer on the server, develop a query format that
delivers keyframes from a particular time offset, and do a keyframe
dump on the server upon request. You might want to cache the result,
too.
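
As a sketch of the server-side approach, assuming ffmpeg is installed
and on the PATH: the helper below grabs one decoded frame at a time
offset and caches it on disk. The function names, the cache layout and
the PNG output are illustrative choices, and this extracts the frame
at the offset rather than dumping only keyframes.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def frame_command(source: PathLike, seconds: float, dest: PathLike) -> List[str]:
    """Build an ffmpeg command that writes one frame at `seconds`."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dest if it already exists
        "-ss", "%.3f" % seconds,   # seek before opening the input
        "-i", str(source),
        "-frames:v", "1",          # stop after one video frame
        str(dest),
    ]


def cached_frame(source: PathLike, seconds: float, cache_dir: PathLike) -> Path:
    """Return the cached image for (source, seconds), creating it if needed."""
    key = hashlib.sha1(("%s@%.3f" % (source, seconds)).encode("utf-8")).hexdigest()
    dest = Path(cache_dir) / (key + ".png")
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Assumption: ffmpeg is available; a failure raises CalledProcessError.
        subprocess.run(frame_command(source, seconds, dest), check=True)
    return dest


print(" ".join(frame_command("video.ogv", 10, "frame.png")))
```

A query handler could then map something like a time parameter in the
request URL onto cached_frame() and serve the resulting file with an
image mime type, so the <img> element never sees a video resource.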

Cheers,
Silvia.

Received on Sunday, 4 July 2010 16:29:35 UTC