Re: [whatwg] VIDEO and pitchAdjustment from Kevin Marks on 2015-09-01 (public-whatwg-archive@w3.org from September 2015)

From: Kevin Marks <kevinmarks@gmail.com>
Date: Tue, 1 Sep 2015 11:36:36 -0700
To: David Singer <singer@apple.com>
Cc: WHAT Working Group <whatwg@lists.whatwg.org>, Yay295 <yay295@gmail.com>, Robert O'Callahan <robert@ocallahan.org>
Message-ID: <CAD6ztsr=W73p=mKgZE1U24ziWWVV4zLigFF5a7C2qqdCKBrEFg@mail.gmail.com>

On Tue, Sep 1, 2015 at 10:55 AM, David Singer <singer@apple.com> wrote:

>
> > On Sep 1, 2015, at 10:47 , Yay295 <yay295@gmail.com> wrote:
> >
> > On Tue, Sep 1, 2015 at 11:30 AM, David Singer <singer@apple.com> wrote:
> > > On Sep 1, 2015, at 4:03 , Robert O'Callahan <robert@ocallahan.org> wrote:
> > >> On Tue, Sep 1, 2015 at 8:02 PM, Kevin Marks <kevinmarks@gmail.com> wrote:
> > >> QuickTime supports full variable speed playback and has done for well over
> > >> a decade. With bidirectionally predicted frames you need a fair few buffers
> > >> anyway, so generalising to full variable rate is easier than posters above
> > >> claim - you need to work a GOP at a time, but memory buffering isn't the
> > >> big issue these days.
> > >
> > > "GOP"?
> >
> > Group of Pictures.  Video-speak for the run between random access points.
> >
> > > How about a hard but realistic (IMHO) case: 4K video (4096 x 2160), 25 fps,
> > > keyframe every 10s. Storing all those frames takes 250 x 4096 x 2160 x 2
> > > bytes = 4.42 GB (about 4.12 GiB). Reading back those frames would kill
> > > performance, so it all has to stay in VRAM. I respectfully deny that in
> > > such a case, memory buffering "isn't a big issue".
> >
> > well, 10s is a pretty long random access interval.
> >
> > There's no way to know the distance between keyframes though. The video
> > could technically have only one keyframe and still work as a video.
>
> yes, but that is rare. There are indeed videos that don’t play well
> backward, or consume lots of memory and/or CPU, but most are fine.
>
> >
> > >> What QuickTime got right was having a ToC approach to video, so seeking
> > >> rapidly was possible without thrashing, whereas the stream-oriented
> > >> approaches we are stuck with now mean that knowing which bit of the file
> > >> to read to get the previous GOP is the hard part.
> > >
> > > I don't understand. Can you explain this in more detail?
>

I explained the essential difference a while ago here:
http://lists.xiph.org/pipermail/vorbis-dev/2001-October/004846.html

The QuickTime file format defines movies that have tracks made of media;
the tracks are an edit list on the media; the media have the frame layout
information encoded.
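
As a rough illustration of that layering, here is a minimal Python sketch.
The class and field names (Sample, Edit, Track, sample_at) are made up for
this example and are not the actual QuickTime atom names; it also assumes
all times are integers in a single timescale.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One row of the media's frame layout table (hypothetical fields)."""
    media_time: int    # start time on the media timeline
    duration: int
    file_offset: int   # where the frame's bytes live in the file
    size: int
    is_keyframe: bool


@dataclass
class Edit:
    """One entry of a track's edit list: a window onto the media."""
    track_start: int   # where the edit begins on the track timeline
    media_start: int   # where it starts reading in the media
    duration: int


@dataclass
class Track:
    samples: List[Sample]  # the media, described by its layout table
    edits: List[Edit]      # the edit list laid over that media


def sample_at(track: Track, track_time: int) -> Optional[Sample]:
    """Map a track time through the edit list to the sample covering it."""
    for edit in track.edits:
        if edit.track_start <= track_time < edit.track_start + edit.duration:
            media_time = edit.media_start + (track_time - edit.track_start)
            for s in track.samples:
                if s.media_time <= media_time < s.media_time + s.duration:
                    return s
    return None  # track_time falls outside every edit
```

The point of the sketch is that finding a frame is a table lookup: nothing
in sample_at touches the video data itself, only the offsets and sizes
recorded for it.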


> >
> > The movie file structure (and hence MP4) has a table-of-contents approach
> > to file structure; each frame has its timestamps, file location, size,
> > and keyframe-nature stored in compact tables in the head of the file.
> > This makes trick modes and so on easier; you’re not reading the actual
> > video to seek for a keyframe, and so on.
> >
> > I suppose the browser could generate this data the first time it reads
> > through the video. It would use a lot less memory. Though that sounds
> > like a problem for the browsers to solve, not the standard.
>
> There is no *generation* on the browser side; these tables are part of the
> file format.


Well, when QuickTime imports stream-oriented media it has to construct these
tables in memory, but they can be saved out again. I know that in theory this
made its way into the mp4 format, but I'm not sure how much of it is
implemented in practice.
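
To make the idea of constructing such tables in memory concrete, here is a
minimal Python sketch. It is not how QuickTime's importers work; the
function names (build_index, gop_byte_range) and the packet tuple layout
are assumptions for illustration. It also assumes packets arrive in file
order with increasing timestamps (no B-frame reordering) and that the
stream starts on a keyframe.

```python
import bisect
from typing import Iterable, List, Tuple

# (timestamp, size_in_bytes, is_keyframe), in file order (assumed layout)
Packet = Tuple[int, int, bool]
# (timestamp, file_offset, size_in_bytes, is_keyframe)
IndexEntry = Tuple[int, int, int, bool]


def build_index(packets: Iterable[Packet]) -> List[IndexEntry]:
    """Scan the stream once, recording where each frame lives."""
    index: List[IndexEntry] = []
    offset = 0
    for timestamp, size, is_keyframe in packets:
        index.append((timestamp, offset, size, is_keyframe))
        offset += size
    return index


def gop_byte_range(index: List[IndexEntry], time: int) -> Tuple[int, int]:
    """Return the (start, end) byte range of the GOP that contains `time`."""
    key_rows = [i for i, entry in enumerate(index) if entry[3]]
    key_times = [index[i][0] for i in key_rows]
    # Last keyframe whose timestamp is <= time.
    k = bisect.bisect_right(key_times, time) - 1
    if k < 0:
        raise ValueError("time precedes the first keyframe")
    start = index[key_rows[k]][1]
    if k + 1 < len(key_rows):
        end = index[key_rows[k + 1]][1]   # next GOP begins here
    else:
        last = index[-1]
        end = last[1] + last[2]           # final GOP runs to end of data
    return start, end
```

With the index in hand, stepping backwards is a matter of asking for the
byte range of the GOP just before the current keyframe's timestamp, rather
than scanning the stream for it.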

Received on Tuesday, 1 September 2015 18:37:07 UTC