- From: Darin Fisher <darin@chromium.org>
- Date: Mon, 16 Apr 2012 14:34:33 -0700
On Mon, Apr 16, 2012 at 1:39 PM, Oliver Hunt <oliver at apple.com> wrote:
>
> On Apr 16, 2012, at 1:12 PM, Darin Fisher <darin at chromium.org> wrote:
>
> Glenn summarizes my concerns exactly. Deferred rendering is indeed the
> more precise issue.
>
> On Mon, Apr 16, 2012 at 12:18 PM, Oliver Hunt <oliver at apple.com> wrote:
>
>> Could someone construct a demonstration of where the read back of the
>> imagedata takes longer than a runloop cycle?
>>
>
> I bet this would be fairly easy to demonstrate.
>
>
> Then by all means do :D
>

Here's an example. Take
http://ie.microsoft.com/testdrive/Performance/FishIETank/, and apply the
following diff (changing the draw function):

BEGIN DIFF
--- fishie.htm.orig 2012-04-16 14:23:29.224864338 -0700
+++ fishie.htm 2012-04-16 14:21:38.115489276 -0700
@@ -177,10 +177,17 @@
         // Draw each fish
         for (var fishie in fish) {
             fish[fishie].swim();
         }
+
+        if (window.read_back) {
+            var data = ctx.getImageData(0, 0, WIDTH, HEIGHT).data;
+            var x = data[0];  // force readback
+        }
+
+
         //draw fpsometer with the current number of fish
         fpsMeter.Draw(fish.length);
     }
 
     function Fish() {
END DIFF

Running on a Mac Pro, with Chrome 19 (WebKit @r111385), with 1000 fish, I
get 60 FPS. Setting read_back to true (using the dev tools) drops it to
30 FPS.

Using about:tracing (a tool built into Chrome), I can see that the read
pixels call is taking ~15 milliseconds to complete. The implied GL flush
takes ~11 milliseconds.

The page was sized to 1400 x 1000 pixels.

-Darin


>
>> You're asking for significant additional complexity for content authors,
>> with a regression in general case performance; it would be good to see if
>> it's possible to create an example, even if it's not something any sensible
>> author would do, where there is a performance improvement.
>>
>> Remember, the application is only marginally better when it's not
>> painting due to waiting for a runloop cycle than it is when blocked waiting
>> on a graphics flush.
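Darin's 60 FPS to 30 FPS drop is consistent with simple frame-budget arithmetic. The following is a hypothetical JavaScript model, not anything measured in this thread: it assumes frames are presented on a 60 Hz vsync, so a frame whose work overruns one ~16.7 ms refresh interval is not shown until the next one, and it assumes an arbitrary 10 ms baseline for drawing the fish. Only the ~15 ms readback figure (which includes the ~11 ms implied GL flush) comes from the measurements reported in this message.

```javascript
// Hypothetical frame-budget model. Assumption: frames are presented on a
// 60 Hz vsync, so a frame that overruns one refresh interval is shown at
// the next one. The baseline work time is an assumed value; only the
// ~15 ms readback cost comes from the about:tracing measurement.
const REFRESH_HZ = 60;
const INTERVAL_MS = 1000 / REFRESH_HZ; // ~16.67 ms per refresh

function effectiveFps(frameWorkMs) {
  // Whole refresh intervals the frame occupies (always at least one).
  const intervals = Math.max(1, Math.ceil(frameWorkMs / INTERVAL_MS));
  return REFRESH_HZ / intervals;
}

const baselineMs = 10; // hypothetical cost of drawing 1000 fish
const readbackMs = 15; // ~15 ms getImageData readback, including the flush

console.log(effectiveFps(baselineMs));              // 60
console.log(effectiveFps(baselineMs + readbackMs)); // 30
```

Under this model, any baseline between roughly 1.7 ms and 16.7 ms per frame yields 60 FPS without the readback and 30 FPS with it, so the halving does not hinge on the exact baseline chosen.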
>>
>
> You can do a lot of other things during this time. For example, you can
> prepare the next animation frame. You can run JavaScript garbage
> collection.
>
> Also, it is common for a browser thread to handle animations for multiple
> windows. If you have animations going in both windows, it would be nice
> for those animations to update in parallel instead of being serialized.
>
>
> None of which changes the fact that your actual developer now needs more
> complicated code, and has slower performance. If I'm doing purely
> imagedata based code then there isn't anything to defer, and so all you're
> doing is adding runloop latency. The other examples you give don't really
> apply either.
>
> Most imagedata-based code I've seen is not GC heavy. If you're performing
> animations using CSS animations, etc., then I believe that the browser is
> already able to hoist them onto another thread. If you have animations in
> multiple windows then Chrome doesn't have a problem because those windows
> are in separate processes, and if you're not, then all you're doing is
> allowing one runloop of work (which may or may not be enough to get a paint
> done) before you start processing your ImageData. I'm really not sure what
> it is that you're doing with your ImageData such that it takes so much less
> time than the canvas work, but it seems remarkable that there's some
> operation you can perform in JS over all the data returned that takes less
> time than the latency introduced by an async API.
>
> --Oliver
>
>
> -Darin
>
>
>
>>
>> Also, if the argument is wrt deferred rendering rather than GPU copyback,
>> can we drop GPU related arguments from this thread?
>>
>> --Oliver
>>
>> On Apr 16, 2012, at 12:10 PM, Glenn Maynard <glenn at zewt.org> wrote:
>>
>> On Mon, Apr 16, 2012 at 1:59 PM, Oliver Hunt <oliver at apple.com> wrote:
>>>
>>> I don't understand why adding a runloop cycle to any read seems like
>>> something that would introduce a much more noticeable delay than a memcopy.
>>>
>>
>> The use case is deferred rendering. Canvas drawing calls don't need to
>> complete synchronously (before the drawing call returns); they can be
>> queued, so API calls return immediately and the actual draws can happen in
>> a thread or on the GPU. This is exactly like OpenGL's pipelining model
>> (and might well be implemented using it, on some platforms).
>>
>> The problem is that if you have a bunch of that work pipelined, and you
>> perform a synchronous readback, you have to flush the queue. In OpenGL
>> terms, you have to call glFinish(). That might take long enough to cause a
>> visible UI hitch. By making the readback asynchronous, you can defer the
>> actual operation until the operations before it have been completed, so you
>> avoid any such blocking in the UI thread.
>>
>>
>>> I also don't understand what makes reading from the GPU so expensive
>>> that adding a runloop cycle is necessary for good perf, but it's
>>> unnecessary for a write.
>>>
>>
>> It has nothing to do with how expensive the GPU read is, and everything
>> to do with the need to flush the pipeline. Writes don't need to do this;
>> they simply queue, like any other drawing operation.
>>
>> --
>> Glenn Maynard
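Glenn's point, that a synchronous read has to drain the queued drawing work while an asynchronous read can simply be queued behind it, can be illustrated with a toy model. This is a hypothetical JavaScript sketch: DeferredCanvas, draw(), flush(), readSync() and readAsync() are illustrative names, not the CanvasRenderingContext2D API and not any API proposed in this thread, and the backing store is reduced to a single number.

```javascript
// Toy model of a deferred (pipelined) canvas, for illustration only.
// DeferredCanvas and its methods are hypothetical names, not a real API;
// the "pixels" of the backing store are reduced to a single number.
class DeferredCanvas {
  constructor() {
    this.pending = [];      // queued commands that have not executed yet
    this.pixels = 0;        // stand-in for the backing store contents
    this.forcedFlushes = 0; // times a caller had to wait for the queue
  }

  // Drawing (and writing) calls only enqueue work and return immediately.
  draw(amount) {
    this.pending.push(() => { this.pixels += amount; });
  }

  // Drains the queue, like glFinish(). Commands enqueued while draining
  // (e.g. by an async callback) go into a fresh queue for the next flush.
  flush() {
    const commands = this.pending;
    this.pending = [];
    for (const command of commands) command();
  }

  // Synchronous readback: the caller blocks until all queued work has run.
  readSync() {
    this.forcedFlushes += 1;
    this.flush();
    return this.pixels;
  }

  // Asynchronous readback: the read is queued behind the pending draws, so
  // the caller does not wait; the callback runs on the next flush().
  readAsync(callback) {
    this.pending.push(() => callback(this.pixels));
  }
}

const canvas = new DeferredCanvas();
canvas.draw(3);
canvas.draw(4);
console.log(canvas.pending.length); // 2 (nothing has executed yet)

let asyncResult = null;
canvas.readAsync((value) => { asyncResult = value; });
console.log(asyncResult);           // null (the caller was not blocked)
canvas.flush();                     // the backend drains the queue later
console.log(asyncResult);           // 7

canvas.draw(5);
console.log(canvas.readSync());     // 12, after a forced flush
```

The trade-off debated in the thread shows up in the two read methods: readSync() hands back a value immediately but forces every pending command to run first, while readAsync() returns without waiting, at the cost of delivering its result only when the queue is next processed.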
Received on Monday, 16 April 2012 14:34:33 UTC