[whatwg] [canvas] request for {create, get, put}ImageDataHD and ctx.backingStorePixelRatio from Darin Fisher on 2012-04-16 (public-whatwg-archive@w3.org from April 2012)

From: Darin Fisher <darin@chromium.org>
Date: Mon, 16 Apr 2012 14:34:33 -0700
Message-ID: <CAP0-Qpsse0d_FTgrcTpF=De3qewB9fRPKmoZo=-jzNakNKe9fw@mail.gmail.com>
On Mon, Apr 16, 2012 at 1:39 PM, Oliver Hunt <oliver at apple.com> wrote:

>
> On Apr 16, 2012, at 1:12 PM, Darin Fisher <darin at chromium.org> wrote:
>
> Glenn summarizes my concerns exactly.  Deferred rendering is indeed the
> more precise issue.
>
> On Mon, Apr 16, 2012 at 12:18 PM, Oliver Hunt <oliver at apple.com> wrote:
>
>> Could someone construct a demonstration of where the read back of the
>> imagedata takes longer than a runloop cycle?
>>
>
> I bet this would be fairly easy to demonstrate.
>
>
> Then by all means do :D
>


Here's an example.

Take http://ie.microsoft.com/testdrive/Performance/FishIETank/, and apply
the following diff (changing the draw function):

BEGIN DIFF
--- fishie.htm.orig     2012-04-16 14:23:29.224864338 -0700
+++ fishie.htm  2012-04-16 14:21:38.115489276 -0700
@@ -177,10 +177,17 @@
             // Draw each fish
             for (var fishie in fish) {
                 fish[fishie].swim();
             }

+
+            if (window.read_back) {
+                var data = ctx.getImageData(0, 0, WIDTH, HEIGHT).data;
+                var x = data[0];  // force readback
+            }
+
+
                        //draw fpsometer with the current number of fish
             fpsMeter.Draw(fish.length);
         }

         function Fish() {
END DIFF
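As a rough sketch, the same forced readback can also be timed from script instead of with a tracing tool. `timeReadback` is a hypothetical helper, not part of the original test. It assumes `ctx` is a 2D canvas context and uses `performance.now()`. It only measures the wall-clock time the main thread spends blocked in `getImageData`, and that time would include any flush the implementation has to perform before it can read.

```javascript
// Hypothetical helper: time one synchronous getImageData call.
// Assumes `ctx` is a CanvasRenderingContext2D (or anything with the same
// getImageData(sx, sy, sw, sh) shape). performance.now() is available in
// browsers and in recent Node.js versions.
function timeReadback(ctx, width, height) {
  const start = performance.now();
  const data = ctx.getImageData(0, 0, width, height).data;
  // Touch the pixels, mirroring the diff's `var x = data[0]; // force readback`.
  const firstByte = data[0];
  const elapsedMs = performance.now() - start;
  return { elapsedMs, firstByte };
}
```

In the fish-tank page this could be called as `timeReadback(ctx, WIDTH, HEIGHT)` inside `draw()`, with the result logged to the console.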

Running on a Mac Pro, with Chrome 19 (WebKit @r111385), with 1000 fish, I
get 60 FPS.  Setting read_back to true (using dev tools), drops it down to
30 FPS.

Using about:tracing (a tool built into Chrome), I can see that the read
pixels call is taking ~15 milliseconds to complete.  The implied GL flush
takes ~11 milliseconds.

The page was sized to 1400 x 1000 pixels.
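The reported numbers can be sanity-checked with a little arithmetic. This sketch assumes a 60 Hz display with vsync-locked presentation, which is an assumption and not something stated in the measurements above. The extra 5 ms of per-frame work in the last line is a hypothetical value chosen only to show how a frame that misses one vsync deadline drops the rate from 60 to 30 FPS.

```javascript
// Frame-budget arithmetic for the measurement above.
// Assumption: presentation is locked to a 60 Hz vsync, so a frame that misses
// one deadline is shown at the next one.
const REFRESH_HZ = 60;
const frameBudgetMs = 1000 / REFRESH_HZ; // ~16.67 ms per frame

// Bytes returned by getImageData: 4 bytes (R, G, B, A) per pixel.
function imageDataBytes(width, height) {
  return width * height * 4;
}

// Effective frame rate when each frame takes `frameMs` and presentation is
// quantized to whole vsync intervals.
function vsyncLimitedFps(frameMs, refreshHz = REFRESH_HZ) {
  const interval = 1000 / refreshHz;
  const intervals = Math.max(1, Math.ceil(frameMs / interval));
  return refreshHz / intervals;
}

console.log(imageDataBytes(1400, 1000)); // 5600000 bytes copied per readback
console.log(frameBudgetMs.toFixed(2));   // 16.67
// A ~15 ms readback alone nearly fills the budget...
console.log(vsyncLimitedFps(15));        // 60
// ...so with a few ms of other (hypothetical) work the frame needs 2 vsyncs.
console.log(vsyncLimitedFps(15 + 5));    // 30
```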

-Darin



>
>
>
>> You're asking for significant additional complexity for content authors,
>> with a regression in general case performance, it would be good to see if
>> it's possible to create an example, even if it's not something any sensible
>> author would do, where there is a performance improvement.
>>
>> Remember, the application is only marginally better when it's not
>> painting due to waiting for a runloop cycle than it is when blocked waiting
>> on a graphics flush.
>>
>
> You can do a lot of other things during this time.  For example, you can
> prepare the next animation frame.  You can run JavaScript garbage
> collection.
>
> Also, it is common for a browser thread to handle animations for multiple
> windows.  If you have animations going in both windows, it would be nice
> for those animations to update in parallel instead of being serialized.
>
>
> None of which changes the fact that your actual developer now needs more
> complicated code, and has slower performance.  If I'm doing purely
> imagedata based code then there isn't anything to defer, and so all you're
> doing is adding runloop latency.  The other examples you give don't really
> apply either.
>
> Most ImageData-based code I've seen is not GC heavy.  If you're performing
> animations using CSS animations, etc., then I believe that the browser is
> already able to hoist them onto another thread.  If you have animations in
> multiple windows, then Chrome doesn't have a problem because those windows
> are a separate process, and if you're not, then all you're doing is
> allowing one runloop of work (which may or may not be enough to get a paint
> done) before you start processing your ImageData.  I'm really not sure what
> it is that you're doing with your ImageData such that it takes so much less
> time than the canvas work, but it seems remarkable that there's some
> operation you can perform in JS over all the data returned that takes less
> time than the latency introduced by an async API.
>
> --Oliver
>
>
> -Darin
>
>
>
>>
>> Also, if the argument is wrt deferred rendering rather than GPU copyback,
>> can we drop GPU related arguments from this thread?
>>
>> --Oliver
>>
>> On Apr 16, 2012, at 12:10 PM, Glenn Maynard <glenn at zewt.org> wrote:
>>
>> On Mon, Apr 16, 2012 at 1:59 PM, Oliver Hunt <oliver at apple.com> wrote:
>>>
>>> I don't understand why adding a runloop cycle to any read seems like
>>> something that would introduce a much more noticeable delay than a memcopy.
>>>
>>
>> The use case is deferred rendering.  Canvas drawing calls don't need to
>> complete synchronously (before the drawing call returns); they can be
>> queued, so API calls return immediately and the actual draws can happen in
>> a thread or on the GPU.  This is exactly like OpenGL's pipelining model
>> (and might well be implemented using it, on some platforms).
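A minimal sketch of the queuing model described above, assuming a single in-order command queue. `DeferredContext` and its backend are hypothetical illustrations, not a real browser API or any engine's implementation: each drawing call is recorded and returns immediately, and nothing is drawn until `flush()` drains the queue. Writes such as `putImageData` are treated as just another queued command.

```javascript
// Hypothetical deferred 2D context: records calls, executes them on flush().
class DeferredContext {
  constructor(backend) {
    this.backend = backend; // stands in for the thread/GPU doing the real work
    this.queue = [];
  }

  fillRect(x, y, w, h) {
    // Record the call; no pixels are touched yet.
    this.queue.push(() => this.backend.fillRect(x, y, w, h));
  }

  // A write is queued like any other drawing operation.
  putImageData(imageData, dx, dy) {
    this.queue.push(() => this.backend.putImageData(imageData, dx, dy));
  }

  // Execute everything recorded so far, in order.
  flush() {
    const pending = this.queue;
    this.queue = [];
    for (const command of pending) command();
  }
}

// A stand-in backend that only logs what it was asked to draw.
const log = [];
const backend = {
  fillRect: (x, y, w, h) => log.push(`fillRect ${x},${y},${w},${h}`),
  putImageData: (img, dx, dy) => log.push(`putImageData ${dx},${dy}`),
};

const ctx = new DeferredContext(backend);
ctx.fillRect(0, 0, 10, 10);
ctx.putImageData({ width: 1, height: 1 }, 5, 5);
console.log(log.length); // 0: both calls returned without drawing anything
ctx.flush();
// log now holds the fillRect entry followed by the putImageData entry.
```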
>>
>> The problem is that if you have a bunch of that work pipelined, and you
>> perform a synchronous readback, you have to flush the queue.  In OpenGL
>> terms, you have to call glFinish().  That might take long enough to cause a
>> visible UI hitch.  By making the readback asynchronous, you can defer the
>> actual operation until the operations before it have been completed, so you
>> avoid any such blocking in the UI thread.
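The contrast between a synchronous and an asynchronous readback can be sketched on top of such a queue. Everything here is hypothetical: `PipelinedCanvas`, `readPixelsSync` and `readPixelsAsync` are illustrative names, not proposed or existing canvas APIs, and `setTimeout` merely stands in for the pipeline draining on its own schedule. The sync read has to drain the whole queue before returning; the async read is itself queued behind the earlier commands, so it observes exactly those commands while the caller keeps running.

```javascript
// Hypothetical pipelined canvas with an RGBA backing store and a command queue.
class PipelinedCanvas {
  constructor(width, height) {
    this.pixels = new Uint8ClampedArray(width * height * 4);
    this.queue = [];
  }

  // Drawing: record and return immediately.
  fill(r, g, b, a) {
    this.queue.push(() => {
      for (let i = 0; i < this.pixels.length; i += 4) {
        this.pixels.set([r, g, b, a], i);
      }
    });
  }

  // Run all pending commands (the glFinish-like step).
  flush() {
    const pending = this.queue;
    this.queue = [];
    for (const command of pending) command();
  }

  // Synchronous readback: the caller waits while the whole queue drains.
  readPixelsSync() {
    this.flush();
    return this.pixels.slice();
  }

  // Asynchronous readback: the read is queued in order behind earlier
  // commands, so it sees exactly those and nothing issued afterwards.
  readPixelsAsync() {
    return new Promise((resolve) => {
      this.queue.push(() => resolve(this.pixels.slice()));
      setTimeout(() => this.flush(), 0); // stand-in for the pipeline draining
    });
  }
}

const canvas = new PipelinedCanvas(2, 2);
canvas.fill(255, 0, 0, 255);
canvas.readPixelsAsync().then((px) => console.log(px[0])); // logs 255, later
console.log("readback requested; free to prepare the next frame"); // logs first
```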
>>
>>
>>>  I also don't understand what makes reading from the GPU so expensive
>>> that adding a runloop cycle is necessary for good perf, but it's
>>> unnecessary for a write.
>>>
>>
>> It has nothing to do with how expensive the GPU read is, and everything
>> to do with the need to flush the pipeline.  Writes don't need to do this;
>> they simply queue, like any other drawing operation.
>>
>> --
>> Glenn Maynard
>>
>>
>>
>>
>
>
Received on Monday, 16 April 2012 14:34:33 UTC