Minutes for the 2017-06-28 meeting

GPU Web 2017-06-28

Chair: Corentin and Dean

Scribe: Dean (with some help)

Location: Google Hangout

Minutes from last meeting:
<https://docs.google.com/document/d/1_vFF4VaLphkb9Zs5qCoYNvfgwnj8aq_l7E5U1LaXnXY/edit#>

Tentative agenda

   - Administrative stuff (if any)
   - Individual design and prototype status
   - Ongoing investigations
      - Encoders / command buffer constraints
      - Queues
   - Things we didn’t get to yet
      - Pipeline state
      - Render passes
   - Agenda for next meeting

Attendance

   - Chris Marrin (Apple)
   - Dean Jackson (Apple)
   - Myles C. Maxfield (Apple)
   - Theresa O'Connor (Apple)
   - Warren Moore (Apple)
   - Gareth Morgan (Axum Graphics)
   - Austin Eng (Google)
   - Corentin Wallez (Google)
   - Kai Ninomiya (Google)
   - Ken Russell (Google)
   - Aleksandar Stojilijkovic (Intel)
   - Ben Constable (Microsoft)
   - Chas Boyd (Microsoft)
   - Rafael Cintron (Microsoft)
   - Dzmitry Malyshau (Mozilla)
   - Jeff Gilbert (Mozilla)
   - Alex Kluge (Vizit Solutions)
   - Kirill Dmitrenko (Yandex)
   - Doug Twilleager (ZSpace)
   - Elviss Strazdiņš
   - Joshua Groves

Administrative items

Face to Face Meeting

   - CW: It seems that the best option is to host the F2F in Chicago. Would
     people prefer it to be before or after the Khronos F2F?
   - CW: Khronos is 17-21 Sept.
   - JG: I would prefer the same week, either the day before or the day
     after WebGL.
   - CW: Ken, can you choose the day for WebGL?
   - KR: I will ask.
   - CW: OK, before or after?
   - DM: I’m not sure I can come to Chicago.
   - CW: Let’s pencil in Chicago, Friday 22nd September.
   - RC: I’d prefer it to be exactly before or after WebGL.
   - CW: We’ll ask for the WebGL meeting to be on Thursday the 21st.
   - KR: Yes. Hold that as a preliminary date.


TPAC

   - DJ: Contacted TPAC, but CGs are only supposed to get two-hour slots.
     However, we could have a joint meeting. Will update when we have an
     answer from TPAC on whether we can have a full-day meeting.


Legal discussion on contribution agreement

   - DJ: Apple’s lawyers are talking to Google’s lawyers.
   - RC: Microsoft’s lawyers are involved.
   - CW: Others, let us know if you want to add your lawyers to the mix.

Individual design and prototype status

   - AE: (Google) The D3D12 backend can look at texture models and stuff.
   - CW: End-to-end tests are working and you can test pixel values.
   - CW: Just finished the investigation on RenderPasses. It’s up on GitHub.
   - DM: (Mozilla) No new milestones on our end.
   - Elviss: I’ve implemented a Metal backend for my game engine. This means
     I’ll be able to give more input when discussing design issues. I’m
     working on D3D12 next.
   - DJ: (Apple) No update for now, but we have meetings set up with the
     Metal team to understand its design better, as well as to try to get
     them to participate.

Encoders

   - CW: Last time we checked, we were waiting on feedback from Apple about
     the cost of swapping between encoder types.
   - MM: The cost of ending a compute encoder is fairly high, unless it is
     followed by another compute encoder. It’s fairly expensive to swap
     types, or to end a RenderEncoder and start another RenderEncoder.
   - DJ: Basically Metal can coalesce compute and compute, but that’s an
     implementation detail. Otherwise starting/ending encoders is expensive.
   - MM: About queues in Metal vs. Vulkan / D3D12: the API level above the
     hardware in Metal does not expose whether or not the queues are going
     to the same hardware unit or different hardware units. Metal actually
     adds synchronisation primitives if they are going to separate units.
   - CW: Let’s come back to queues later.
   - DM: Does Apple confirm that multiple queues may be sent in parallel to
     different hardware units?
   - MM: Drivers are free to do that. But it is not explicit.
   - CW: So how do people feel about “now I’m going to do compute” signals?
   - MM: We have an idea about this, involving RenderPasses (from Vulkan).
     It seems that they are a good match for a RenderEncoder in Metal.
   - CW: So you’d be able to interleave Compute between two RenderPasses?
     This doesn’t work in Vulkan, because …
   - KN: Are you talking about compute between two renderpasses?
   - JG: The idea hinges on the fact that you have to put graphics stuff
     inside renderpasses, but you don’t have to do that for compute. Is this
     true?
   - CW: Vulkan can *only* do graphics inside RenderPasses. D3D can do
     everything whenever it wants.
   - CW: Were you suggesting putting compute inside a RenderPass?
   - MM: No, just that there is another type of Pass, or that the start of
     the next RenderPass explicitly closes the virtual compute pass.
   - KN: Outside of Vulkan-style renderpasses you can switch freely between
     copy and compute. We could expose “begin/end compute” outside of
     renderpasses so the …
   - CW: So effectively there would be begin/end for compute/graphics/copy,
     and you’re required to be explicit?
   - DJ/MM: Yes.
   - CW: Anything we would need to add to beginCompute or beginBlit?
   - MM: Not in Metal.
   - CM: Sounds good. Does this sound OK for D3D12?
   - BC: From my perspective with D3D12, the sync is explicit everywhere. As
     long as the queue types match, you use fences to sync, and it just goes
     to the driver. So it sounds like these proposals would be easy to
     emulate, and might just be no-ops in D3D12.
   - BC: Vk / MTL have a lot of stuff that looks designed for mobile, so
     that devs know when tiles are flushed, etc. I think it makes sense to
     have them explicit in WebGPU.
   - CW: NVIDIA and AMD now have some tiled GPUs. I expect this means D3D12
     will eventually add support for this. Please make sure that our design
     maps well to whatever Microsoft comes up with.
   - BC: Sure.
   - CW: So it sounds like we have consensus on Encoders/Passes. (A rough
     sketch of this pass model follows at the end of this section.)
   - DJ: How do we document that we have consensus?
   - CW: Definitely put it on the GitHub.
   - DJ: No need for a document yet.
   - DM: So the conclusion is Metal-style encoders, or something like them?
     Can you do bindings outside the begin/end and have them persist over
     the passes/blocks?
   - CW: I’m not sure what the reason for this is.
   - DM: …
   - CW: For D3D12 you have to set the modes for compute and graphics. ???
   - CW: Not having inheritance seems like an OK thing, since both Metal and
     D3D12 behave this way.
   - DJ: What is the benefit of this in Vulkan content today?
   - CW: RenderPasses are expensive enough that I don’t think it matters.
   - DM: We shouldn’t make it more expensive though. We can prototype it and
     report back.
   - CW: We should ask about inheritance usage on the Khronos GitLab.
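
To make the encoder/pass model above concrete, here is a minimal sketch
written as TypeScript-style interfaces. Every name in it
(WebGPUCommandBuffer, beginRenderPass, beginComputePass, beginBlitPass,
endPass, and the placeholder types) is hypothetical and for illustration
only; the group has only agreed on the idea of explicit begin/end blocks
with no state inheritance between them, not on any API surface.

    // Placeholder types so the sketch is self-contained; all hypothetical.
    type WebGPURenderPipeline = object;
    type WebGPUComputePipeline = object;
    type WebGPUBuffer = object;
    type WebGPUTexture = object;
    type WebGPURenderPassDescriptor = object;

    // One explicit begin/end block per kind of work, mirroring Metal
    // encoders (and Vulkan render passes for the graphics case).
    interface WebGPURenderPassEncoder {
      setPipeline(pipeline: WebGPURenderPipeline): void;
      draw(vertexCount: number, instanceCount: number): void;
      endPass(): void;
    }

    interface WebGPUComputePassEncoder {
      setPipeline(pipeline: WebGPUComputePipeline): void;
      dispatch(x: number, y: number, z: number): void;
      endPass(): void;
    }

    interface WebGPUBlitPassEncoder {
      copyBufferToTexture(src: WebGPUBuffer, dst: WebGPUTexture): void;
      endPass(): void;
    }

    interface WebGPUCommandBuffer {
      // Only one pass may be open at a time, and bindings/state set in one
      // pass do not carry over to the next (no inheritance), matching the
      // Metal and D3D12 behaviour discussed above.
      beginRenderPass(desc: WebGPURenderPassDescriptor): WebGPURenderPassEncoder;
      beginComputePass(): WebGPUComputePassEncoder;
      beginBlitPass(): WebGPUBlitPassEncoder;
    }

    // Usage: the application switches pass types explicitly; compute is
    // never recorded inside an open render pass.
    function recordFrame(cmd: WebGPUCommandBuffer,
                         desc: WebGPURenderPassDescriptor,
                         computePipeline: WebGPUComputePipeline): void {
      const render = cmd.beginRenderPass(desc);
      // ... setPipeline / draw calls ...
      render.endPass();

      const compute = cmd.beginComputePass();
      compute.setPipeline(computePipeline);
      compute.dispatch(64, 1, 1);
      compute.endPass();
    }

On Metal such begin/end calls would map to creating and ending the
corresponding encoder types; on D3D12, as BC notes above, they could be
close to no-ops.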

Queues

   - CW: Do we expose one universal Q, one of each type, or a family of Qs?
   - DM: For Vk, I put data on the GitLab.
   - DM: One of the major manufacturers has a graphics-only queue, but it
     isn’t exposed by their Vulkan driver yet.
   - CW: My understanding from the Vk discussion is that it is unlikely we
     will want to expose a graphics-only queue to the Web.
   - CB: I think that the future will bring a reduction in the differences
     between compute and graphics. So I’m not sure what people will be able
     to do with the knowledge that they have a Q optimised just for
     graphics. I think we could add it later if necessary, as some kind of
     flag.
   - CW: What about Qs that people use for async compute?
   - CB: Your average web developer only needs universal queues; we can
     later add a flag where we say “this queue will only do compute”.
     Transfer is available everywhere.
   - MM: Is it common to have compute-only queues?
   - CW: Yes. Look at the Vulkan hardware DB; a few vendors have
     compute-only queues.
   - DM: You’re arguing for Qs that can do everything?
   - CB: Yes. I think it is a good place to start. And I think the future
     will move in the direction of universal queues.
   - CW: Vulkan always promises at least one Q that can do everything.
   - CB: So it is a flag that helps the driver optimise?
   - CW: No. In Vulkan you ask the device what Q types it supports, and how
     many of each it has. When you create the device you specify how many
     you want.
   - JG: Queues are allocated at device creation time in Vulkan, whereas
     they are a la carte on other APIs.
   - CB: I would push for the high-level abstraction.
   - JG: We’re trying to have a low-level, capable platform, not an easy
     abstraction.
   - DM: How do we map the higher level to Vulkan?
   - JG: It requires you to allocate queues at device creation time, or to
     virtualize queues.
   - CW: Metal has to virtualize anyway.
   - DJ: What if we have only one queue? What would someone lose?
   - CW: Compute queues have been used to gain 10% to 15% by optimizing all
     bottlenecks on the GPU at the same time.
   - MM: One way we could get to this is to have the Q creation take a
     dictionary, currently empty, but we could add the ability to define
     new Q types later.
   - CB: Most HW doesn’t support async graphics unless you have multiple
     GPUs. But most HW supports compute queues.
   - CW: Exposing hardware limitations is OK, but I think it is essential to
     expose a similar level of functionality to all Web users.
   - CB: The issue is that if the app has to check all platforms on all
     GPUs, then it causes problems. (CW: sorry, this is badly transcribed)
   - DJ: The Metal team mentioned that having something that works
     consistently across all HW is a design goal.
   - DM: (missed)
   - CB: We can have the application declare how many queues it will use,
     and emulate things on APIs / HW that don’t have those queues present.
   - CW: I agree. We can say that we always expose one generic Q and up to N
     compute Qs.
   - DM: I don’t think it is possible to require even a single async compute
     Q.
   - CB: The user can’t really tell async vs. sync.
   - CW: If the application has the right semaphores between Qs, then GPU
     sync devolves into serialization anyway. I don’t think there is too
     much work to virtualise async compute onto the generic Q type.
   - JG: I worry that we are papering over a lot of the complexity and
     polyfilling over stuff. A lot of the reason for the new APIs is that
     you know the low-level details and tune to them.
   - DJ: True, but that’s what Metal does: go lower-level than OpenGL but
     still expose enough low-level functionality to be useful. We now have
     3+ years of experience suggesting that we made the right decision.
   - CB: What we’re losing by having queues declared is SLI support, but few
     developers are using SLI.
   - AK: (missed)
   - CB: I don’t know of HW that has multiple HW graphics queues. Graphics
     is a superset of compute, which is a superset of transfer (in D3D, not
     Vulkan).
   - JG: I feel like we either need to rule out the existence of
     compute-only Qs, or support them.
   - CB: I’m fine with ruling them out.
   - CW: We could allow some flags when you create a Q in WebGPU. The
     implementation works out which Q to put it on, which could be the
     single Q for devices that only have a universal Q. So rather than
     query, we give hints up front.
   - JG: I’m fine with that. It’s not functionally different from the Vulkan
     API.
   - JG: We haven’t decided where we are creating queues.
   - CW: I think the Vulkan restriction means we’ll have to create Qs at
     device creation time.
   - CW: Proposal: when you create a device, you ask for Qs with flags
     explaining what types of operations you’ll execute on them. (A rough
     sketch of this proposal follows at the end of this section.)
   - CB: Allocating the processing resources you plan to use.
   - MM: Returning multiple queues will be problematic because fences don’t
     work across them.
   - CW: We can have only one backing universal queue in Metal and
     virtualize on top of it.
   - MM: That could cause deadlocks, but it sounds fine for now.
   - CB: We need to be able to guarantee a minimum number of queues of each
     type you can get.
   - BC: With textures, 2k×2k is fine but 16k×16k depends on the hardware.
   - MM: How about 1 universal and 0 of the others?
   - CW: What about applications that want to take advantage of async
     compute, but not rework their application?
   - JG: I don’t think we should lie about async compute.
   - CB: I am fine with a minimum of 0.
   - CW: I am OK with that at this point in time.
   - DM: I think virtualization will be tricky; if we emulate multiple queue
     types then there could be synchronization problems and deadlocks….
     (missed)
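
As a way to visualise CW’s proposal and MM’s dictionary idea, here is a
minimal sketch in TypeScript-style pseudocode. All names
(requestWebGPUDevice, WebGPUQueueDescriptor, and so on) are hypothetical
placeholders; the only points taken from the discussion are that queues are
requested with flags at device creation time, that one universal queue is
always available, and that the guaranteed minimum for other queue types may
be zero.

    // Hypothetical queue object.
    interface WebGPUQueue {
      submit(commandBuffers: object[]): void;
    }

    // Flags describing what the application intends to run on the queue.
    // The implementation decides which hardware queue (if any) backs it.
    interface WebGPUQueueDescriptor {
      graphics?: boolean;
      compute?: boolean;
      transfer?: boolean;
    }

    interface WebGPUDeviceDescriptor {
      // Requested up front, matching Vulkan's requirement that queues be
      // chosen when the VkDevice is created.
      queues: WebGPUQueueDescriptor[];
    }

    interface WebGPUDevice {
      // Always returns at least one universal queue; additional
      // compute-only queues may be virtualized onto it (e.g. onto a single
      // Metal queue), so the guaranteed minimum for non-universal queues
      // is zero.
      getQueues(): WebGPUQueue[];
    }

    declare function requestWebGPUDevice(
      desc: WebGPUDeviceDescriptor): Promise<WebGPUDevice>;

    // Usage: ask for one universal queue plus one async-compute hint.
    async function init(): Promise<void> {
      const device = await requestWebGPUDevice({
        queues: [
          { graphics: true, compute: true, transfer: true }, // universal
          { compute: true },                                 // async compute
        ],
      });
      const [universal, asyncCompute] = device.getQueues();
      // Work submitted to asyncCompute may in fact run serialized on the
      // same hardware queue as the universal one on some devices.
    }

Whether queues are requested through the device descriptor or through a
separate call is still open, per JG’s point above about where queues are
created.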

Agenda for next meeting

   - Queues
   - Render targets
   - Pipeline state if ready?
