- From: Corentin Wallez <cwallez@google.com>
- Date: Thu, 27 Feb 2020 14:41:12 +0100
- To: internal-gpu <internal-gpu@w3.org>, public-gpu <public-gpu@w3.org>
- Message-ID: <CAGdfWNM8_VkxedPFoFAmgSX03_CpvRp1LqXBr+OZ8rGF53Zo5A@mail.gmail.com>
Whoops, meant to send this to public-gpu.

On Thu, Feb 27, 2020 at 2:40 PM Corentin Wallez <cwallez@google.com> wrote:

> Hey all,
>
> I spent quite some time yesterday trying to understand how alternatives
> like Synchronous mapping #506 <https://github.com/gpuweb/gpuweb/pull/506>
> or Failable mapping #511 <https://github.com/gpuweb/gpuweb/pull/511> could
> be implemented and which modifications they would need to be implementable.
> At the end of the day an important property became clear for
> implementations looking to minimize copies using shared memory between the
> content process and the GPU process. Below I'm using terminology from
> buffer mapping proposals but the same is true for proposals that would
> introduce new staging buffer objects.
>
> *Accurate tracking on the content process of which mappable buffers are in
> use is not tractable.*
>
> We want to avoid races so that the data written by JS at the time it
> calls unmap is what gets seen by the GPU until the next mapping operation
> (either mapSync, or mapAsync promise resolution). Without some asynchronous
> ownership transfer (mapAsync-like) I claim it is not tractable to know
> accurately on the content-process side whether it is safe to write to the
> shared memory region.
>
> The first reason why it is not tractable is that we don't know accurately
> if queue submits that use the mappable resource are finished. While it is
> "easy" to follow progress of queues themselves, knowing which submits use
> the resource would require adding duplicated content-side tracking in
> bindgroups, all encoders, command buffers, bundles to know what mappable
> resources they hold. This would be a many-fold increase in the amount of
> tracking that WebGPU requires on the content side.
>
> Second, even doing the tracking above is not enough for accurate tracking
> because it doesn't take into account error propagation. What if one of the
> commands in the submit that uses the mappable resource is an error?
> Is the resource still considered in use by the content side when the GPU
> process side does nothing? One solution would be to duplicate all the
> validation on the content side, but that's way too expensive, and can't
> help with error objects due to OOM.
>
> *Consequences for the existing proposals.*
>
> Synchronous mapping #506 <https://github.com/gpuweb/gpuweb/pull/506> has
> two alternatives: either the mapping operation blocks when the buffer is in
> use, or a new staging area is returned that will be copied into the buffer
> at a later time. Since accurate tracking isn't possible, either WebGPU will
> most often do a roundtrip to the GPU process (aka block) or it will most
> often introduce an extra copy.
>
> Failable mapping #511 <https://github.com/gpuweb/gpuweb/pull/511> has the
> mapping operation return null when the buffer is in use. This requires
> knowing accurately when it is in use and is not tractable.
>
> Myles' comment about mapAsync
> <https://github.com/gpuweb/gpuweb/pull/506#issuecomment-590199697>
> describes a desirable property: that the mapAsync promise resolve
> immediately if the buffer is currently not in use. Guaranteeing this at the
> spec level requires accurate tracking of which buffers are in use so it
> isn't possible. There needs to be at least one roundtrip to the GPU process
> to "lock" the resource.
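
The error-propagation problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `GpuProcess`, `ContentProcess`, `lock_for_mapping` and `guess_is_free` are hypothetical names invented for illustration, not WebGPU or browser APIs, and both "processes" are plain Python objects in one process.

```python
from dataclasses import dataclass, field


@dataclass
class GpuProcess:
    """Hypothetical stand-in for the GPU process, the only side that validates."""
    in_use: set = field(default_factory=set)

    def submit(self, buffer_id: str, valid: bool) -> None:
        # If a command in the submit is an error, this side does nothing,
        # so the buffer never becomes "in use" here.
        if valid:
            self.in_use.add(buffer_id)

    def lock_for_mapping(self, buffer_id: str) -> bool:
        # Models the roundtrip: the authoritative answer lives on this side.
        return buffer_id not in self.in_use


@dataclass
class ContentProcess:
    """Hypothetical content-side tracker that cannot run validation."""
    gpu: GpuProcess
    believed_in_use: set = field(default_factory=set)

    def submit(self, buffer_id: str, valid: bool) -> None:
        # Without duplicating validation, the content side has to assume
        # every submit takes effect.
        self.believed_in_use.add(buffer_id)
        self.gpu.submit(buffer_id, valid)

    def guess_is_free(self, buffer_id: str) -> bool:
        return buffer_id not in self.believed_in_use


gpu = GpuProcess()
content = ContentProcess(gpu)
content.submit("staging", valid=False)  # the submit contains an error

print("content-side guess, free?", content.guess_is_free("staging"))
print("GPU-process answer, free?", gpu.lock_for_mapping("staging"))
```

In this toy model the two sides disagree after a failed submit, which is the situation the message argues can only be resolved by a roundtrip to the GPU process.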
>
> *Another direction using APIs that allow wrapping CPU memory into a buffer*
>
> The following APIs allow taking an OS memory object, or just a pointer,
> and turning it into a GPU resource:
>
> - ID3D12Device3::OpenExistingHeapFromAddress
>   <https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device3-openexistingheapfromaddress>
> - MTLDevice newBufferWithBytesNoCopy:length:options:deallocator:
>   <https://developer.apple.com/documentation/metal/mtldevice/1433382-newbufferwithbytesnocopy?language=objc>
> - VK_EXT_external_memory_host
>   <https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VK_EXT_external_memory_host>
>
> These APIs would allow true zero-copy between JS and the GPU and
> allocating staging memory in the content process synchronously via the
> following mechanisms:
>
> - In the content process, allocate shared memory between the content
>   process and GPU process (at least in Chromium that's possible).
> - Send the shared memory to the GPU process.
> - In the GPU process create a new resource by wrapping that shared
>   memory region (or allocating a new resource if it's not possible).
> - In the content process, write to the memory then send an unmap
>   signal to the GPU process.
> - On the GPU process, if wrapping was not possible, copy from the
>   shmem to the GPU resource.
> - Profit!
>
> An idea I was exploring is having something like mapSync that can replace
> the allocation of a GPUBuffer with a new native buffer via the mechanism
> described above. However a design constraint we have been operating with is
> that a WebGPU resource is exactly a native API resource so that doesn't
> work either (imagine we baked bindgroups with the address of the buffer;
> we want to avoid needing dirtying mechanisms).
>
> *Conclusion*
>
> Like the other times I tried, I wasn't able to come up with a better
> solution than mapAsync.
> It's the only one that works so far but the asynchrony makes it a bit
> difficult for people to use so it'd be nice to have an alternative.
>
> At least I learnt an important design constraint, and discovered that it
> is possible to wrap CPU memory in a GPU resource to optimize things. Also I
> started a HackMD to discuss tradeoffs again
> <https://hackmd.io/qWmMfnFVRtyR0Q2HVSagOw?both>. It doesn't have content
> but at least it has links to all the proposals if you want to keep it as a
> set of bookmarks.
>
> Cheers,
>
> Corentin
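
The shared-memory steps listed in the message (allocate, send, wrap or fall back to a separate allocation, write, unmap, copy if needed) can be sketched roughly as follows. This is a simulation under assumptions, not browser code: it uses Python's `multiprocessing.shared_memory` in place of the browser's shared memory, the hypothetical `GpuSide` class runs in the same process for brevity, and "wrapping" is modeled as a `memoryview` aliasing the shared region rather than a call to any of the native APIs listed.

```python
from multiprocessing import shared_memory


class GpuSide:
    """Hypothetical GPU-process half of the flow (same process here)."""

    def __init__(self, shm_name: str, size: int, can_wrap: bool):
        # Steps 2-3: receive the shared memory and create the resource.
        self.shm = shared_memory.SharedMemory(name=shm_name)
        self.size = size
        self.can_wrap = can_wrap
        # Wrapping aliases the shared region; the fallback is a separate
        # allocation that needs a copy on unmap.
        self.resource = self.shm.buf[:size] if can_wrap else bytearray(size)
        self.copies = 0

    def on_unmap(self) -> None:
        # Step 5: copy only when wrapping was not possible.
        if not self.can_wrap:
            self.resource[:] = self.shm.buf[:self.size]
            self.copies += 1

    def read(self) -> bytes:
        return bytes(self.resource)

    def close(self) -> None:
        if self.can_wrap:
            self.resource.release()  # drop the view before closing the mapping
        self.shm.close()


def run(can_wrap: bool):
    payload = b"vertex data"
    # Step 1: the content process allocates the shared memory.
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    gpu = GpuSide(shm.name, len(payload), can_wrap)
    try:
        # Step 4: the content process writes, then signals unmap.
        shm.buf[:len(payload)] = payload
        gpu.on_unmap()
        return gpu.read(), gpu.copies
    finally:
        gpu.close()
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    for can_wrap in (True, False):
        data, copies = run(can_wrap)
        print(f"can_wrap={can_wrap}: data={data!r}, copies={copies}")
```

In the wrapped case the GPU-side resource sees the written bytes without any copy, while the fallback path performs exactly one copy at unmap time. The sketch does not model the ownership question raised earlier in the message, namely when the content process may safely write again.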
Received on Thursday, 27 February 2020 13:41:37 UTC