Re: Pipeline objects open questions from Filip Pizlo on 2017-09-12 (public-gpu@w3.org from September 2017)

From: Filip Pizlo <fpizlo@apple.com>
Date: Tue, 12 Sep 2017 13:24:50 -0700
To: Corentin Wallez <cwallez@google.com>
Cc: "Myles C. Maxfield" <mmaxfield@apple.com>, public-gpu <public-gpu@w3.org>
Message-id: <5B10D78A-9715-4BC7-AA2A-C66FC922670C@apple.com>
> On Sep 12, 2017, at 1:09 PM, Corentin Wallez <cwallez@google.com> wrote:
> 
> On Mon, Sep 11, 2017 at 11:44 PM, Myles C. Maxfield <mmaxfield@apple.com> wrote:
> 
> 
> On Sep 8, 2017, at 12:24 PM, Corentin Wallez <cwallez@google.com> wrote:
> 
>> Hey all,
>> 
>> While what goes into pipeline objects is mostly clear (see this doc <https://github.com/gpuweb/gpuweb/blob/master/design/Pipelines.md>), there are still a bunch of open questions:
>> How do we take advantage of the pipeline caching present in D3D12 and Vulkan? Do we expose it to the application or is it done magically in the WebGPU implementation?
> 
> The desire here is valid. Metal doesn’t have any concept of a pipeline cache, but it does have an intermediate form that shaders can be in (i.e. if you create an Xcode project and add a .metal file to it, Xcode will compile to this at build-time). Caching these would speed up the initialization of a WebGPU webapp.
> 
> Would there be a way for browsers to use these tools at runtime? The Metal API doesn't have a way to get a binary representation of the MTLLibraries.
>  
> However, both D3D12 and Vulkan expose this by letting the application access the raw bytes of the cache (presumably so the application can serialize them to disk). If these objects are going to hold machine code for compiled shaders, letting them round-trip through arbitrary JavaScript would be unacceptable from a security perspective. Instead, the only way this could work is if the webapp was delivered an opaque handle (or cookie) which had no intrinsic meaning, but was used as a key in an internal map the browser maintains (and the browser prunes at implementation-defined times with implementation-dependent pruning criteria).
> 
> I can't find a proper reference to this, but my understanding is that IndexedDB allows storing opaque objects and retrieving them. Vulkan and D3D12's data (and Metal's if possible) could be stored in it securely. The opaque object could also contain data necessary to invalidate the cache, like a driver version, WebGPU library version, etc.

Yeah.  An object reference in JS qualifies as an “opaque handle” so long as we don’t give you any kind of edit privileges to the innards.  That’s not hard to accomplish.  We already do that with WebAssembly Module and Instance objects, which internally refer to code.

It’s OK for the API to have an object that internally refers to machine code on the GPU, and it’s OK for IndexedDB to provide some mechanism for storing that object.  There are security implications here.  Any browser that does this would have to be careful about the integrity of that cache.  But we can say that this is an implementation problem, outside of the scope of the spec.

Also, if the API is designed right, the thing being stored into IndexedDB is not semantically required to be compiled code.  An implementation could choose to only store source, and then recompile when the user asks to load it from the DB.  Stuff like this could be left open for implementers to decide.  The browser could even prune the cache of compiled code but still keep enough to reconstruct it later.
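
A minimal sketch of that idea, under stated assumptions: the names CachedPipeline, compileShader, prune and load are hypothetical and are not part of any WebGPU or IndexedDB API, and the "compiler" is a string-building stand-in. It only illustrates a handle whose compiled form can be dropped and rebuilt from source without the page being able to inspect it.

```typescript
// Hypothetical names throughout; none of this is WebGPU or IndexedDB API.
type Compiled = { machineCode: string; driverVersion: string };

// Stand-in for a real shader compiler (assumption for illustration only).
function compileShader(source: string, driverVersion: string): Compiled {
  return { machineCode: `compiled(${source})@${driverVersion}`, driverVersion };
}

class CachedPipeline {
  // `private` is only enforced at compile time in TypeScript; a real
  // implementation would keep this state inside the browser, out of script reach.
  private source: string;
  private compiled: Compiled | undefined;

  constructor(source: string, driverVersion: string) {
    this.source = source;
    this.compiled = compileShader(source, driverVersion);
  }

  // The implementation may drop compiled code at any time and keep only the source.
  prune(): void {
    this.compiled = undefined;
  }

  // Makes the pipeline usable again. Returns true if a recompile was needed
  // (entry was pruned, or it was compiled for a different driver version).
  load(driverVersion: string): boolean {
    if (this.compiled !== undefined && this.compiled.driverVersion === driverVersion) {
      return false;
    }
    this.compiled = compileShader(this.source, driverVersion);
    return true;
  }
}
```

The boolean returned by load is only there to make the sketch observable; whether a page should be able to tell a cache hit from a recompile is a separate question.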

Separately, the browser itself may choose to cache compiled shaders like it currently caches many other things.

Overall, the problems of pipeline caching seem a lot like the problems of WebAssembly module caching.  What I’m proposing isn’t far off from where WebAssembly landed, but there is more to the story.  JF might be the best person to elaborate on that.

Aside from details, how does this best-of-both-worlds strategy sound?

-Filip


>  
> But if the browser is going to hold on to these objects internally anyway, there isn’t really any value in going through the webapp at all. Instead of making the webapp author write new code to get performance, we should just make fast performance the default. These pipeline caches would work best if the browser internally always used them for all WebGPU apps.
> 
> I agree with this: D3D12, Metal and Vulkan have different caching mechanisms, and the WebGPU implementation would be able to make better use of them than web apps could. I'm thinking for example of a Vulkan backend that could have a pipeline derivative per (vs, fs) combination, while in D3D12 it would re-use D3D12_SHADER_BYTECODE and only use the cache if the pipelines match exactly.
> 
> With regard to your test with strip cut index, I think it is dangerous to rely on a non-specced behavior like that.
> 
> The problem is that Metal always enables primitive restart, so to have consistent behavior we need it enabled all the time on D3D12 and Vulkan too. This way we won't have applications using the 0xFFFF index in 16-bit and have it produce different results on different platforms. If there was a masking mechanism like Dzmitry mentioned, it would be best, as we wouldn't have to encode the index type in the pipeline. Not addressing primitive restart for the MVP sounds fine, but it will be important for a 1.0.
> 
> > Should the vertex attributes somehow be included in the PipelineLayout so vertex buffers are treated as other resources and changed in bulk with them?
>  
> I don't think we should try to innovate here as opposed to just providing what D3D12/Vulkan/Metal have (and they don't have vertex attributes in the layout/signature).
> 
> Understood. I found this while developing NXT; I was thinking it would help reduce the number of code paths and slightly reduce the CPU overhead of WebGPU.
> 
> There are multiple kinds of sample counts. There is a sample count of an image, defining the actual storage properties. The framebuffer (containing images) then implicitly carries that property (sample count of the storage).
> There is a sample count of the rasterizer, defining the fixed function state, and like all the other fixed function state it should be in the pipeline descriptor/state. So my answer would be "no".
> 
> What happens when the sample count of the rasterizer is different from the sample count of the attachments? If some cases are valid I agree we should have it specified separately in the pipeline state.
> 
> > It sounds like you’re asking for us to choose between two cases:
> 
> I see the question caused some confusion. Depth bounds test != depth test.
> I voted for both to be explicit, potentially by using WebIDL optional dictionary entries/default values semantics.
> 
> Yeah, one bullet contained two unrelated questions. I think the first one was about Metal not having a "depth bounds" test, as far as I understand.
> 
> The second part is about disabling the depth test being the same as using the "always" test function. Do we want redundant information, where (depthTestEnable = true, depthCompare = always) and (depthTestEnable = false, depthCompare = whatever) produce the same thing? Or do we just want to have depthCompare?
> 
> If I understand correctly everyone agrees we should have an explicit "independentBlend". If it is set to false, then blend[0] should be used for all attachments?
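
One possible reading of the independentBlend question above, as a sketch only: the BlendState and BlendDescriptor shapes and the resolveBlendStates helper are hypothetical, not a specced interface. It assumes that with independentBlend set to false, blend[0] is applied to every attachment, and that with it set to true, one entry per attachment is required.

```typescript
// Hypothetical descriptor shapes, not a specced WebGPU interface.
interface BlendState {
  enabled: boolean;
  srcFactor: string;
  dstFactor: string;
}

interface BlendDescriptor {
  independentBlend: boolean;
  blend: BlendState[];
}

// Returns the blend state that would apply to each color attachment.
function resolveBlendStates(desc: BlendDescriptor, attachmentCount: number): BlendState[] {
  if (desc.blend.length === 0) {
    throw new Error("at least blend[0] is required");
  }
  if (desc.independentBlend && desc.blend.length < attachmentCount) {
    throw new Error("independentBlend needs one blend entry per attachment");
  }
  const result: BlendState[] = [];
  for (let i = 0; i < attachmentCount; i++) {
    // Assumption: when independentBlend is false, entries past blend[0] are ignored.
    result.push(desc.independentBlend ? desc.blend[i] : desc.blend[0]);
  }
  return result;
}
```

Whether extra entries should be ignored or rejected when independentBlend is false is left open here; the sketch ignores them.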
Received on Tuesday, 12 September 2017 20:25:15 UTC