[whatwg/streams] Should I expect this to support "zero-copy" data loading in some way? (#1109)

As someone who wrote a WebAssembly (WASM) module to process data that in practice may be contained entirely in very large files, I face a problem where the most practical solution seems to be embracing the Streams API: instead of making the user agent allocate as much memory as an entire (large) file, it can provide the data in successive chunks for WebAssembly to consume. Some of the problems involved have been described as ones the WebAssembly working group should solve, such as allowing WASM modules to access multiple WASM _memories_ (evidently on the horizon for WASM) or allowing the script host that invokes WASM to swap such memories in and out of WASM's reach. The counterargument is that WASM has requirements on the alignment and sizing of its memory objects that go beyond those the user agent imposes on, say, `ArrayBuffer` objects, which makes the aforementioned feature requests impractical. The thing is, I agree with that: I think WASM memory is an object best controlled from inside the module; after all, relinquishing ownership of its memories would bring additional complexity to future WASM design.
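To make the sizing point concrete: a `WebAssembly.Memory` is sized in 64 KiB pages, and growing a non-shared memory replaces `memory.buffer` with a new `ArrayBuffer`, detaching the old one along with any views the script still holds on it. A minimal sketch (`growAndInspect` is just an illustrative name):

```javascript
// Sketch: WebAssembly.Memory is sized in 64 KiB pages, and growing a
// (non-shared) memory detaches the ArrayBuffer that `memory.buffer`
// returned earlier, along with every view onto it.
const WASM_PAGE_SIZE = 64 * 1024; // bytes per WebAssembly page

function growAndInspect() {
  const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
  const before = memory.buffer;        // ArrayBuffer for the first page
  const view = new Uint8Array(before); // a view the script may be holding

  memory.grow(1);                      // add one more page

  const after = memory.buffer;         // a different ArrayBuffer object
  return {
    beforeLength: before.byteLength,   // 0: the old buffer is detached
    viewLength: view.length,           // 0: so is the view onto it
    afterLength: after.byteLength,     // 2 * WASM_PAGE_SIZE
    sameObject: before === after,      // false
  };
}

console.log(growAndInspect());
```

Any view a script hands to an API on behalf of a module therefore has to be re-created after the module grows its memory.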

How do streams come in here?

Well, apart from WASM, it stands to reason that JavaScript applications, too, would benefit from APIs that use *views*, as opposed to mandating that a new `ArrayBuffer` object be returned every time a data loading operation is done.
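The distinction can be shown with typed arrays alone: `subarray()` and the `(buffer, byteOffset, length)` constructor create views over existing memory, while `slice()` allocates a new buffer and copies into it. A minimal sketch:

```javascript
// One 16-byte buffer shared by several views; creating them copies nothing.
const buffer = new ArrayBuffer(16);
const bytes = new Uint8Array(buffer);
const header = bytes.subarray(0, 4);        // view on bytes 0..3
const payload = new Uint8Array(buffer, 4);  // view on bytes 4..15

payload[0] = 0xff;
console.log(bytes[4]); // 255: `payload` and `bytes` share memory

// slice(), by contrast, allocates a new ArrayBuffer and copies into it.
const copy = bytes.slice(4, 8);
copy[0] = 0;
console.log(bytes[4]); // still 255: the copy is independent
console.log(copy.buffer === buffer, header.buffer === buffer); // false true
```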

Does the Streams API facilitate this? I am not sure. Having learned about "BYOB" readers, it would seem these are the solution here, but then why can't I do this:

```javascript
new Blob([ "foobar" ]).stream().getReader({ mode: "byob" });
```

Maybe I have misunderstood BYOB in the context of streams, but what I think would be beneficial is being able to read data from opaque blobs (among other opaque sources) into a much more tangible array buffer *that already exists*, saving a later copy operation in the script. Using `read(view)` on the reader obtained above would be just the thing, wouldn't it? Except it doesn't work: apparently streams vended by blobs are not "byte streams". Forgive my ignorance if I am getting lost in the details of the spec here, but shouldn't the above be a perfect use case for zero-copy loading of file data into memory available to *both* the script and any WASM module it may run (which could use `WebAssembly.Memory.prototype.buffer` to create a view on the memory and hand it to a BYOB reader's `read` call)?
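For comparison, here is a minimal sketch of what BYOB reading looks like once a stream *is* a byte stream. `blobToByteStream` and `readIntoNewBuffer` are hypothetical helper names, not part of any API. Two caveats: the wrapper still copies each blob slice into the caller's view, so it only demonstrates the shape of the API rather than zero-copy loading; and `read(view)` transfers the view's underlying `ArrayBuffer`, detaching the original. As far as I can tell, that transfer step is also what would make a view on `WebAssembly.Memory.prototype.buffer` fail here, since a Wasm memory's buffer cannot be detached by script.

```javascript
// Hypothetical helper (not part of any spec): wraps a Blob in a readable
// *byte* stream so that a BYOB reader can be obtained for it. Each pull
// still reads a blob slice into an intermediate ArrayBuffer and copies it,
// so this shows the BYOB API shape, not zero-copy loading.
function blobToByteStream(blob, chunkSize = 64 * 1024) {
  let offset = 0;
  return new ReadableStream({
    type: "bytes",
    async pull(controller) {
      const request = controller.byobRequest; // null for default readers
      if (offset >= blob.size) {
        controller.close();
        request?.respond(0); // a pending BYOB read must be answered with 0
        return;
      }
      const wanted = request ? request.view.byteLength : chunkSize;
      const end = Math.min(offset + wanted, blob.size);
      const bytes = new Uint8Array(await blob.slice(offset, end).arrayBuffer());
      offset = end;
      if (request) {
        request.view.set(bytes); // fill the caller-supplied memory
        request.respond(bytes.byteLength);
      } else {
        controller.enqueue(bytes);
      }
    },
  });
}

// Reads up to `size` bytes into one preallocated buffer with a BYOB reader.
// read(view) *transfers* the view's buffer, so after each read the old
// ArrayBuffer is detached and reading continues in `value.buffer`.
async function readIntoNewBuffer(stream, size) {
  const reader = stream.getReader({ mode: "byob" });
  let buffer = new ArrayBuffer(size);
  let filled = 0;
  try {
    while (filled < size) {
      const view = new Uint8Array(buffer, filled, size - filled);
      const { value, done } = await reader.read(view);
      if (value) buffer = value.buffer; // keep the transferred buffer
      if (done) break;
      filled += value.byteLength;
    }
  } finally {
    reader.releaseLock();
  }
  return new Uint8Array(buffer, 0, filled);
}

readIntoNewBuffer(blobToByteStream(new Blob(["foobar"])), 16).then((bytes) => {
  console.log(new TextDecoder().decode(bytes)); // foobar
});
```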

But perhaps the Streams API is the wrong API to change or extend in order to make scenarios like the above work?

I've read about a dozen issues related to the same "zero-copy" umbrella feature request peeking through in the details (zero copy meaning an order of magnitude less overhead), but these either focus on WebAssembly, as if without it there were little need to shift to relying on views where possible, or appear to chase a rabbit hole of OOP abstractions dating back to around 2014.

What part do you think this specification will play in shifting an entire portfolio of current approaches, which create a new array buffer for every data loading operation, to something fundamentally relying on views? We don't even necessarily have to consider multi-threading beyond what it already relies upon: object transfer. If we can transfer a buffer between threads to provide a safe programming model on the Web, I don't see what complications multiple views on the same buffer would add to that.
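For reference, the existing transfer model already copes with several views on one buffer: transferring the buffer detaches it on the sending side, so every view there becomes zero-length, while views cloned alongside it arrive sharing one buffer with their offsets intact. A minimal sketch using `structuredClone` with a transfer list (`postMessage` to a worker follows the same structured-clone rules):

```javascript
// Two views over one buffer, as a sender might hold them.
const buffer = new ArrayBuffer(8);
const header = new Uint8Array(buffer, 0, 4);
const payload = new Uint8Array(buffer, 4, 4);
header[0] = 1;
payload[0] = 2;

// Clone the views while transferring (not copying) their buffer.
const received = structuredClone({ header, payload }, { transfer: [buffer] });

// Sender side: the buffer is detached, so both views are now empty.
console.log(buffer.byteLength, header.length, payload.length); // 0 0 0

// Receiver side: both views still share one buffer, offsets preserved.
console.log(received.header.buffer === received.payload.buffer); // true
console.log(received.payload.byteOffset, received.payload[0]); // 4 2
```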

I hope I am making sense. I guess I am frustrated that so many APIs rely on buffers, yet there is next to nothing for avoiding excessive, fundamentally unnecessary copying, and, in my limited understanding, neither WebAssembly nor threads appear to stand in the way. Yes, we have `TypedArray.prototype.set`, which is a little gem buried deep in the APIs. The Streams API, as I understand it, was motivated by the need for better consumption of big data; to that end, zero-copy operations where possible are a continuation in the same direction, so perhaps this is the API to amend?
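With a default (non-byte) reader, `TypedArray.prototype.set` already gives a one-copy-per-chunk fallback that works with any stream of `Uint8Array` chunks, including `Blob.prototype.stream()`, and can write straight into a view on Wasm memory. A minimal sketch; `copyStreamInto` is a hypothetical name:

```javascript
// Hypothetical helper: copies every chunk of a default (non-byte) stream,
// such as Blob.prototype.stream(), into an existing Uint8Array and returns
// the number of bytes written. One copy per chunk, no concatenation.
async function copyStreamInto(stream, destination) {
  const reader = stream.getReader();
  let offset = 0;
  try {
    for (;;) {
      const { value, done } = await reader.read();
      if (done) break;
      if (offset + value.byteLength > destination.byteLength) {
        const error = new RangeError("destination view is too small");
        await reader.cancel(error); // stop the source; no more data is needed
        throw error;
      }
      destination.set(value, offset); // TypedArray.prototype.set copies in place
      offset += value.byteLength;
    }
  } finally {
    reader.releaseLock();
  }
  return offset;
}

// Usage: load a blob straight into a region of WebAssembly memory.
// If the memory grows while this runs, `target` is detached, so the
// module should not grow its memory until the copy completes.
const memory = new WebAssembly.Memory({ initial: 1 });
const target = new Uint8Array(memory.buffer, 1024, 6);
copyStreamInto(new Blob(["foobar"]).stream(), target).then((written) => {
  console.log(written, new TextDecoder().decode(target)); // 6 foobar
});
```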

https://github.com/whatwg/streams/issues/1109

Received on Monday, 8 March 2021 19:48:40 UTC