[whatwg/encoding] TextEncoder and TextDecoder memory issue due resizable/shared buffers (Issue #344)

WebReflection created an issue (whatwg/encoding#344)

### What is the issue with the Encoding Standard?

I replied with a (bad, apologies) link in https://github.com/whatwg/encoding/issues/172, but I have also documented the issues related to SharedArrayBuffer and to resizable ArrayBuffer (not even shared) here: https://gist.github.com/WebReflection/3324b5ac79768c85efbf3b725d6a9d73

## Background

We are using Dynamic Workers and Atomics to simulate a *blocking* operation from interpreted PLs (or even JavaScript itself). The steps are the following:

  * create a SAB of length 8 (2 * Int32 byte length) and post it, with the proxied details, to the main thread
  * wait synchronously at index 0 to be sure *notify* happened (an old Firefox issue that won't be resolved); after that, the length of the resulting binary-serialized data (bounded by the max positive int32) is known
  * we `postMessage` a new SAB of that length + 4 bytes (the extra bytes are due to the notify issue in Firefox when index 0 is assigned to 0 and then notified)
  * the previously binary-serialized outcome is stored in that SAB via a view, and index 0 is set to 1 to notify it's ready
  * the worker grabs the binary content via the "same" view, deserializes it, and moves forward
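
The steps above can be sketched roughly as follows. This is a single-threaded walk-through under stated assumptions, not the actual implementation: every function name is hypothetical, JSON plus UTF-8 stands in for the real binary serializer, and the `postMessage` hops between worker and main thread are replaced by plain function calls. It also assumes the worker is the side that creates both buffers.

```javascript
// Hypothetical helpers: in real code the "worker" parts run in a Worker and
// the "main" parts on the main thread, with postMessage between them.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Main thread: serialize the result and publish its length in the 8-byte SAB.
function mainPublishLength(lengthSab, result) {
  const bytes = encoder.encode(JSON.stringify(result)); // stand-in serializer
  const i32 = new Int32Array(lengthSab);
  i32[1] = bytes.length;    // slot 1: length of the serialized data
  Atomics.store(i32, 0, 1); // slot 0: "ready" flag
  Atomics.notify(i32, 0);
  return bytes;
}

// Main thread: copy the serialized bytes after the 4-byte flag, then notify.
function mainPublishData(dataSab, bytes) {
  new Uint8Array(dataSab, 4).set(bytes);
  const flag = new Int32Array(dataSab, 0, 1);
  Atomics.store(flag, 0, 1);
  Atomics.notify(flag, 0);
}

// Worker: block while index 0 is still 0. If the flag was already set,
// Atomics.wait returns immediately instead of blocking.
function workerWait(sab) {
  Atomics.wait(new Int32Array(sab, 0, 1), 0, 0);
}

function roundtrip(result) {
  // 1. the worker creates the 8-byte SAB (2 * Int32) and posts it
  const lengthSab = new SharedArrayBuffer(8);
  // 2. the main thread publishes the length; the worker waits, then reads it
  const bytes = mainPublishLength(lengthSab, result);
  workerWait(lengthSab);
  const length = new Int32Array(lengthSab)[1];
  // 3. a second SAB of length + 4 bytes is posted
  const dataSab = new SharedArrayBuffer(length + 4);
  // 4. the main thread stores the serialized outcome and sets index 0 to 1
  mainPublishData(dataSab, bytes);
  // 5. the worker waits, copies the content out of the SAB and deserializes it
  workerWait(dataSab);
  const copy = new Uint8Array(dataSab, 4, length).slice(); // non-shared copy
  return JSON.parse(decoder.decode(copy));
}

console.log(roundtrip({ answer: 42, text: "héllo" }));
```

Note that the serialized bytes exist three times during one operation in this sketch: once as the serializer output, once in the second SAB, and once in the non-shared copy handed to `decode`.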

### Issues with this approach

Mostly *performance* but also *memory*: a tab with a worker that uses this strategy uses a lot of RAM (at least twice as much) for every single operation until it completes. This is bad for mobile phones or less powerful devices, or for people with dozens of open tabs that use a similar strategy.

## Ideally

I am working to refactor that dance to work in this ideal way:

  * we create a resizable SharedArrayBuffer on the worker (growable up to the Int32 upper bound, or half of it), which can also be reused because there's no concurrency while the worker is synchronously waiting via Atomics
  * we handle the proxy details and send that SAB right away
  * the main thread binary-serializes results directly into that SAB and notifies it's ready
  * the worker binary-deserializes the reused SAB and keeps going

This refactoring has the following obvious advantages:

  * there is only one SAB per worker (and usually one worker per main thread), thanks to the fact that growable SABs are widely usable these days
  * there is a single `postMessage` dance
  * the binary-serializer never creates an unnecessary intermediate representation of whatever value needs to be stored into a buffer
  * the binary-deserializer never creates unnecessary intermediate buffer slices or whatsoever to retrieve the result of this roundtrip
  * memory consumption is kept minimal, growth is predictable, and everything is faster

# The current issue

I've spent way more time than I should have finding a performant way to avoid one-off creation of typed array views that bloat RAM and bother the GC, **because** `new TextEncoder().encodeInto(str, view)` does not work with a *SharedArrayBuffer* and neither does `new TextDecoder().decode(view)`. Most importantly, even when I wanted to use at least a resizable *ArrayBuffer* as a fallback, the *The provided Uint8Array value must not be resizable* error comes up.
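
A small probe can show which of these combinations a given runtime accepts. This is only a sketch: the `probe` helper and the report keys are hypothetical, and the outcome depends on the engine and version it runs in, so no particular output is implied here.

```javascript
// Try both APIs against a freshly created view and record what happens.
function probe(makeView) {
  const outcome = { encodeInto: "ok", decode: "ok" };
  try {
    new TextEncoder().encodeInto("hi", makeView());
  } catch (error) {
    outcome.encodeInto = `${error.name}: ${error.message}`;
  }
  try {
    new TextDecoder().decode(makeView());
  } catch (error) {
    outcome.decode = `${error.name}: ${error.message}`;
  }
  return outcome;
}

const report = {
  // baseline: fixed-length, non-shared ArrayBuffer
  plain: probe(() => new Uint8Array(new ArrayBuffer(8))),
  // resizable, single-owner ArrayBuffer
  resizable: probe(
    () => new Uint8Array(new ArrayBuffer(8, { maxByteLength: 16 }))
  ),
  // fixed-length SharedArrayBuffer
  shared: probe(() => new Uint8Array(new SharedArrayBuffer(8))),
  // growable SharedArrayBuffer
  sharedGrowable: probe(
    () => new Uint8Array(new SharedArrayBuffer(8, { maxByteLength: 16 }))
  ),
};

console.log(report);
```

Any entry that is not `"ok"` carries the error name and message the runtime produced for that buffer kind.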

To solve these issues I ended up ignoring these APIs entirely, because they are not suitable for more complex scenarios. I started wondering what the whole purpose of `encodeInto` and `decode` is when, exactly where they are needed (binary data that travels across realms or WASM exchanges), they cannot use memory that can grow and shrink on demand.

## Issue summary

  * these APIs have an extremely narrow use case, which by design easily results in bloated RAM from creating new buffers all over the place
  * these APIs can easily be bypassed on purpose via `DataView` or direct *view* manipulation in JS code, entirely defeating the original *guards* meant to help developers; in reality these limitations just get in the way of any developer who cares about RAM and performance and would like to use the platform
  * it's not clear why even a resizable, singly owned *ArrayBuffer* cannot be used for *synchronous* blocking operations such as `encodeInto`
  * it's not clear how developers who are sure that a SAB cannot have concurrent access, and is safe to use, can use these APIs
  * it's clear (to me), and sad, that I should avoid these APIs as opposed to trusting that the platform does the right thing
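
To illustrate the bypass mentioned in the second point, here is a hedged sketch of UTF-8 encoding and decoding done through direct view manipulation. It is not the code from the gist: the function names are hypothetical, the decoder assumes well-formed input and performs no validation, and the encoder assumes the caller sized the view (at most 3 bytes per UTF-16 code unit), since writes past the end of a typed array are silently dropped.

```javascript
// Write `str` as UTF-8 into `view`, starting at `offset`.
// Returns the number of bytes written. Lone surrogates become U+FFFD.
function utf8EncodeInto(str, view, offset = 0) {
  let at = offset;
  for (let i = 0; i < str.length; i++) {
    let cp = str.codePointAt(i);
    if (cp > 0xffff) i++; // a surrogate pair was consumed
    else if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd; // lone surrogate
    if (cp < 0x80) {
      view[at++] = cp;
    } else if (cp < 0x800) {
      view[at++] = 0xc0 | (cp >> 6);
      view[at++] = 0x80 | (cp & 0x3f);
    } else if (cp < 0x10000) {
      view[at++] = 0xe0 | (cp >> 12);
      view[at++] = 0x80 | ((cp >> 6) & 0x3f);
      view[at++] = 0x80 | (cp & 0x3f);
    } else {
      view[at++] = 0xf0 | (cp >> 18);
      view[at++] = 0x80 | ((cp >> 12) & 0x3f);
      view[at++] = 0x80 | ((cp >> 6) & 0x3f);
      view[at++] = 0x80 | (cp & 0x3f);
    }
  }
  return at - offset;
}

// Read `length` bytes of well-formed UTF-8 from `view` into a string.
function utf8Decode(view, offset = 0, length = view.length - offset) {
  const end = offset + length;
  let out = "";
  let i = offset;
  while (i < end) {
    const b = view[i++];
    let cp;
    if (b < 0x80) cp = b;
    else if (b < 0xe0) cp = ((b & 0x1f) << 6) | (view[i++] & 0x3f);
    else if (b < 0xf0)
      cp = ((b & 0x0f) << 12) | ((view[i++] & 0x3f) << 6) | (view[i++] & 0x3f);
    else
      cp = ((b & 0x07) << 18) | ((view[i++] & 0x3f) << 12) |
        ((view[i++] & 0x3f) << 6) | (view[i++] & 0x3f);
    out += String.fromCodePoint(cp);
  }
  return out;
}

// Works on any Uint8Array, including one backed by a SharedArrayBuffer.
const shared = new Uint8Array(new SharedArrayBuffer(64));
const written = utf8EncodeInto("héllo €😀", shared);
console.log(written, utf8Decode(shared, 0, written));
```

Because these functions only index into the view, none of the guards around shared or resizable buffers apply to them, which is the point being made above.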

I hope at least *some* of these concerns can be either tackled or answered. Regardless, I feel like none of these issues is well documented out there, so I'll write a post with demos and benchmarks about *how to ignore TextEncoder & TextDecoder*. I am afraid that won't make developers happy, though, but rather slightly confused by the fact that anyone can work around these limitations by ignoring native APIs and keep doing what they need to do.

Thanks for your patience in reading this and thanks in advance for any possible action around these issues that could make usage of these native APIs more appealing in the (hopefully) near future.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/344

Message ID: <whatwg/encoding/issues/344@github.com>

Received on Sunday, 16 March 2025 16:36:06 UTC