- From: Andrea Giammarchi <notifications@github.com>
- Date: Fri, 14 Mar 2025 08:09:18 -0700
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/encoding/issues/343@github.com>
WebReflection created an issue (whatwg/encoding#343)

### What is the issue with the Encoding Standard?

I've been asked to file a new issue after this comment https://github.com/whatwg/encoding/issues/333#issuecomment-2724225478 and here I am doing that:

* no server-side runtime prefers the standard APIs for converting *JS* *UTF-16* based strings into *UTF-8*, even though our operating systems, our files, and the Web itself all run on *UTF-8* to maximize compatibility with every programming language and avoid misleading or error-prone conversions all over the place
* no library whose goal is to serialize *JS* data as binary uses *TextEncoder* or *TextDecoder*, except as a last resort after trying to avoid both by all means, because these are extremely slow, with or without the `encodeInto` variant, compared to more or less accurate *JS*-only solutions
* all *JS*-only solutions are fast enough on laptops plugged in, but CPUs lose 5X+ performance once devices are in battery-save mode ... put differently, all libraries perform poorly when limited to plain *JS* code, and it's not clear why these APIs are so slow compared to *NodeJS*, *Bun* or other *JS* runtime solutions that are not based on these APIs
* if all server runtimes need to avoid these APIs, and *SharedArrayBuffer* + *Atomics* are used to convert *JS* references that are otherwise impossible to convert natively to binary, it's clear to me we're lacking a primitive whose only purpose is to return a *buffer*: a `String.prototype.toUTF8Buffer()` or any other API, whatever the name, whose goal is to do just that without all the performance caveats behind the scenes
* at the same time, it's not clear why every *JS* solution literally outperforms these native APIs, so something might be really off in these *API* implementations, given that most libraries follow the *RFC* standards and have been proven to work for years while bypassing *TextEncoder* and *TextDecoder*

In summary, it would be great to understand why these APIs are so slow and why there is no interest in having best-in-class performance baked in for transforming strings into binary data and decoding them back.

Use cases:

* **Atomics** via *SharedArrayBuffer*, where the only language these APIs speak is *binary* and views of buffers
* **cross posting + cross programming language** communication
* **file handling**, where all APIs return an `arrayBuffer`, but that's both *async* (usually) and the data still needs conversion to text if it's traveling
* **leader tab patterns** used to enable *OPFS*, where data can only travel as binary to be fast enough, avoiding any encoding/decoding all over the place (buffers travel fast)
* **serialization**, where knowing the byte length of a *JS* string requires all sorts of workarounds compared to what a `byteLength` API could provide, and yet that alone would not solve the whole issue: we also need/want the resulting buffer afterwards

Thanks in advance for considering any alternative API that would boost performance, or for investigating why we need to loop `charCodeAt` all over the Web (a minimal sketch of such an encoder is included below) just to be faster than native APIs whose goal is supposed to be to simplify that encoding/decoding dance 🙏
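For reference, here is a minimal sketch of the kind of *JS*-only fallback that serialization libraries end up shipping to stay off *TextEncoder* on hot paths; it's for illustration only, not any specific library's code, and unlike *TextEncoder* it doesn't replace lone surrogates with U+FFFD:

```js
// Minimal charCodeAt-based UTF-8 encoder (illustrative sketch only).
// Real libraries add buffer reuse, encodeInto fallbacks for long strings,
// and proper handling of lone surrogates.
const encodeUTF8 = (str) => {
  // Worst case: 3 bytes per UTF-16 code unit (a surrogate pair is 2 units -> 4 bytes).
  const out = new Uint8Array(str.length * 3);
  let i = 0;
  for (let j = 0; j < str.length; j++) {
    let code = str.charCodeAt(j);
    // Combine a high/low surrogate pair into a single code point.
    if (code >= 0xd800 && code <= 0xdbff && j + 1 < str.length) {
      const next = str.charCodeAt(j + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        code = 0x10000 + ((code - 0xd800) << 10) + (next - 0xdc00);
        j++;
      }
    }
    if (code < 0x80) {
      out[i++] = code;
    } else if (code < 0x800) {
      out[i++] = 0xc0 | (code >> 6);
      out[i++] = 0x80 | (code & 0x3f);
    } else if (code < 0x10000) {
      out[i++] = 0xe0 | (code >> 12);
      out[i++] = 0x80 | ((code >> 6) & 0x3f);
      out[i++] = 0x80 | (code & 0x3f);
    } else {
      out[i++] = 0xf0 | (code >> 18);
      out[i++] = 0x80 | ((code >> 12) & 0x3f);
      out[i++] = 0x80 | ((code >> 6) & 0x3f);
      out[i++] = 0x80 | (code & 0x3f);
    }
  }
  return out.subarray(0, i);
};

// For well-formed strings the bytes match the native encoder:
//   new TextEncoder().encode('héllo 🙏')  vs  encodeUTF8('héllo 🙏')
```

A native `String.prototype.toUTF8Buffer()`, or whatever name ends up being preferred, would make this routine and its countless copies across the ecosystem unnecessary.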
Received on Friday, 14 March 2025 15:09:22 UTC