Re: [whatwg/encoding] Fast byteLength() (Issue #333) from Jamie Kyle on 2024-07-26 (public-webapps-github@w3.org from July 2024)

From: Jamie Kyle <notifications@github.com>
Date: Fri, 26 Jul 2024 11:09:46 -0700
To: whatwg/encoding <encoding@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/encoding/issues/333/2253243312@github.com>

@jakearchibald I work on an end-to-end encrypted messaging app where the server can't inspect the payloads sent between clients, so there are many places where we need to enforce a max byte length on the client to prevent certain types of abuse from overloading client apps.

Right now we mostly encode the data into Node buffers, but we found that it would be more efficient to catch oversized payloads earlier and have the option of dropping them before we start doing anything with that data.

After implementing some of this though, I actually found an even better way of doing this:

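```ts
function maxLimitCheck(maxByteSize: number) {
  let encoder = new TextEncoder()
  // One spare byte so input over the limit shows up in `written`
  let maxSizeArray = new Uint8Array(maxByteSize + 1)
  // Returns true when the encoded input fits within maxByteSize
  return (input: string): boolean => {
    return encoder.encodeInto(input, maxSizeArray).written <= maxByteSize
  }
}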

let check = maxLimitCheck(5e6) // 5MB

check("a".repeat(5)) // true
check("a".repeat(5e6)) // true
check("a".repeat(5e6 + 1)) // false
check("a".repeat(2 ** 29 - 24)) // false
```
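
One caveat worth noting about the approach above: `encodeInto` never writes a partial character, so when a multi-byte character straddles the end of the buffer, `written` can stop short of `maxByteSize + 1` even though the string is over the limit. A minimal sketch of a variant that compares the `read` count against the input length instead, assuming `true` means "fits" (the name `fitsWithin` is hypothetical and this version was not part of the benchmarks):

```ts
// Hypothetical variant: the buffer is exactly maxByteSize bytes, and the
// input fits only if encodeInto consumed every UTF-16 code unit of it.
function fitsWithin(maxByteSize: number) {
  const encoder = new TextEncoder()
  const scratch = new Uint8Array(maxByteSize)
  return (input: string): boolean => {
    // `read` is the number of UTF-16 code units that were encoded
    const { read } = encoder.encodeInto(input, scratch)
    return read === input.length
  }
}

const fits = fitsWithin(5)

fits("aaaaa") // true  (5 bytes)
fits("aa€") // true  (2 + 3 bytes)
fits("aaaa€") // false (4 + 3 bytes; "€" does not fit, so encoding stops early)
```

It still stops encoding as soon as the buffer is full, so the early-exit behaviour should be the same.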

Testing this out in my benchmark repo with the max size array enforcing a couple different limits:

```
./benchmarks/blob.js:                        4.8 ops/sec (±0.1, p=0.001, o=0/10)
./benchmarks/buffer.js:                     54.5 ops/sec (±3.0, p=0.001, o=0/10)
./benchmarks/implementation.js:              0.7 ops/sec (±0.0, p=0.001, o=0/10)
./benchmarks/textencoder.js:                11.9 ops/sec (±1.0, p=0.001, o=0/10)

5MB:
./benchmarks/textencoder-break-early.js: 6’318.7 ops/sec (±743.3, p=0.001, o=8/100) severe outliers=6

50MB:
551.8 ops/sec (±7.6, p=0.001, o=7/100) severe outliers=4

500MB:
51.5 ops/sec (±4.6, p=0.001, o=6/100) severe outliers=4
```

I still believe this is a useful function to have. There are more than [10k results for `Buffer.byteLength(` on GitHub](https://sourcegraph.com/search?q=context:global+Buffer.byteLength%28&patternType=keyword&case=yes&sm=0) (from looking around, most of them seem to pass in strings, although the API accepts Buffers and other typed arrays too).

It seems like a lot of people are using it for `Content-Length` headers too.
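
For context, here is a rough sketch of what the `Content-Length` case looks like with only `TextEncoder` available. The helper name `contentLengthOf` and the sample body are made up for illustration:

```ts
const utf8Encoder = new TextEncoder()

// Hypothetical helper: UTF-8 byte length of a string body.
// It allocates a full encoded copy only to read its length, which is
// the kind of work a dedicated byteLength() could skip.
function contentLengthOf(body: string): number {
  return utf8Encoder.encode(body).byteLength
}

// Illustrative body; "é" takes two bytes in UTF-8, so the byte length
// differs from body.length
const body = JSON.stringify({ message: "héllo" })
const headers = { "Content-Length": String(contentLengthOf(body)) }
```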

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/333#issuecomment-2253243312

Received on Friday, 26 July 2024 18:09:50 UTC