Re: [whatwg/encoding] TextEncoder#encode - write to existing Uint8Array (#69)

Adding this feature would be problematic, because it would expose to the Web the encoder's behavior when approaching the end of the buffer.

Based on experience with encoding_rs, it is very useful not to try to fill the output buffer exactly, but instead to:
1. Never let a byte sequence representing a single code point be split across output buffer boundaries.
2. Be allowed to signal that the output buffer is full if the worst-case output length for a code point (4 bytes in the case of UTF-8) doesn't fit, regardless of what the next input code point actually is.
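
To make the two rules concrete, here is a minimal sketch of a buffer-writing loop that follows them. It is not the encoding_rs implementation; `encode_worst_case` is a hypothetical helper, and it takes a Rust `&str` as input for simplicity (on the Web the input would be UTF-16).

```rust
/// Hypothetical helper for illustration only (not an encoding_rs API).
/// Writes `src` into `dst` as UTF-8 and returns
/// (input bytes consumed, output bytes written).
fn encode_worst_case(src: &str, dst: &mut [u8]) -> (usize, usize) {
    // Worst-case UTF-8 length of a single code point.
    const MAX_UTF8_LEN: usize = 4;
    let mut read = 0;
    let mut written = 0;
    for ch in src.chars() {
        // Rule 2: report "output full" whenever a worst-case code point
        // would not fit, without looking at what `ch` actually is.
        if dst.len() - written < MAX_UTF8_LEN {
            break;
        }
        // Rule 1: a code point is written whole or not at all.
        let len = ch.encode_utf8(&mut dst[written..]).len();
        read += len;
        written += len;
    }
    (read, written)
}
```

With a 6-byte buffer and the input `"abcdef"`, this sketch stops after `"abc"`: once only 3 bytes remain, it declines to continue even though the next code point would need just 1 byte.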

Specifying one particular behavior is likely to cause problems for encoding back ends that don't already exhibit that behavior. encoding_rs doesn't even have a single behavior: the SIMD optimizations for the ASCII range behave differently at the end of the buffer than the non-SIMD code does. So if the input ends with Basic Latin code points, how close to the end of the buffer the write proceeds depends on whether the number of final Basic Latin code points is a multiple of 16.

Allowing implementation-defined behavior, on the other hand, could easily become an interop problem: Web developers might test with an implementation that is willing to convert a final BMP code point when 3 bytes of output space remain, while another implementation wants to see 4 bytes of space even if the remaining input is a BMP code point.
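
The divergence can be sketched with two hypothetical buffer-end policies applied to the same input. Neither `Policy` nor `encode_with` is a real API; they only model the two kinds of implementation described above.

```rust
/// Hypothetical buffer-end policies, for illustration only.
#[derive(Clone, Copy)]
enum Policy {
    /// Convert a code point whenever its actual encoded length fits.
    ExactFit,
    /// Require room for a worst-case (4-byte) code point before converting.
    WorstCase,
}

/// Writes `src` into `dst` as UTF-8 under the given policy and returns
/// (input bytes consumed, output bytes written).
fn encode_with(policy: Policy, src: &str, dst: &mut [u8]) -> (usize, usize) {
    let mut read = 0;
    let mut written = 0;
    for ch in src.chars() {
        let needed = match policy {
            Policy::ExactFit => ch.len_utf8(),
            Policy::WorstCase => 4,
        };
        if dst.len() - written < needed {
            break;
        }
        // In both policies a code point is never split across the boundary.
        let len = ch.encode_utf8(&mut dst[written..]).len();
        read += len;
        written += len;
    }
    (read, written)
}
```

Given a 3-byte buffer and the single BMP code point U+20AC (3 bytes in UTF-8), `Policy::ExactFit` writes all 3 bytes while `Policy::WorstCase` writes nothing. Code tested only against the first kind of implementation could silently assume that a 3-byte buffer is always enough for such input.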

-- 
https://github.com/whatwg/encoding/issues/69#issuecomment-349946317

Received on Thursday, 7 December 2017 11:55:53 UTC