Re: [encoding] Serializing internal TextDecoder state? (#7) from Benjamin C. Wiley Sittler on 2015-11-04 (public-webapps-github@w3.org from November 2015)

From: Benjamin C. Wiley Sittler <notifications@github.com>
Date: Wed, 04 Nov 2015 13:47:25 -0800
To: whatwg/encoding <encoding@noreply.github.com>
Message-ID: <whatwg/encoding/issues/7/153876551@github.com>

I'm not sure this is "super important" but it could avoid some duplicated
effort when dealing with large inputs where the overall text size does not
fit in a JS string or when processing of chunks of bytes is offloaded to a
ServiceWorker which might not live as long as the caller which initiates the
stream. Text file readers could also benefit from this to avoid having to
reparse the whole preceding input prior to the previously saved "current
reading position". I think representing state as shifts + trailing
"incomplete" bytes, replayable at startup to reach an equivalent state, is
attractive, but I wonder whether opaque serialization mightn't be easier to
support and potentially more compact. Of course, replayable bytes have the
advantage of potentially even being portable to a different implementation.
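
To make the "replayable bytes" idea concrete, here is a rough sketch for
UTF-8 only, where there are no shift states and the only pending state is a
partial multi-byte sequence. TextDecoder exposes no such API today, so the
wrapper class ReplayableUtf8Decoder and the helper incompleteUtf8Tail are
hypothetical names of my own, and the sketch assumes well-formed input. A
stateful encoding such as ISO-2022-JP would also need the mode switch bytes,
which this does not attempt.

```typescript
// Hypothetical helper (not part of the Encoding Standard): returns the
// trailing bytes of `bytes` that form an incomplete UTF-8 sequence, or an
// empty array if the input ends on a character boundary.
// Assumes the input is well-formed UTF-8 up to that point.
function incompleteUtf8Tail(bytes: Uint8Array): Uint8Array {
  // A lead byte can be at most 3 bytes from the end and still be incomplete.
  for (let back = 1; back <= 3 && back <= bytes.length; back++) {
    const b = bytes[bytes.length - back] ?? 0;
    if ((b & 0xc0) === 0x80) continue; // continuation byte, keep looking back
    let need = 0; // total sequence length announced by the lead byte
    if ((b & 0xe0) === 0xc0) need = 2;
    else if ((b & 0xf0) === 0xe0) need = 3;
    else if ((b & 0xf8) === 0xf0) need = 4;
    return need > back ? bytes.slice(bytes.length - back) : new Uint8Array(0);
  }
  return new Uint8Array(0);
}

// Hypothetical wrapper that shadows the decoder's pending bytes so they can
// be saved and later replayed into a fresh instance.
class ReplayableUtf8Decoder {
  // ignoreBOM: true so a resumed decoder does not strip a U+FEFF that the
  // original decoder would have kept mid-stream.
  private decoder = new TextDecoder("utf-8", { ignoreBOM: true });
  private tail: Uint8Array = new Uint8Array(0);

  constructor(replay: Uint8Array = new Uint8Array(0)) {
    // Prime the new instance by feeding the saved bytes as the start of
    // the stream; they are incomplete, so they produce no output yet.
    if (replay.length > 0) this.decode(replay);
  }

  decode(chunk: Uint8Array): string {
    // Track the pending bytes across chunk boundaries.
    const joined = new Uint8Array(this.tail.length + chunk.length);
    joined.set(this.tail);
    joined.set(chunk, this.tail.length);
    this.tail = incompleteUtf8Tail(joined);
    return this.decoder.decode(chunk, { stream: true });
  }

  // Bytes that, fed to a new decoder first, reach the equivalent state.
  replayBytes(): Uint8Array {
    return this.tail.slice();
  }
}

// Usage: split "a€b" in the middle of the three-byte euro sign.
const allBytes = new TextEncoder().encode("a\u20acb");
const first = allBytes.slice(0, 2); // 0x61, 0xE2
const rest = allBytes.slice(2); // 0x82, 0xAC, 0x62

const original = new ReplayableUtf8Decoder();
const head = original.decode(first); // "a"
const saved = original.replayBytes(); // the single pending byte 0xE2

// Later, possibly in another worker: resume from the saved bytes alone.
const resumed = new ReplayableUtf8Decoder(saved);
const resumedText = resumed.decode(rest); // "€b"
```

The saved state here is at most three bytes, which is about as compact as an
opaque blob would be for UTF-8; the comparison may look different for
encodings with real shift states.
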
On Nov 4, 2015 09:32, "Joshua Bell" <notifications@github.com> wrote:

> I still don't know how we'd pull this off without a rewrite (i.e. move off
> of or upstream changes to ICU) but regarding my comments about priming
> encoder state being a security issue above: one approach would just be to
> be able to ask the decoder to output a byte sequence that would correctly
> initialize a new instance to the current state (i.e. any mode switch bytes
> + buffered lead bytes) when passed in as the start of a stream, rather than
> via some special initialization API.
>
> (That's probably obvious but documenting it for posterity since I didn't
> consider it initially.)
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/whatwg/encoding/issues/7#issuecomment-153801650>.
>

---
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/7#issuecomment-153876551

Received on Wednesday, 4 November 2015 21:48:00 UTC