Exploring how to encode code point sets

Recently I've been thinking about the specific design of the protocol for
the subset and patch method, since we'll need that for the analysis. One of
the most important pieces is how to efficiently encode the code point sets
that are transferred from the client to the server on each request. An
inefficient encoding could add a material amount of overhead to every
request.

So I came up with a list of potential methods for encoding the sets and
tested them out on simulated code point sets. An overview of the analysis
and the results can be found here:
<https://docs.google.com/document/d/19K5MCElyjdUZknoxHepcC3s7tc-i4I8yK2M1Eo2IXFw/edit?usp=sharing>
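
To make the problem concrete, here is a rough sketch of one generic way to
encode a set of code points: sort the values and store the gap between
consecutive values as a variable-length integer (7 payload bits per byte).
This is only an illustration and is not necessarily one of the methods
covered in the linked document. The function names (encode_varint,
encode_set, decode_set) are hypothetical and chosen just for this example.

```python
from typing import Iterable, Set


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using 7 bits per byte, low bits first.

    The high bit of each byte is set when more bytes follow.
    """
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_set(code_points: Iterable[int]) -> bytes:
    """Sort the code points and store each gap to the previous value."""
    out = bytearray()
    prev = 0
    for cp in sorted(set(code_points)):
        out += encode_varint(cp - prev)
        prev = cp
    return bytes(out)


def decode_set(data: bytes) -> Set[int]:
    """Inverse of encode_set: rebuild the set from the stored gaps."""
    result: Set[int] = set()
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # gap continues in the next byte
        else:
            prev += value  # gap is complete; add it to the running value
            result.add(prev)
            value = 0
            shift = 0
    return result


if __name__ == "__main__":
    # Hypothetical sample: A-Z plus two code points outside the ASCII range.
    sample = set(range(0x41, 0x5B)) | {0x4E2D, 0x1F600}
    encoded = encode_set(sample)
    assert decode_set(encoded) == sample
    print(len(sample), "code points ->", len(encoded), "bytes")
```

With a scheme like this, densely packed ranges cost about one byte per code
point, while isolated code points cost a few bytes each, so how well it does
depends heavily on the shape of the sets being sent.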

Does anyone have other ideas or techniques for efficiently encoding sets
of code points?

Received on Wednesday, 24 July 2019 21:14:52 UTC