- From: Garret Rieger <grieger@google.com>
- Date: Wed, 27 Jan 2021 14:09:22 -0700
- To: "w3c-webfonts-wg (public-webfonts-wg@w3.org)" <public-webfonts-wg@w3.org>
- Message-ID: <CAM=OCWZxs+yyHEG9sVEwx98e9_SoPe0Zsx_7Tw0tGs+jvBaPUQ@mail.gmail.com>
The patch subset progressive font enrichment (PFE) method operates by sending encoded messages between the server and client. These messages can be represented as sets of key-value pairs. For transfer over the network they need to be encoded into bytes, so prior to standardization we need to pick a specific data serialization format. Many serialization formats already exist, so this document describes the specific requirements for serialization in patch subset and then evaluates a number of formats against those requirements. It closes with a recommendation for which format to use for standardization.

Requirements

- Can encode objects built from key-value pairs. Ideally keys should be allowed to be integers, which allows a more compact representation than string keys.
- Compact encoding: one of the primary goals of PFE is to save bytes transferred over the network, so the serialization format we use should be as compact as possible.
- Supports byte arrays: response messages will need to contain a large byte array with the font patch, so the serialization format should be able to encode byte arrays directly.
- Standardized: since the serialization format will be referenced from a standard, it will itself need to be standardized through a standards body.
- Messages encoded by a newer version of the message schema should still be decodable by a decoder with an older version of the message schema.
- Performant: font loading is render blocking, so message encoding and decoding should be fast.

Investigated Encoding Formats

Protocol Buffers

Protocol Buffers <https://developers.google.com/protocol-buffers>

A compact serialization protocol that uses schemas for the data types. This is what we used in the prototype version of patch subset. It meets all of the requirements, except that it is not standardized.

Decision: can’t use, not standardized.

JSON

JSON <https://tools.ietf.org/html/rfc7159>

JSON is commonly used on the web to serialize messages. It’s a text-based encoding of key-value objects. Because it is text based it has a few drawbacks:

- Keys must be strings.
- The encoding is not compact. Even with compression applied it’s still larger than binary encodings.
- Byte arrays can’t be encoded directly.
- Slower than binary encodings. For example, numeric values need to be parsed as text and then converted to binary.

Decision: not compact enough and no binary support. Don’t use.

BSON

BSON <http://bsonspec.org/>

A binary version of JSON which eliminates some of the drawbacks of the text-based encoding. However, it still fails a few of the requirements:

- Keys must be strings.
- While it is more compact than JSON, it’s not as compact as some of the other binary encodings. In particular it does not have support for variable length integers.
- It’s not standardized.

Decision: can’t use, not standardized.

UBJSON

UBJSON <https://ubjson.org/>

Another variant of binary JSON. It fails to meet the requirements in the same ways as BSON:

- Keys must be strings.
- While it is more compact than JSON, it’s not as compact as some of the other binary encodings. In particular it does not have support for variable length integers.
- It’s not standardized.

Decision: can’t use, not standardized.

CBOR (Concise Binary Object Representation)

CBOR - Wikipedia <https://en.wikipedia.org/wiki/CBOR>, rfc8949 <https://tools.ietf.org/html/rfc8949>

Uses a single control byte per value which encodes both the type and length of the value. A compact encoding on par with protobuf.

- Supports key-value maps. Keys can be any type.
- Has variable length integers (length prefix via the control byte).
- Messages are fully decodable without a schema.
- Standardized via IETF: https://tools.ietf.org/html/rfc8949

Decision: we can use, meets all requirements.

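To make the control byte idea concrete, here is a minimal sketch of a CBOR encoder in Python, written directly from the head encoding described in RFC 8949. It only handles unsigned integers, byte strings, text strings and maps, which is enough to show integer keys and a directly embedded byte array. The message layout (keys 0 and 1, and the JSON field names used for comparison) is hypothetical and not taken from the patch subset schema.

```python
import base64
import json
import struct


def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR head: 3-bit major type plus an unsigned argument."""
    if value < 24:
        # Small arguments live in the low 5 bits of the initial byte.
        return bytes([(major << 5) | value])
    if value < 2**8:
        return bytes([(major << 5) | 24, value])
    if value < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", value)


def encode(item) -> bytes:
    """Encode a small subset of CBOR: uint, bytes, str, and dict."""
    if isinstance(item, int) and not isinstance(item, bool) and item >= 0:
        return encode_head(0, item)              # major type 0: unsigned int
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item  # major type 2: byte string
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return encode_head(3, len(raw)) + raw    # major type 3: text string
    if isinstance(item, dict):
        out = encode_head(5, len(item))          # major type 5: map
        for key, value in item.items():
            out += encode(key) + encode(value)
        return out
    raise TypeError(f"unsupported type: {type(item).__name__}")


# Hypothetical response message: key 0 = protocol version, key 1 = font patch.
patch = bytes(1000)
cbor_size = len(encode({0: 1, 1: patch}))

# The same message as JSON needs string keys and a base64 encoded patch.
json_size = len(json.dumps({
    "protocol_version": 1,
    "patch": base64.b64encode(patch).decode("ascii"),
}).encode("utf-8"))
```

In this sketch the CBOR message carries only seven bytes of overhead on top of the 1000 byte patch, while the JSON version pays for the base64 expansion plus the string keys. A real implementation would use an existing CBOR library rather than hand-rolled code.
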
Message Pack

Message Pack Specification <https://github.com/msgpack/msgpack/blob/master/spec.md>

Similar to CBOR (CBOR was inspired by Message Pack) and should be about as compact. However, Message Pack is not standardized.

Decision: can’t use, not standardized.

Custom Encoding

One last option is to develop our own serialization format specifically for use in patch subset. The encoding would be designed to meet the above requirements and would be standardized as part of the PFE standard. This should only be used as a fallback option if no existing encoding can be found which meets our requirements, since developing and standardizing a new encoding format will require extra specification work.

Decision: don’t use, existing format CBOR meets our requirements.

Final Recommendation

CBOR looks to be a very good fit. It’s a straightforward encoding and meets all of our requirements. It’s unlikely we’d be able to significantly reduce encoding size with a custom encoding, so I recommend that we use CBOR.

I'm going to draft a third version of the protocol design document based on CBOR and send that out soon.
Received on Wednesday, 27 January 2021 21:09:55 UTC