SUBJECT: New IETF Internet-Draft draft-mcnally-deterministic-cbor-00

TL;DR: Blockchain Commons has posted an IETF Internet-Draft (I-D) on
deterministic CBOR for discussion:
https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor/

Blockchain Commons has recently expanded its work on dCBOR, or
deterministic CBOR, a deterministically encoded form of CBOR, the Concise
Binary Object Representation data format <http://cbor.io/> specified in
IETF RFC 8949. As part of this new work, we’ve released dCBOR codecs in
Rust and Swift for community review
<https://github.com/BlockchainCommons/Community/blob/master/release-path-standards.md#community-review>,
as well as a command-line app to verify encodings:

   - Rust dCBOR Codec: https://github.com/BlockchainCommons/bc-dcbor-rust
   - Swift dCBOR Codec: https://github.com/BlockchainCommons/BCSwiftDCBOR
   - dCBOR-CLI: https://github.com/BlockchainCommons/dcbor-cli

We’ve been using the CBOR standard as our data format because it’s a
binary format that’s concise, self-describing, well suited to constrained
environments, platform- and language-agnostic, and standardized as an IETF
RFC. (See our “Why CBOR?” article and video for more details:
https://www.blockchaincommons.com/introduction/Why-CBOR/)

But we also need our data format to be deterministic
<https://en.wikipedia.org/wiki/Deterministic_algorithm>. That’s because our
Gordian Envelope
<https://www.blockchaincommons.com/introduction/Envelope-Intro/> Smart
Document format uses Merkle-tree hashes to assure the consistency of an
Envelope that has been elided or encrypted, or that was produced from
identical data by disparate agents. For the hashes to remain consistent,
the same data must always be encoded in the same way, which is to say,
deterministically!
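
To make the point concrete, here is a minimal Python sketch. It is not taken
from our codecs: the helper names are hypothetical, the encoder handles only
tiny text keys and small unsigned integers, and SHA-256 is assumed purely
for illustration. It shows two agents serializing the same map in different
key orders, which yields different hashes unless the keys are sorted by
their encoded bytes as RFC 8949 §4.2.1 requires.

```python
import hashlib


def encode_small_uint(n: int) -> bytes:
    """Encode an unsigned integer 0..23 as one CBOR byte (major type 0)."""
    if not 0 <= n <= 23:
        raise ValueError("sketch only handles 0..23")
    return bytes([n])


def encode_short_text(s: str) -> bytes:
    """Encode a UTF-8 text string of at most 23 bytes (major type 3)."""
    data = s.encode("utf-8")
    if len(data) > 23:
        raise ValueError("sketch only handles short strings")
    return bytes([0x60 | len(data)]) + data


def encode_map(pairs, sort_keys: bool) -> bytes:
    """Encode (text, uint) pairs as a CBOR map (major type 5).

    With sort_keys=True the entries are ordered bytewise by encoded key,
    which is the deterministic ordering of RFC 8949 section 4.2.1.
    """
    encoded = [(encode_short_text(k), encode_small_uint(v)) for k, v in pairs]
    if len(encoded) > 23:
        raise ValueError("sketch only handles small maps")
    if sort_keys:
        encoded.sort(key=lambda kv: kv[0])
    return bytes([0xA0 | len(encoded)]) + b"".join(k + v for k, v in encoded)


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# The same logical data, produced in a different order by two agents.
agent_a = [("a", 1), ("b", 2)]
agent_b = [("b", 2), ("a", 1)]

loose_a = encode_map(agent_a, sort_keys=False)
loose_b = encode_map(agent_b, sort_keys=False)
strict_a = encode_map(agent_a, sort_keys=True)
strict_b = encode_map(agent_b, sort_keys=True)

print(digest(loose_a) == digest(loose_b))    # False: hashes diverge
print(digest(strict_a) == digest(strict_b))  # True: hashes agree
```

Both unsorted byte strings are well-formed CBOR describing the same map;
only the sorted form gives every agent the same bytes to hash.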

We are sure that we’re not the only ones who can benefit from deterministic
encoding! There are many international and other standards groups that are
moving toward using CBOR for emerging security standards, including W3C,
ISO, and more.

The IETF CBOR RFC includes a section (§4.2) on deterministically encoding
CBOR:

https://datatracker.ietf.org/doc/html/rfc8949#name-deterministically-encoded-c

That section of the RFC includes several requirements for deterministic
CBOR, but it lists other matters as “considerations” that different
encoders might implement differently, such as how to encode numbers like
-0 and NaN. There are also some ambiguities in the “shortest-form” rules,
which state that floating point numbers and large integers (“BigNums”)
should be encoded in their shortest form. According to the RFC, all of
these considerations are “opt-in” for codec implementers, and most existing
implementations have not prioritized them. That puts the cognitive burden
directly on the application engineer, both for correct deterministic
serialization and for validating deterministic compliance when
deserializing.
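
As a rough illustration of why -0 and NaN are troublesome, the Python sketch
below decodes a few hand-written byte strings. The `decode_number` helper is
hypothetical and handles only small unsigned integers and the three IEEE 754
float widths. Every input is well-formed CBOR, yet several distinct byte
strings decode to values an application would treat as the same number.

```python
import math
import struct


def decode_number(data: bytes):
    """Decode one CBOR number: a uint 0..23 or a 16/32/64-bit float."""
    head = data[0]
    if head <= 0x17:
        return head
    if head == 0xF9:
        return struct.unpack(">e", data[1:3])[0]  # half precision
    if head == 0xFA:
        return struct.unpack(">f", data[1:5])[0]  # single precision
    if head == 0xFB:
        return struct.unpack(">d", data[1:9])[0]  # double precision
    raise ValueError("sketch does not handle this initial byte")


# Four different byte strings, all of which decode to NaN.
nan_encodings = ["f97e00", "f97e01", "fa7fc00000", "fb7ff8000000000000"]
nan_values = [decode_number(bytes.fromhex(h)) for h in nan_encodings]
print(all(math.isnan(v) for v in nan_values))  # True

# Integer zero, positive float zero, and negative float zero.
zero_encodings = ["00", "f90000", "f98000"]
zero_values = [decode_number(bytes.fromhex(h)) for h in zero_encodings]
print(all(v == 0 for v in zero_values))  # True: they compare equal
print(len(set(zero_encodings)))          # 3: but the bytes differ
```

Unless a specification pins down one encoding for each of these values,
equal data can hash differently depending on which encoder produced it.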

To support a fully codified version of dCBOR, we have proposed a new IETF
Internet-Draft (I-D) that focuses on a stricter definition of
deterministically encoded CBOR:

https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor/

Here is the editor’s copy, including all the latest corrections and
clarifications. Issues and PRs should be posted there.

https://blockchaincommons.github.io/WIPs-IETF-draft-deterministic-cbor/draft-mcnally-deterministic-cbor.html

This I-D defines two major domains of responsibility for successfully
implementing dCBOR: those that properly belong to the codec, because they
deal with the implementation details of serialization, and those that
application developers who depend on dCBOR must undertake.

This I-D is being discussed on the IETF CBOR mailing list:
https://mailarchive.ietf.org/arch/browse/cbor/

We feel that fully defining dCBOR as an international standard is crucial
to its advancement so that data is deterministically encoded in the same
way by different encoders, creating a truly interoperable standard.

As we say in our dCBOR Internet-Draft:

It is important to stress that dCBOR is *not* a new dialect of CBOR, and
that all dCBOR is well-formed CBOR that can be read by existing CBOR codecs.

The goal of our I-D is to provide norms and practices that standardize
binary serialization so that different implementers can produce formatted
data that is identical to the point that it will always hash the same when
created from the same originating data. We also wish to remove as much
cognitive load as possible from adopters of dCBOR, putting responsibility
for implementation details on codec implementers; for example, the form in
which particular numeric values must be serialized.

So far, one of the largest points of discussion in our current I-D has to
do with the serialization of floating point numbers with no fractional
component, e.g. “10.0”. We believe that the “shortest-form” requirement of
RFC 8949 requires us, in principle, to convert floating point numbers to
integers when doing so results in no loss of accuracy. This, for example,
reduces the three (up to seven!) RFC-accepted ways of encoding the concept
of zero down to one: the byte 0x00. The same consideration applies to
codecs that support BigNums
<https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic>, or that
might support future numeric standards such as quadruple precision
<https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format>.
This reduction also improves conciseness, which helps deployment to
constrained environments such as JavaCard devices, TPMs, SEs, or even
implementations in silicon logic.
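
The numeric reduction described above might be sketched in Python as
follows. This is an illustration under stated assumptions, not the text of
the I-D or the behavior of our Rust or Swift codecs: the function names are
hypothetical, the integer range is limited to what CBOR major types 0 and 1
can hold, BigNums are not handled, and the single NaN encoding 0xf97e00 is
an assumed canonical choice.

```python
import math
import struct


def encode_head(major: int, n: int) -> bytes:
    """Encode a CBOR head with the shortest argument that holds n."""
    if n < 24:
        return bytes([major << 5 | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([major << 5 | info]) + n.to_bytes(size, "big")
    raise ValueError("outside the 64-bit range of major types 0 and 1")


def encode_int(n: int) -> bytes:
    """Major type 0 for n >= 0, major type 1 (argument -1 - n) otherwise."""
    return encode_head(0, n) if n >= 0 else encode_head(1, -1 - n)


def encode_float_shortest(x: float) -> bytes:
    """Pick the narrowest IEEE 754 width that round-trips x exactly."""
    for fmt, head in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([head]) + packed
    return b"\xfb" + struct.pack(">d", x)


def encode_number(x) -> bytes:
    """Encode an int or float, reducing floats to integers when lossless."""
    if isinstance(x, int):
        return encode_int(x)
    if math.isnan(x):
        return bytes.fromhex("f97e00")  # assumed single NaN encoding
    if math.isfinite(x) and x == int(x) and -(2**64) <= x < 2**64:
        return encode_int(int(x))  # no fractional part: encode as integer
    return encode_float_shortest(x)


print(encode_number(10.0).hex())  # 0a
print(encode_number(-0.0).hex())  # 00
print(encode_number(1.5).hex())   # f93e00
```

Under these rules 0, 0.0, and -0.0 all serialize to the single byte 0x00,
while a value such as 1.5 keeps its fractional part and takes the shortest
float width that represents it exactly.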

However, we are, of course, open to rough consensus input on this topic, as
the ultimate goal is to create standards for everyone who interacts with
numeric and other types of CBOR values.

Thanks for your interest! We hope to be able to quickly develop these
guidelines for using dCBOR so that we can all be sure that we are
approaching (and encoding!) dCBOR in the same way.

-- Christopher Allen - Blockchain Commons

Received on Friday, 10 March 2023 03:30:11 UTC