Re: Introducing CBOR-LD... from Christopher Allen on 2020-07-24 (public-credentials@w3.org from July 2020)

From: Christopher Allen <ChristopherA@lifewithalacrity.com>
Date: Fri, 24 Jul 2020 15:58:03 -0700
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: "public-credentials@w3.org" <public-credentials@w3.org>, Wolf McNally <wolf@wolfmcnally.com>
Message-ID: <CACrqygAZTLUKLpf0kcg4q4scqfkKnLb7znDa7rjG-oCr1aWAPw@mail.gmail.com>

On Fri, Jul 24, 2020 at 2:31 PM Manu Sporny <msporny@digitalbazaar.com>
wrote:

> On 7/24/20 12:55 PM, Orie Steele wrote:
> > The repo compares JSON, JSON-LD, CBOR, DAG_CBOR and ZLIB_URDNA2015_CBOR
> > (another approach to a compressed linked data format in CBOR)... I am
> > eager to add tests for CBOR-LD.
>
> I'm eager to see the results as well, Orie... I'm wondering if you'd be
> willing to expand your comparison table to include the types on slide 6?
>
>
> https://docs.google.com/presentation/d/1ksh-gUdjJJwDpdleasvs9aRXEmeRvqhkVWqeitx5ZAE/edit#slide=id.g866980c4a6_0_14
>
> It might be useful to see how different commonly used data encodings
> fare. For example, it's useful to understand that, because base64-encoded
> JWTs use a 6-bit encoding, data that could normally have been LZ
> compressed cannot be compressed, because base64's 6-bit groups do not
> line up with byte boundaries.
>
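
A minimal illustration of the alignment effect in the quoted paragraph, using only Python's standard base64 module: the same three bytes get a different base64 spelling depending on their offset modulo 3, so an LZ-style compressor matching on characters can see up to three unrelated forms of one repeated byte sequence. The filler byte and the "abc" sample are arbitrary choices for the sketch, not anything taken from JWTs.

```python
import base64


def b64(data: bytes) -> str:
    """Standard base64 (RFC 4648) as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def spellings(needle: bytes, max_offset: int = 3) -> list:
    """Encode `needle` after 0..max_offset filler bytes and report whether
    the needle's own base64 spelling still appears in the output."""
    results = []
    for offset in range(max_offset + 1):
        encoded = b64(b"x" * offset + needle)
        results.append((offset, encoded, b64(needle) in encoded))
    return results


for row in spellings(b"abc"):
    print(*row)
```

Only when the offset is a multiple of 3 does the needle's own spelling (YWJj) reappear; at offsets 1 and 2 the same bytes encode to entirely different characters.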

We did some research on data encoding issues, with some tables at:

https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2020-003-uri-binary-compatibility.md

It turns out to be a lot more complicated when you add QR to the
equation. Despite the "compression" of base64url, running it through a QR
was actually less efficient than hexadecimal! This is because QR treats
base64url text as binary (its lowercase letters and "_" fall outside QR's
alphanumeric character set), and then expands it once again into the QR
encoding format, introducing a significant increase in size. In addition,
QR does not try to internally compress binary. Carefully selecting either
the BC32 character set or the ByteWords encoding character set (which has
some other benefits as well) allowed us to leverage the QR standard's
internal compression, its denser alphanumeric mode.
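
A rough sketch of the QR mode arithmetic behind this, under simplifying assumptions: it counts only the bits spent on data characters (no mode/length headers, no error correction, no numeric mode) and picks alphanumeric mode whenever every character is in QR's 45-symbol set, else byte mode. BC32_LIKE uses the Bech32 alphabet uppercased as an assumed stand-in for BC32's 32-symbol alphabet, encode_5bit is a hypothetical helper without BC32's checksum, and the 32-byte payload is arbitrary.

```python
import base64

# QR alphanumeric-mode character set (45 symbols).
QR_ALNUM = frozenset("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

# Bech32 alphabet, uppercased: an assumed stand-in for BC32's alphabet.
# Any 32 symbols drawn from QR_ALNUM behave the same in this sketch.
BC32_LIKE = "qpzry9x8gf2tvdw0s3jn54khce6mua7l".upper()


def qr_data_bits(text: str) -> int:
    """Bits used by the data characters of one QR segment (simplified)."""
    if all(c in QR_ALNUM for c in text):
        # Alphanumeric mode: 11 bits per character pair, 6 for a leftover.
        return 11 * (len(text) // 2) + 6 * (len(text) % 2)
    # Byte mode: 8 bits per byte of the text.
    return 8 * len(text.encode("utf-8"))


def encode_5bit(data: bytes, alphabet: str) -> str:
    """Regroup bytes into 5-bit symbols (zero-padded, no checksum)."""
    value = int.from_bytes(data, "big")
    nbits = 8 * len(data)
    pad = (-nbits) % 5
    value <<= pad
    nbits += pad
    return "".join(
        alphabet[(value >> (nbits - 5 * (i + 1))) & 0x1F]
        for i in range(nbits // 5)
    )


payload = bytes(range(32))  # arbitrary 32-byte example payload
b64url = base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")
bc32ish = encode_5bit(payload, BC32_LIKE)

for name, text in (("base64url", b64url), ("5-bit uppercase", bc32ish)):
    print(f"{name}: {len(text)} chars, {qr_data_bits(text)} data bits")
```

For this payload the sketch reports 344 data bits for base64url (forced into byte mode by its lowercase letters) versus 286 for the uppercase 5-bit text (alphanumeric mode), against 256 bits of raw data.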

By carefully separating the transfer encoding scheme (to optimize for QR)
from the binary encoding scheme (CBOR), we were able to get a significantly
larger amount of data in a single QR. The problem with many other
approaches is that they try to do the transfer encoding scheme, the binary
encoding scheme, the self-describing encoding scheme, and the
error-detection encoding scheme all in the same layer. For instance, we
found Digital Bazaar's fountain encoding scheme for QRs to be at the wrong
layer, so we proposed one that we believe provably works better with more
devices and smaller QR code frames, and that has no cost at the binary
level.
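
A minimal sketch of the layer separation described above, with hypothetical function names: the binary layer is represented by a fixed CBOR byte string (the encoding of {"a": 1}, as any CBOR encoder would produce), error detection is a CRC-32 appended by its own function, and the transfer layer turns bytes into QR-alphanumeric-safe text. Uppercase hex is used only because it is short to write; the point is that it could be swapped for BC32 or ByteWords without touching the other layers. None of this is the Blockchain Commons specification itself.

```python
import zlib


def add_checksum(payload: bytes) -> bytes:
    """Error-detection layer: append a big-endian CRC-32 of the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def strip_checksum(frame: bytes) -> bytes:
    """Verify and remove the CRC-32 added by add_checksum."""
    if len(frame) < 4:
        raise ValueError("frame too short")
    payload, crc = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        raise ValueError("checksum mismatch")
    return payload


def to_transfer_text(frame: bytes) -> str:
    """Transfer layer: uppercase hex, which fits QR alphanumeric mode."""
    return frame.hex().upper()


def from_transfer_text(text: str) -> bytes:
    """Inverse of to_transfer_text."""
    return bytes.fromhex(text)


# Binary layer: CBOR bytes for {"a": 1} (map of 1, text "a", unsigned 1).
cbor_payload = bytes.fromhex("a1616101")

text = to_transfer_text(add_checksum(cbor_payload))
assert strip_checksum(from_transfer_text(text)) == cbor_payload
print(text)
```

Each layer only sees bytes or text from its neighbor, so changing the transfer alphabet, the checksum, or the binary format is a local change.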

What I'd like to see is how we might be able to combine these efforts, or
at least cooperate.

-- Christopher Allen

Received on Friday, 24 July 2020 22:58:54 UTC