Re: Introducing CBOR-LD...

Stepping back to ask a simple question:

What is the relationship or the difference between this new DB spec:

And this existing (and sparsely populated) draft spec at the W3C:

Is the former spec simply an evolution of the latter? What’s the delta
between these approaches?

On Fri, Jul 24, 2020 at 11:57 AM Orie Steele wrote:

> Sorry I am late to the CBOR-LD Party!
> Very excited to have a semantic linked data format that is also usable in
> a compact binary representation, and to have bi-directional
> transformation out of the box... I have been playing with CBOR on the
> weekends, and I have a repo here:
> The repo compares JSON, JSON-LD, CBOR, DAG_CBOR, and ZLIB_URDNA2015_CBOR
> (another approach to a compressed linked data format in CBOR)... I am eager
> to add tests for CBOR-LD.
> Both DAG_CBOR and CBOR-LD have some benefits over CBOR,
> ZLIB_URDNA2015_CBOR, and JSON....
> Both are linked data formats where the linked data aspect is preserved at
> the binary level. ZLIB_URDNA2015_CBOR is just a compressed JSON-LD object
> encoded as CBOR, you cannot leverage internal semantics... in much the same
> way you cannot leverage internal semantics of "Pure JSON" and "Pure
> CBOR".... However, ZLIB_URDNA2015_CBOR is MUCH smaller than DAG_CBOR /
> "Pure CBOR" that was built from "Pure JSON", and CBOR-LD is MUCH smaller
> than ZLIB_URDNA2015_CBOR...
> Backing up for a second, one way to think about why CBOR-LD is awesome is
> to consider how all software that processes data has some opinion about
> that data... sometimes these opinions are encoded in schema validation of
> incoming data (using tools like JSON Schema or Protobuf)... If you
> consider that changes to data on the wire would cause the software to
> explode... you can see why agreeing to a common context is similar to
> agreeing to a data schema....
> And by relying on an existing context to build a compressed binary
> representation of a semantic object, we can leverage these "common
> dictionaries / vocabularies" not just for semantic disambiguation, but also
> for compression....
> Obviously the IoT space has been waiting for something like this for a
> long time...
> -
> -
> We are now able to convert all these ontologies and semantic vocabularies,
> into compact, interoperable, binary representations for industries that
> have already committed to the semantic web:
> I'm not sure of the potential internal representation benefits for
> services like but
> obviously, a small IoT device that only speaks CBOR-LD would not need to
> crack out a JSON parser and all the attack surface associated with it, just
> to talk to the knowledge graph service.
> OS
> On Fri, Jul 24, 2020 at 10:54 AM Leonard Rosenthol wrote:
>> It's not just specific schemas but also the order of the schemas, any
>> other keys you add, plus additional "techniques" you add.
>> Using your presentation as a guide:
>> Slide 11:
>> In that case you have picked a single schema, found all the items, and
>> given each a unique value (let's say 1-10).  Now (not shown on the slide,
>> but...), I assume that you then pick another schema and start allocating
>> values for it in the dictionary (e.g. 11-20), and so on.   At some point the
>> credentials schema is updated (1.1->1.2) - but you can't change the
>> existing entries in the dictionary, so you just add the new ones to the end
>> (e.g. 100-105).  And then you encode something using that dictionary - how
>> does something downstream know that you are using the 1.2 version of the
>> context?  It would simply have a 100 in there - but w/o that in the
>> dictionary, it's not decodable.
>> Slide 14:
>> This is a good example of how to reduce size by switching from a string
>> representation to binary.  I assume we will find more of those cases over
>> time.   *BUT* a decoder needs to understand this encoding approach - and
>> again, how would it recognize something new?
>> At a minimum, we need a way to encode the version of the CBOR-SC
>> algorithm that is used to encode a given data set.   That would go a *long
>> way* to resolving my concerns.
>> Leonard
>> On 7/24/20, 11:19 AM, "Manu Sporny" wrote:
>>     On 7/24/20 11:00 AM, Leonard Rosenthol wrote:
>>     > However, the main use case that you present in the presentation is
>>     > QRCodes - which exist as a mechanism to move from digital to analog
>>     > (and back).   The analog world is long lived - even if not
>>     > necessarily archival - and the data needs to be retrievable.  And
>>     > that can't happen w/o knowing the right (version of the) dictionary
>>     > to use.
>>     ... which is why we strongly suggest that all production contexts
>>     should be versioned, frozen, and cryptographically hashed. There is a
>>     general mitigation for your concern. :)
>>     To be clear, this issue is well known in the JSON-LD ecosystem and
>>     that ecosystem has thrived (deployed on tens of millions of domains)
>>     in spite of the danger. That community has learned how to manage
>>     constantly evolving vocabularies (, and how to lock vocabularies
>>     down (VCs).
>>     There are solutions to the problem you outline; cryptographically
>>     hashing URLs is one thing we explored, but that bloats the size of the
>>     CBOR-LD bytes. Like any technology, CBOR-LD is a series of difficult
>>     design trade-offs.
>>     Just like we made the conscious decision in JSON-LD to be able to
>>     reference external JSON-LD Context files (which people insisted was
>>     madness and unworkable when we did it... and still do), we make the
>>     same conscious decision now (because it worked out pretty well for
>>     JSON-LD, and it's not clear how doing the same thing in CBOR-LD would
>>     be any different).
>>     If we wanted to eliminate the risk you highlighted, we wouldn't be
>>     able to solve the most pressing use cases.
>>     -- manu
>>     --
>>     Manu Sporny -
>>     Founder/CEO - Digital Bazaar, Inc.
>>     blog: Veres One Decentralized Identifier Blockchain Launches
> --
> Chief Technical Officer
> <>
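
For anyone following along, here is a rough Python sketch of the dictionary
versioning concern Leonard raises under Slide 11, together with the
"encode the version" mitigation he asks for. This is not the CBOR-LD
algorithm or its wire format. The class and function names, the
`dict_version` field, and the sample terms and id values are all
hypothetical, chosen only to illustrate append-only id allocation and a
decoder that fails loudly on an unknown dictionary version.

```python
from dataclasses import dataclass, field


@dataclass
class TermDictionary:
    """Hypothetical append-only mapping from context terms to small integers."""
    version: str
    term_to_id: dict = field(default_factory=dict)

    def extend(self, new_version, new_terms, start_id):
        """Return a new dictionary version; existing ids are never reassigned."""
        merged = dict(self.term_to_id)
        used_ids = set(merged.values())
        for offset, term in enumerate(new_terms):
            new_id = start_id + offset
            if term in merged or new_id in used_ids:
                raise ValueError(f"term or id already allocated: {term}/{new_id}")
            merged[term] = new_id
            used_ids.add(new_id)
        return TermDictionary(new_version, merged)


def encode(dictionary, terms):
    """Tag the payload with the dictionary version used to produce it."""
    return {
        "dict_version": dictionary.version,  # assumed field name, not from any spec
        "ids": [dictionary.term_to_id[t] for t in terms],
    }


def decode(known_dictionaries, payload):
    """Look up the tagged version; refuse to guess if it is unknown."""
    version = payload["dict_version"]
    if version not in known_dictionaries:
        raise LookupError(f"unknown dictionary version: {version}")
    id_to_term = {i: t for t, i in known_dictionaries[version].term_to_id.items()}
    return [id_to_term[i] for i in payload["ids"]]


# Illustrative terms and ids only.
v1_1 = TermDictionary("1.1", {"issuer": 1, "issuanceDate": 2})
v1_2 = v1_1.extend("1.2", ["evidence"], start_id=100)
payload = encode(v1_2, ["issuer", "evidence"])
```

A decoder that only has the 1.1 dictionary raises `LookupError` on this
payload instead of silently hitting an id of 100 it cannot resolve, which is
the failure mode described in the thread. The cost is the extra bytes for the
version tag, the same kind of size trade-off Manu mentions for hashing URLs.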

Received on Tuesday, 28 July 2020 10:08:02 UTC