Re: Introducing CBOR-LD... from Orie Steele on 2020-07-24 (public-credentials@w3.org from July 2020)

From: Orie Steele <orie@transmute.industries>
Date: Fri, 24 Jul 2020 11:55:49 -0500
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Manu Sporny <msporny@digitalbazaar.com>, "public-credentials@w3.org" <public-credentials@w3.org>
Message-ID: <CAN8C-_J3JEJ4GzS_UivoDQ-trcrkUP6akM-f_fb+u2JS7HXrWQ@mail.gmail.com>
Sorry I'm late to the CBOR-LD party!

I'm very excited to have a semantic linked data format that is also usable in a
compact binary representation, with bidirectional transformation out of the
box. I have been playing with CBOR on the weekends, and I have a repo here:
https://github.com/transmute-industries/decentralized-cbor/blob/master/src/__fixtures__/outputs/table.csv

The repo compares JSON, JSON-LD, CBOR, DAG_CBOR, and ZLIB_URDNA2015_CBOR
(another approach to a compressed linked data format in CBOR). I am eager to
add tests for CBOR-LD.

Both DAG_CBOR and CBOR-LD have some benefits over plain CBOR,
ZLIB_URDNA2015_CBOR, and JSON.

Both are linked data formats where the linked data aspect is preserved at
the binary level. ZLIB_URDNA2015_CBOR is just a compressed JSON-LD object
encoded as CBOR, so you cannot leverage its internal semantics, in much the
same way that you cannot leverage the internal semantics of "pure JSON" or
"pure CBOR". However, ZLIB_URDNA2015_CBOR is MUCH smaller than DAG_CBOR or
"pure CBOR" built from "pure JSON", and CBOR-LD is MUCH smaller than
ZLIB_URDNA2015_CBOR.
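To make the "opaque blob" point concrete, here is a minimal Python sketch of a ZLIB_URDNA2015_CBOR-style pipeline. It assumes, from the name, that the steps are canonicalize, zlib-compress, then wrap in CBOR; real URDNA2015 needs an RDF library, so sorted-key JSON is swapped in as the canonical form, and the CBOR byte-string header is hand-rolled. The sample document and the helper names (`pack`, `unpack_payload`, `read_field`) are hypothetical.

```python
import json
import zlib


def cbor_bytestring_head(length: int) -> bytes:
    """CBOR initial byte(s) for a byte string (major type 2), lengths < 2**32."""
    if length < 24:
        return bytes([0x40 | length])
    if length < 0x100:
        return bytes([0x40 | 24]) + length.to_bytes(1, "big")
    if length < 0x10000:
        return bytes([0x40 | 25]) + length.to_bytes(2, "big")
    return bytes([0x40 | 26]) + length.to_bytes(4, "big")


def pack(doc: dict) -> bytes:
    """Canonicalize, zlib-compress, and wrap the result as one CBOR byte string."""
    # Stand-in for URDNA2015 (an assumption): sorted keys, no whitespace.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")
    compressed = zlib.compress(canonical, 9)
    return cbor_bytestring_head(len(compressed)) + compressed


def unpack_payload(blob: bytes) -> bytes:
    """Strip the CBOR byte-string head and return the compressed payload."""
    if blob[0] >> 5 != 2:
        raise ValueError("not a CBOR byte string")
    info = blob[0] & 0x1F
    if info < 24:
        start, length = 1, info
    elif info in (24, 25, 26):
        size = 1 << (info - 24)  # 1, 2 or 4 length bytes follow
        start = 1 + size
        length = int.from_bytes(blob[1:start], "big")
    else:
        raise ValueError("length encoding not supported in this sketch")
    return blob[start:start + length]


def read_field(blob: bytes, key: str):
    """Opaque format: reading one field means inflating and parsing everything."""
    return json.loads(zlib.decompress(unpack_payload(blob)))[key]


doc = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "issuer": "did:example:123",
    "issuanceDate": "2020-07-24T00:00:00Z",
}
blob = pack(doc)
print(read_field(blob, "type"))  # ['VerifiableCredential']
```

Nothing inside the compressed payload is addressable until the whole thing is inflated and handed to a JSON parser, which is the sense in which its internal semantics are not available at the binary level.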

Backing up for a second: one way to see why CBOR-LD is awesome is to
consider that all software that processes data has some opinion about that
data. Sometimes these opinions are encoded as schema validation of incoming
data (using tools like JSON Schema or Protocol Buffers). If you consider that
unexpected changes to data on the wire would cause that software to
explode, you can see why agreeing to a common context is similar to
agreeing to a data schema.
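As a rough illustration of the "context as schema" analogy, the sketch below treats a shared context as the list of terms a consumer has agreed to understand and rejects input that uses anything else. The trimmed context and its example.org IRIs are hypothetical, and this is an analogy only: a JSON-LD processor does not reject such input by itself, since expansion generally drops terms the context does not define.

```python
# Hypothetical, heavily trimmed context: term -> IRI (example.org IRIs,
# not the real credentials context).
AGREED_CONTEXT = {
    "type": "@type",
    "issuer": "https://example.org/vocab#issuer",
    "issuanceDate": "https://example.org/vocab#issuanceDate",
    "credentialSubject": "https://example.org/vocab#credentialSubject",
}


def undefined_terms(doc: dict, context: dict) -> list:
    """Top-level keys the agreed context does not define (keywords skipped)."""
    return sorted(k for k in doc if not k.startswith("@") and k not in context)


def check_against_context(doc: dict, context: dict) -> None:
    """Reject input that drifts from the agreed context, like a schema check."""
    unknown = undefined_terms(doc, context)
    if unknown:
        raise ValueError(f"terms not defined by the agreed context: {unknown}")


check_against_context(
    {"type": ["VerifiableCredential"], "issuer": "did:example:123"}, AGREED_CONTEXT
)
try:
    # A typo on the wire ("isuer") is caught, just as a schema would catch it.
    check_against_context(
        {"type": ["VerifiableCredential"], "isuer": "did:example:123"}, AGREED_CONTEXT
    )
except ValueError as err:
    print(err)  # terms not defined by the agreed context: ['isuer']
```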

By relying on an existing context to build a compressed binary
representation of a semantic object, we can leverage these "common
dictionaries / vocabularies" not just for semantic disambiguation, but also
for compression.
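A minimal Python sketch of the idea: derive a dictionary from a shared context that maps terms and well-known values to small integers, then encode with CBOR, where small integers take a single byte. The term table, the integer assignment, and the tiny CBOR encoder are all hypothetical illustrations, not the actual CBOR-LD codec or its tables. The sketch also assumes the document holds no integer values of its own; a real codec needs type information from the context to tell codes apart from data.

```python
def _head(major: int, n: int) -> bytes:
    """CBOR initial byte plus argument for major type `major` and value `n`."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | info]) + n.to_bytes(size, "big")
    raise ValueError("value too large for a CBOR head")


def cbor_encode(value) -> bytes:
    """Tiny CBOR encoder: non-negative ints, text strings, lists, dicts only."""
    if isinstance(value, bool) or value is None:
        raise TypeError("not supported in this sketch")
    if isinstance(value, int) and value >= 0:
        return _head(0, value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))
        for k, v in value.items():
            out += cbor_encode(k) + cbor_encode(v)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical dictionary derived from a shared context.
TERM_CODES = {
    "@context": 0,
    "https://www.w3.org/2018/credentials/v1": 1,
    "type": 2,
    "VerifiableCredential": 3,
    "issuer": 4,
    "issuanceDate": 5,
}
CODE_TERMS = {code: term for term, code in TERM_CODES.items()}


def compress(value):
    """Replace known terms (keys and string values) with their integer codes."""
    if isinstance(value, dict):
        return {TERM_CODES.get(k, k): compress(v) for k, v in value.items()}
    if isinstance(value, list):
        return [compress(v) for v in value]
    if isinstance(value, str):
        return TERM_CODES.get(value, value)
    return value


def decompress(value):
    """Inverse of compress; assumes every integer is a dictionary code."""
    if isinstance(value, dict):
        return {CODE_TERMS.get(k, k): decompress(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decompress(v) for v in value]
    if isinstance(value, int):
        return CODE_TERMS[value]
    return value


doc = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "issuer": "did:example:123",
    "issuanceDate": "2020-07-24T00:00:00Z",
}
compact = compress(doc)
print(len(cbor_encode(doc)), len(cbor_encode(compact)))  # 135 46
print(decompress(compact) == doc)  # True
```

The savings come entirely from the shared dictionary: values the context knows nothing about (the DID and the date here) pass through unchanged, which is also why both sides must agree on exactly the same dictionary.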

The IoT space has obviously been waiting for something like this for a long
time:

- https://www.w3.org/WoT/
- https://github.com/Azure/opendigitaltwins-dtdl/blob/master/DTDL/v2/dtdlv2.md

We are now able to convert all of these ontologies and semantic vocabularies
into compact, interoperable, binary representations for industries that
have already committed to the semantic web:
https://github.com/semantalytics/awesome-semantic-web#ontologies

I'm not sure about the potential internal representation benefits for
services like https://developers.google.com/knowledge-graph, but a small IoT
device that only speaks CBOR-LD would not need to crack out a JSON parser,
with all the attack surface associated with it, just to talk to the
knowledge graph service.

OS



On Fri, Jul 24, 2020 at 10:54 AM Leonard Rosenthol <lrosenth@adobe.com>
wrote:

> It's not just specific schemas but also the order of the schemas, any
> other keys you add, plus additional "techniques" you add.
>
> Using your presentation as a guide:
> Slide 11:
>
> In that case you have picked a single schema, found all the items, and
> given each a unique value (let's say 1-10). Now (not shown on the slide,
> but...), I assume that you then pick another schema and start allocating
> values for it in the dictionary (e.g. 11-20), and so on. At some point the
> credentials schema is updated (1.1->1.2), but you can't update the
> existing entries in the dictionary, so you just add the new ones to the end
> (e.g. 100-105). And then you encode something using that dictionary: how
> does something downstream know that you are using the 1.2 version of the
> context? It would simply have a 100 in there, but w/o that in the
> dictionary, it's not decodable.
>
>
> Slide 14:
>
> This is a good example of how to reduce size by switching from a string
> representation to binary. I assume we will find more of those cases over
> time. *BUT* a decoder needs to understand this encoding approach, and
> again, how would it recognize something new?
>
>
> At a minimum, we need a way to encode the version of the CBOR-SC algorithm
> that is used to encode a given data set.   That would go a *long way* to
> resolving my concerns.
>
> Leonard
>
> On 7/24/20, 11:19 AM, "Manu Sporny" <msporny@digitalbazaar.com> wrote:
>
>     On 7/24/20 11:00 AM, Leonard Rosenthol wrote:
>     > However, the main use case that you present in the presentation is
>     > QR codes, which exist as a mechanism to move from digital to analog
>     > (and back). The analog world is long lived, even if not
>     > necessarily archival, and the data needs to be retrievable. And
>     > that can't happen w/o knowing the right (version of the) dictionary
>     > to use.
>
>     ... which is why we strongly suggest that all production contexts should
>     be versioned, frozen, and cryptographically hashed. There is a general
>     mitigation for your concern. :)
>
>     To be clear, this issue is well known in the JSON-LD ecosystem, and that
>     ecosystem has thrived (deployed on tens of millions of domains) in spite
>     of the danger. That community has learned how to manage constantly
>     evolving vocabularies (schema.org), and how to lock vocabularies down
>     (VCs).
>
>     There are solutions to the problem you outline; cryptographically
>     hashing URLs is one thing we explored, but that bloats the size of the
>     CBOR-LD bytes. Like any technology, CBOR-LD is a series of difficult
>     design trade-offs.
>
>     Just like we made the conscious decision in JSON-LD to be able to
>     reference external JSON-LD Context files (which people insisted was
>     madness and unworkable when we did it... and still do), we make the same
>     conscious decision now (because it worked out pretty well for JSON-LD,
>     and it's not clear how doing the same thing in CBOR-LD would be any
>     different).
>
>     If we wanted to eliminate the risk you highlighted, we wouldn't be able
>     to solve the most pressing use cases.
>
>     -- manu
>
>     --
>     Manu Sporny - https://www.linkedin.com/in/manusporny/
>     Founder/CEO - Digital Bazaar, Inc.
>     blog: Veres One Decentralized Identifier Blockchain Launches
>     https://tinyurl.com/veres-one-launches
>
>

-- 
*ORIE STEELE*
Chief Technical Officer
www.transmute.industries

Received on Friday, 24 July 2020 16:56:18 UTC